type and kind for custom dtypes

Alex Samuel
Hi,

I'm working on building a number of related custom dtypes, and I'm not sure how to set the type and kind fields in PyArray_Descr.  I tried using type='V' and choosing a single unused kind for all my dtypes; this mostly worked, except I found that coercions would sometimes treat values of two different dtypes as if they were the same.  But not always... sometimes my registered cast functions would be called.

Through trial and error, I've found that if I choose an unused type code for each dtype, coercion seems to work as I expect it to (no coercion unless I've provided a cast).  The kind doesn't seem to matter.

I couldn't find any guidance in the docs for how to choose these values.  Apologies if I've overlooked something.  Could someone please advise me?

More widely, is there some global registry of these codes?  Is the number of NumPy dtypes limited to the number of (UTF-8-encodable) chars?  It seems like common practice to use dtype.kind in user code.  If I use one or more for my custom dtypes, is there any mechanism to ensure they do not collide with others'?  Are there any other semantics for either field I should take into account?

Thanks,
Alex



_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: type and kind for custom dtypes

Alex Samuel
On May 5, 2019, at 10:58, Alex Samuel <[hidden email]> wrote:

> Through trial and error, I've found that if I choose an unused type code for each dtype, coercion seems to work as I expect it to (no coercion unless I've provided a cast).  The kind doesn't seem to matter.

Apologies, a correction: I mixed up kind and type above.  I meant that I've found I need to choose distinct kinds for the coercion rules to treat my dtypes as distinct, rather than the type.


Re: type and kind for custom dtypes

Sebastian Berg
Hi Alex,

On Sun, 2019-05-05 at 11:03 -0400, Alex Samuel wrote:

> > On May 5, 2019, at 10:58, Alex Samuel <[hidden email]> wrote:
> >
> > Through trial and error, I've found that if I choose an unused type
> > code for each dtype, coercion seems to work as I expect it to (no
> > coercion unless I've provided a cast).  The kind doesn't seem to
> > matter.
>
> Apologies, a correction: I mixed up kind and type above.  I meant
> that I've found I need to choose distinct kinds for the coercion
> rules to treat my dtypes as distinct, rather than the type.
>
It is cool to hear about interest in custom dtypes.

Numpy has the concept of "same-kind" casting, which may be what bites
you here? So you have unsafe casting, but because you pick the same
"kind" numpy thinks it is OK to do it in ufuncs? There may also be
issues surrounding 0-D arrays casting differently.

I honestly do not think there is any way to ensure you do not collide
with other kinds right now, but will check more closely tomorrow. I am
currently not even quite sure how the type code really interacts when
we have usertypes, and a bit surprised about what you describe.

We are now starting the process of trying to improve the situation
with creating custom dtypes.
There will actually be discussions about this at the end of next week (in
Berkeley). But in any case I would be very interested in your specific
use-case and needs, and hopefully we can help you also on your end with
the current situation. We can discuss on the list, or get in contact
privately.

Best Regards,

Sebastian


Re: type and kind for custom dtypes

Sebastian Berg
In reply to this post by Alex Samuel
OK, I looked into the code, so here is a small followup.


On Sun, 2019-05-05 at 10:58 -0400, Alex Samuel wrote:
> Hi,
>
> I'm working on building a number of related custom dtypes, and I'm
> not sure how to set the type and kind fields in PyArray_Descr.  I
> tried using type='V' and choosing a single unused kind for all my
> dtypes; this mostly worked, except I found that coercions would
> sometimes treat values of two different dtypes as if they were the
> same.  But not always... sometimes my registered cast functions would
> be called.

The reason is that when the "kind" and "itemsize" and "byte order" are
identical, the numpy code decides that data types can be cast (because
they are equivalent). So basically, the "kind" must not be equal unless
the "type"/dtype only differs in precision or similar.

(The relevant code is in multiarraymodule.c in PyArray_EquivTypes)
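
As a rough illustration of the rule described above, here is a simplified pure-Python model. It is not the actual C code: the `Descr` class and the `equiv_types` and `is_native` helpers are hypothetical stand-ins, the example kind codes 'x' and 'y' are made up, and the real function handles further cases (structured, subarray and datetime types) that are left out here.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Descr:
    """Hypothetical stand-in for the descriptor fields discussed here."""
    type: str              # type character code
    kind: str              # kind character code
    itemsize: int          # element size in bytes
    byteorder: str = "="   # '=' native, '<' little, '>' big, '|' n/a


def is_native(byteorder: str) -> bool:
    """True unless the byte order is the opposite of this machine's."""
    native = "<" if sys.byteorder == "little" else ">"
    return byteorder in ("=", "|", native)


def equiv_types(a: Descr, b: Descr) -> bool:
    """Simplified model: same itemsize, same byte order, same kind."""
    if a is b:
        return True
    if a.itemsize != b.itemsize:
        return False
    if is_native(a.byteorder) != is_native(b.byteorder):
        return False
    return a.kind == b.kind


# Two different custom dtypes sharing one kind and one size look equivalent.
time64 = Descr(type="V", kind="x", itemsize=8)
date64 = Descr(type="V", kind="x", itemsize=8)
# A different size, or a different kind, keeps them apart.
time32 = Descr(type="V", kind="x", itemsize=4)
date64_own_kind = Descr(type="V", kind="y", itemsize=8)

same_kind_same_size = equiv_types(time64, date64)
same_kind_other_size = equiv_types(time64, time32)
other_kind_same_size = equiv_types(time64, date64_own_kind)
```

Under this model only the first pair is treated as equivalent, which would match registered casts being called for some pairs of dtypes and skipped for others.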

>
> Through trial and error, I've found that if I choose an unused type
> code for each dtype, coercion seems to work as I expect it to (no
> coercion unless I've provided a cast).  The kind doesn't seem to
> matter.
>
> I couldn't find any guidance in the docs for how to choose these
> values.  Apologies if I've overlooked something.  Could someone
> please advise me?
>
Frankly, I do not think there is any, because nobody has ever created many
types (only quaternions and rationals are publicly available).

> More widely, is there some global registry of these codes?  Is the
> number of NumPy dtypes limited to the number of (UTF-8-encodable)
> chars?  It seems like common practice to use dtype.kind in user code.
>  If I use one or more for my custom dtypes, is there any mechanism to
> ensure they do not collide with others'?  Are there any other
> semantics for either field I should take into account?

I have checked the code, and no, there appears to be no such thing
currently. I suppose (on the C-side) you could find all types, by using
their type number and then asking them.

dtype.kind is indeed used a lot, mostly to decide that a type is e.g.
an integer. My best guess right now is that the rule you saw above is
the only thing you have to take into account.

Best,

Sebastian


>
> Thanks,
> Alex
>
>

Re: type and kind for custom dtypes

Alex Samuel
Thanks very much for looking into this!

> The reason is that when the "kind" and "itemsize" and "byte order" are
> identical, the numpy code decides that data types can be cast (because
> they are equivalent). So basically, the "kind" must not be equal unless
> the "type"/dtype only differs in precision or similar.
>
> (The relevant code is in multiarraymodule.c in PyArray_EquivTypes)

That makes sense, and explains why the cast-less coercion takes place for some type pairs and not for others.

> Frankly, I do not think there is any, because nobody ever created many
> types (there is only quaternions and rationals publicly available).

OK.  I'm a bit surprised to hear this, as the API for adding dtypes is actually rather straightforward!

For now, then, I will stick with my current scheme of assigning successive kind values to my dtypes, and hope for the best when running with other extension dtypes (which, it seems, may be unlikely).
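
A minimal sketch of such a scheme, in plain Python for illustration only. The `assign_kinds` helper and the dtype names are hypothetical, and `BUILTIN_KINDS` is my assumption of the kind codes used by NumPy's built-in dtypes; nothing here guards against collisions with other extension packages.

```python
import string

# Kind codes assumed to be used by NumPy's built-in dtypes:
# bool, int, uint, float, complex, timedelta, datetime,
# object, bytes, unicode, void.
BUILTIN_KINDS = frozenset("biufcmMOSUV")


def assign_kinds(names, taken=BUILTIN_KINDS):
    """Map each dtype name to a distinct ASCII letter not in `taken`."""
    free = (c for c in string.ascii_letters if c not in taken)
    kinds = {}
    for name in names:
        try:
            kinds[name] = next(free)
        except StopIteration:
            raise ValueError("ran out of single-letter kind codes") from None
    return kinds


# Hypothetical dtype names, one kind code each.
kinds = assign_kinds(["time32", "time64", "time128", "date"])
```

With at most 52 ASCII letters and 11 of them assumed taken, a scheme like this runs out quickly, which is part of why I'm asking about global management.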


Re: type and kind for custom dtypes

Alex Samuel
In reply to this post by Sebastian Berg
> We are now starting the progress of trying to improve the situation
> with creating custom dtypes.
> There will actually be discussions about this end of next week (in
> Berkeley). But in any case I would be very interested in your specific
> use-case and needs, and hopefully we can help you also on your end with
> the current situation. We can discuss on the list, or get in contact
> privately.

Unfortunately, I'm in NYC, but I'd be happy to participate however I can, whether that means describing my use case, helping to write docs, or just chatting.


Here's some info about my project:

Ora (https://github.com/alexhsamuel/ora/) is a new date/time implementation.  The intention is to provide types with ticks-since-epoch representation (rather than YMD, HMS) with full functionality for both standalone scalar (i.e. no NumPy) and ndarray use cases.  Essentially, the convenience of datetime, with the performance of datetime64, and much of dateutil rolled in.
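
To illustrate what I mean by a ticks-since-epoch representation, here is a small sketch using only the standard library. This is not Ora's API or its actual layout: the epoch, the microsecond tick size, and the `to_ticks` / `from_ticks` helpers are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed epoch
TICK = timedelta(microseconds=1)                   # assumed resolution


def to_ticks(dt: datetime) -> int:
    """Encode an aware datetime as an integer count of ticks since EPOCH."""
    return (dt - EPOCH) // TICK


def from_ticks(ticks: int) -> datetime:
    """Decode an integer tick count back into an aware datetime."""
    return EPOCH + ticks * TICK


t = datetime(2019, 5, 5, 14, 58, tzinfo=timezone.utc)
ticks = to_ticks(t)          # a single integer, easy to store in an array
round_trip = from_ticks(ticks)
```

A value stored this way is a single fixed-width integer, so comparisons and differences become plain integer operations, and the same bits can back both a scalar and an array element.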

I've also experimented with a number of other matters, including variable width/precision/range types.  As a result I provide various time, date, and time-of-day types, for instance 32-, 64-, and 128-bit time types, and each has a corresponding dtype and complete NumPy support.  It's possible to adjust this set of types, if you are willing to recompile (C++).  That's why I'm interested in how dtypes are managed globally.

Ora has a lot of functionality that works well, and performance is good, though it's so far a solo project and there are still lots of rough edges / missing features / bugs.  I'd love to get feedback from people who work with dates and times a lot, either scalar or vectorized.

My wish list for NumPy's dtype support is:
- better docs on writing dtypes (though they are not bad)
- ability to use a scalar type that doesn't derive from a NumPy base type, so that the scalar type can be used without importing NumPy
- clear management for dtypes


Please let me know how best I could participate or help.

Regards,
Alex


