Hi all,

The week has passed, and NEP 41 has been under discussion for quite a
bit longer than that, so I assume that it can effectively be accepted.

Even so, I will bring up one point again. If there is still a need for
discussion, I hope it will happen in a timely manner, so that I can go
ahead with some of the changes proposed in NEP 41 and, in the event of
more concrete doubts or issues, only a few changes will need to be
undone. I would hate to revert a large amount of work simply because an
important point is raised in two months instead of two weeks.

This whole thing is fairly complex, so please do not hesitate to ask

for clarifications!

I am also very happy to do a video conference with anyone interested at

any time, or chat in private on Slack.

So, just in case: I will be available around 11:00 PDT (18:00 UTC) this
Thursday on the NumPy Community Call Zoom link [0].

As far as I am aware, there was only one discussion point (maybe two;
see point 2 below, which may be independent).

In my proposal, the DType class (i.e. `type(np.dtype("float64"))`) is
the core concept and is different for every scalar type. It holds all
the information on how to deal with array elements.

This duplicates the scalar types to some extent: there would (usually)
be exactly one DType for each (NumPy) scalar type, possibly exposed
using:

    np.dtype[scalar_type]

e.g. `np.dtype[np.float64]`

That does create a certain duality. For each scalar type/class, there

is a corresponding DType class. And in theory the scalar does not even

need to know that NumPy has a DType for it.

From a type-theoretical point of view this is also a bit strange. The
type of each array element is identical to the scalar type! But
although there is only one type, there are two distinct classes: one
for the scalar value, and one that explains such values to NumPy and
stores them in an array.
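To make the duality concrete, here is a minimal pure-Python sketch. It does not use NumPy at all, and every name in it (`Float64Scalar`, `DType`, `Float64DType`, the `scalar_type` keyword) is made up for illustration; it only mimics the `np.dtype[scalar_type]` spelling mentioned above and is not how NumPy implements or will implement it.

```python
class Float64Scalar(float):
    """Hypothetical scalar class; it knows nothing about any DType."""


class DType:
    """Hypothetical DType base class: one subclass per scalar type."""

    _by_scalar = {}      # scalar class -> DType class
    scalar_type = None

    def __init_subclass__(cls, scalar_type=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.scalar_type = scalar_type
        if scalar_type is not None:
            # Register the pairing on the DType side only.
            DType._by_scalar[scalar_type] = cls

    def __class_getitem__(cls, scalar_type):
        # Mimics the proposed spelling np.dtype[scalar_type].
        return cls._by_scalar[scalar_type]


class Float64DType(DType, scalar_type=Float64Scalar):
    """Hypothetical DType describing Float64Scalar elements."""


looked_up = DType[Float64Scalar]
```

Here `looked_up` is `Float64DType`: a second class for the same element type, while `Float64Scalar` itself was never modified.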

I lean in that direction because:

1. I want to modify scalars as little as possible. I am not sure we
   will enable this initially, but the aim is that:
   * In principle, you can create a DType for every Python type without
     touching the original Python scalar.
   * The scalar need not know about NumPy or DTypes, so no new
     dependency is created (you can use the scalar without installing
     NumPy).

2. I somewhat like that DType classes have methods that take a "self"
   instance argument and are provided with the data by the array.
   * This means that a function such as
     `dtype.__get_array_item__(item_memory)` is implemented like a
     method:

         class DType:
             def __get_array_item__(self, item_memory):
                 # Build the scalar from the raw element memory, using
                 # what `self` (the dtype instance) knows; elided here.
                 item = ...
                 return item

   * There is an alternative approach to this, although I have not
     thought about it much.
     `item_memory` really is much like a scalar instance (it holds the
     actual value), so you can argue that `item_memory` is `self` here,
     and the dtype instance is the type of `item_memory` (the self).
     That would mean making `__get_array_item__` live on the dtype
     instance (not on the DType class), so that the dtype is the
     type/class of the array element.
     This is beautiful, but in general you still need to pass the dtype
     instance itself. For example, strings cannot be interpreted without
     knowing their length. In other words, the scalar `self` is
     actually the tuple `(item_memory, dtype)`, which I think is why
     I, at least, do not have a clear grasp here. [1]

3. There may be `dtypes` without specific scalar types. I am not sure
   this is actually a tidy theoretical concept, but an example is
   the current Pandas Categorical.
   The type of the scalars within a categorical array is arbitrary.
   Again, I am not sure that is theoretically tidy. Python, for
   example, uses `enum.Enum`, a class factory, for a similar purpose,
   and you have to use the `.value` attribute. But, desirable or not,
   it would seem less straightforward to allow this if we design
   everything around the scalar type.
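Points 2 and 3 can be illustrated with a small sketch. This is plain Python under my own assumptions, not NEP 41 API: `FixedBytesDType` and `CategoricalDType` are hypothetical names, `item_memory` is modelled as a `memoryview` that starts at the element (like a pointer), and the one-byte category code is invented for the example.

```python
class FixedBytesDType:
    """Hypothetical fixed-width string DType; the width is per instance."""

    def __init__(self, length):
        self.length = length

    def __get_array_item__(self, item_memory):
        # item_memory only marks where the element starts. Without
        # self.length we could not tell where it ends.
        return bytes(item_memory[:self.length]).rstrip(b"\x00")


class CategoricalDType:
    """Hypothetical DType whose scalars have no single type."""

    def __init__(self, categories):
        self.categories = tuple(categories)

    def __get_array_item__(self, item_memory):
        # Assumption: each element is stored as a one-byte code.
        return self.categories[item_memory[0]]


memory = memoryview(b"ab\x00\x00cdef")

s4 = FixedBytesDType(4)
s2 = FixedBytesDType(2)
first = s4.__get_array_item__(memory[0:])        # b"ab"
second = s4.__get_array_item__(memory[4:])       # b"cdef"
second_as_s2 = s2.__get_array_item__(memory[4:])  # b"cd"

cat = CategoricalDType(["low", 3, 2.5])
codes = memoryview(bytes([0, 1, 2]))
items = [cat.__get_array_item__(codes[i:]) for i in range(3)]
```

The same memory yields `b"cdef"` or `b"cd"` depending on the dtype instance, which is the `(item_memory, dtype)` pairing from point 2. The categorical items come back as a `str`, an `int` and a `float`, so there is no single scalar type to design around.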

The main downside of using DTypes as proposed in NEP 41 is, in my
opinion, what I mentioned first:

We must have a DType class for every scalar class, even though most
scalars (i.e. all NumPy scalars, except for the `object` dtype) could
easily be expanded to include all the necessary information; maybe
they already include almost all of it.

In the NEP 41 framework, the scalar could in practice be built from the
DType, which may seem a bit strange. In general, Scalar<->DType will
form a unit of sorts, and this means that somewhere we have to map
scalars to DTypes.
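As a rough idea of what that "somewhere" could look like when NumPy has to find a DType for a plain Python value, here is a sketch. The registry `SCALAR_TO_DTYPE`, the function `dtype_class_for` and both DType classes are hypothetical names of mine; NEP 41 does not specify this mechanism.

```python
from fractions import Fraction


class IntDType:
    """Hypothetical DType for Python ints."""


class FractionDType:
    """Hypothetical DType for fractions.Fraction, which is unaware of it."""


# Hypothetical registry: the one place that maps scalar types to DTypes.
SCALAR_TO_DTYPE = {int: IntDType, Fraction: FractionDType}


def dtype_class_for(value):
    """Find the DType class for a value by walking its type's MRO."""
    for klass in type(value).__mro__:
        if klass in SCALAR_TO_DTYPE:
            return SCALAR_TO_DTYPE[klass]
    raise TypeError(f"no DType registered for {type(value).__name__}")


found = [dtype_class_for(v).__name__ for v in (3, True, Fraction(1, 2))]
```

Walking the MRO means that subclasses such as `bool` fall back to the DType of their parent (`int` here). Whether that fallback is desirable is a separate design question.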

So, in many ways, I actually do find the scalar version tidier myself.

But I also find "there is a DType class for every scalar type/class" a
straightforward user story, even if there will be subtle differences
between the DType and the scalar class/type.

Point 2 may be independent of the whole scalar story; I am conflating
the two here because, to me, it applies more naturally in that context.

Cheers,

Sebastian

[0] See the community meeting agenda document for the link:
    https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg

[1] These are thoughts mainly from:
    https://gist.github.com/eric-wieser/49c55bcab744b0e782f6c2740603180b#what-this-could-mean-for-dtypes
    and a discussion on the pull request, and I will not claim to
    represent them quite correctly and especially fully here.
