

Hi all,
When we create new datatypes, we have the option to make new choices
for the new datatypes [0] (not the existing ones).
The question is: Should every NumPy datatype have a scalar associated
and should operations like indexing return a scalar or a 0D array?
This is in my opinion a complex, almost philosophical, question, and we
do not have to settle anything for a long time. But, if we do not
decide a direction before we have many new datatypes, the decision will
make itself...
So happy about any ideas, even if it's just a gut feeling :).
There are various points. I would like to mostly ignore the technical
ones, but I am listing them anyway here:
* Scalars are faster (although that can likely be optimized)
* Scalars have a lower memory footprint
* The current implementation incurs technical debt in NumPy.
  (I do not think that is a general issue, though. We could probably
  create scalars automatically for each new datatype.)
Advantages of having no scalars:
* No need to keep track of scalars to preserve them in ufuncs. And what
  about libraries using `np.asarray`: do they need `np.asarray_or_scalar`?
  (Or do they decide to always return arrays, although ufuncs may not?)
* Seems simpler in many ways: you always know the output will be an
  array if it has to do with NumPy.
Advantages of having scalars:
* Scalars are immutable and we are used to them from Python.
  A 0D array cannot be used as a dictionary key consistently [1].
  I.e. without scalars as first-class citizens `dict[arr1d[0]]`
  cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined),
  and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]
  (See the short demo after this list.)
* Object arrays as we have them now make sense, `arr1d[0]` can
  reasonably return a Python object. I.e. arrays feel more like
  containers if you can take elements out easily.
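For concreteness, this is today's behavior (NumPy scalars are hashable,
0D arrays are not):

>>> import numpy as np
>>> d = {np.float64(1.5): "works"}   # scalars are hashable
>>> d[np.float64(1.5)]
'works'
>>> {np.array(1.5): "fails"}         # 0D arrays are not
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'numpy.ndarray'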
Could go both ways:
* Scalar math: `scalar = arr1d[0]; scalar += 1` modifies the array
  without scalars. With scalars, `arr1d[0, ...]` clarifies the
  meaning. (In principle it is good to never use `arr2d[0]` to
  get a 1D slice, probably more so if scalars exist. See the example
  below.)
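For concreteness, today `arr1d[0]` gives a scalar copy while
`arr1d[0, ...]` gives a 0D view:

>>> arr1d = np.arange(3)
>>> scalar = arr1d[0]; scalar += 1   # copy: the array is unchanged
>>> arr1d
array([0, 1, 2])
>>> view = arr1d[0, ...]; view += 1  # 0D view: writes back into the array
>>> arr1d
array([1, 1, 2])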
Note: array-scalars (the current NumPy scalars) are not useful in my
opinion [3]. A scalar should not be indexed or have a shape. I do not
believe in scalars pretending to be arrays.
I personally tend towards liking scalars. If Python were a language
where the array (array-programming) concept was ingrained into the
language itself, I would lean the other way. But users are used to
scalars, and they "put" scalars into arrays. Array objects are in some
ways strange in Python, and I feel not having scalars detaches them
further.
Having scalars, however, also means we should preserve them. I feel in
principle that is actually fairly straightforward. E.g. for ufuncs:
* np.add(scalar, scalar) -> scalar
* np.add.reduce(arr, axis=None) -> scalar
* np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
* np.add.reduce(scalar, axis=()) -> array
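For reference, the first three cases already roughly match today's
behavior, with today's array-scalars standing in for the proposed
scalars:

>>> arr = np.arange(3)
>>> type(np.add(np.float64(1), np.float64(2)))
<class 'numpy.float64'>
>>> type(np.add.reduce(arr, axis=None))
<class 'numpy.int64'>
>>> np.add.reduce(arr[None, :], axis=1)   # stays an array
array([3])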
Of course libraries that do `np.asarray` would/could basically choose
to not preserve scalars: their signature is defined as taking strictly
array input.
Cheers,
Sebastian
[0] At best this can be a vision to decide which way they may evolve.
[1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is arguably
strange. E.g. Quantity defines hash correctly, but does not fully
ensure immutability for 0D Quantities. Ensuring immutability in a
world where "views" are a central concept requires a read-only copy.
[2] Arguably `.item()` would always return a scalar, but it would be a
second-class citizen. (Although if it returns a scalar, at least we
already have a scalar implementation.)
[3] They are necessary due to technical debt for NumPy datatypes
though.


I personally have always found it weird and annoying to deal with 0D arrays, so +1 for scalars!*
Juan
*: admittedly, I have almost no grasp of the underlying NumPy implementation complexities, but I will happily take Sebastian's word that scalars can be consistent with the library.


Hi Sebastian,
Just to clarify the difference:
>>> x = np.float64(42)
>>> y = np.array(42, dtype=float)
Here `x` is a scalar and `y` is a 0D array, correct?
If that's the case, not having the former would be very confusing for
users (at least, that would be very confusing to me, FWIW).
If anything, I think it'd be cleaner to not have the latter, and only
have either scalars or 1D arrays (i.e., ND arrays with N>=1), but it
is probably way too late to even think about it anyway.
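For what it's worth, one observable difference between the two today is
mutability:

>>> y[()] = 43        # the 0D array is mutable
>>> x[()] = 43        # the scalar is not
Traceback (most recent call last):
  ...
TypeError: 'numpy.float64' object does not support item assignment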
Cheers,
Evgeni


Not having a hashable tuple conversion would be a strong limitation:

>>> a = tuple(np.arange(5))

versus

>>> a = tuple([np.array(i) for i in range(5)])
>>> {a: 5}
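Spelled out with today's behavior, the first form works and the second
fails:

>>> {tuple(np.arange(5)): 5}
{(0, 1, 2, 3, 4): 5}
>>> {tuple([np.array(i) for i in range(5)]): 5}
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'numpy.ndarray'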
Josef


Also there is the question of which scalar: `.item()` versus `[()]`.
This was used in the old times in scipy.stats, and I just saw
aside: AFAIR, I also use 0-dim arrays to ensure that I have a numpy dtype and not, e.g., some equivalent python type
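For a concrete example of the `.item()` versus `[()]` difference today:

>>> a = np.array(1.5)
>>> type(a.item())    # Python float
<class 'float'>
>>> type(a[()])       # NumPy scalar
<class 'numpy.float64'>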
Josef


0-dim as mutable pseudo-scalar:

>>> a = np.asarray(5)
>>> a, id(a)
(array(5), 844574884528)
>>> a[()] = 1
>>> a, id(a)
(array(1), 844574884528)

Maybe I never used that. In a recent similar case, I could just use a
1d list or array to work around python's mutation/mutability behavior.
Josef


Off the cuff, my intuition is that dtypes will want to be able to
define how scalar indexing works, and let it return objects other than
arrays. So e.g.:
- some dtypes might just return a zero-d array
- some dtypes might want to return some arbitrary domain-appropriate
  type, like a datetime dtype might want to return datetime.datetime
  objects (like how dtype(object) works now)
- some dtypes might want to go to all the trouble to define immutable
  duck-array "scalar" types (like how dtype(float) and friends work now)
But I don't think we need to give that last case any special
privileges in the dtype system. For example, I don't think we need to
mandate that everyone who defines their own dtype MUST also implement
a custom duck-array type to act as the scalars, or build a whole
complex system to auto-generate such types given an arbitrary
user-defined dtype.
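A purely hypothetical sketch of such a per-dtype hook; none of these
names exist in NumPy, they are invented only to illustrate the idea:

import datetime

# Hypothetical sketch only: NumPy has no such hook; the name
# `scalar_from_value` is invented for illustration.
class DatetimeDType:
    def scalar_from_value(self, seconds):
        # arr[i] would hand back a datetime.datetime, not a 0-d array
        return datetime.datetime.utcfromtimestamp(int(seconds))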
-n

--
Nathaniel J. Smith -- https://vorpus.org


Hi Sebastian,
On 22.02.20, 02:37, Sebastian Berg wrote:
> [...]
>
> Could go both ways:
>
> * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array
>   without scalars. With scalars `arr1d[0, ...]` clarifies the
>   meaning. (In principle it is good to never use `arr2d[0]` to
>   get a 1D slice, probably more so if scalars exist.)
From a usability perspective, one could argue that if the dimension of the array one is indexing into is known and the user isn't advanced, then the expected behavior is that of scalars, not 0D arrays. If, however, the input dimension is unknown, then the behavior switch at 0D and the need for an extra ellipsis to ensure array-ness make things confusing to regular users. I am fine with the current behavior of indexing, as anything else would likely be a large backwards-compat break.
> [...]
>
> Having scalars, however also means we should preserve them. I feel in
> principle that is actually fairly straightforward. E.g. for ufuncs:
>
> * np.add(scalar, scalar) -> scalar
> * np.add.reduce(arr, axis=None) -> scalar
> * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
> * np.add.reduce(scalar, axis=()) -> array
I love this idea.


On Sat, 2020-02-22 at 13:28 -0800, Nathaniel Smith wrote:
> Off the cuff, my intuition is that dtypes will want to be able to
> define how scalar indexing works, and let it return objects other
> than arrays. So e.g.:
>
> - some dtypes might just return a zero-d array
> - some dtypes might want to return some arbitrary domain-appropriate
>   type, like a datetime dtype might want to return datetime.datetime
>   objects (like how dtype(object) works now)
> - some dtypes might want to go to all the trouble to define immutable
>   duck-array "scalar" types (like how dtype(float) and friends work
>   now)
Right, my assumption is that whatever we suggest is going to be what
most will choose, so we have the chance to move in a certain direction
and set a standard. This is to make code which may or may not deal with
0D arrays more reliable (more below).
>
> But I don't think we need to give that last case any special
> privileges in the dtype system. For example, I don't think we need to
> mandate that everyone who defines their own dtype MUST also implement
> a custom duck-array type to act as the scalars, or build a whole
> complex system to auto-generate such types given an arbitrary
> user-defined dtype.
(Note that "auto-generating" would be nothing more than a read-only 0D
array, which does not implement indexing.)
There are also categoricals, for which the type may just be "object" in
practice (you could define it closer, but it seems unlikely to be
useful). And for simple numerical types, if we go the `.item()` path,
it is arguably fine if the type is just a python type.
Maybe the crux of the problem is actually that, in general,
`np.asarray(arr1d[0])` does not round-trip for the current object dtype,
and only partially for the categorical above.
As such that is fine, but right now it is hard to tell when you will
have a scalar and when a 0D array.
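For example, with today's object dtype:

>>> arr1d = np.empty(2, dtype=object)
>>> arr1d[0] = [1, 2]; arr1d[1] = [3]
>>> arr1d[0]                 # indexing returns the raw Python object
[1, 2]
>>> np.asarray(arr1d[0])     # which does not round-trip to 0D object
array([1, 2])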
Maybe it is better to talk about a potentially new `np.pyobject[type]`
datatype (i.e. an object datatype with all elements having the same
python type).
Currently writing generic code with the object dtype is tricky, because
we randomly return the object instead of arrays.
What would be the preference for such a specific dtype?
* arr1d[0] -> scalar or array?
* np.add(scalar, scalar) -> scalar or array?
* np.add.reduce(arr) -> scalar or array?
I think the `np.add` case we can decide fairly independently. The main
thing is the indexing. Would we want to force a `.item()` call or not?
Forcing `.item()` is in many ways simpler; I am unsure whether it would
often be inconvenient.
And maybe the answer is just that for datatypes that do not round-trip
easily, `.item()` is probably preferable, and for datatypes that do
round-trip, scalars are fine.
- Sebastian


I have some thoughts on scalars from playing with ndarray ducktypes
(__array_function__), e.g. a MaskedArray ndarray-ducktype, for which I
wanted an associated "MaskedScalar" type.
In summary, the way scalars currently work makes ducktyping
(duck-scalars) difficult:
* numpy scalar types are not subclassable, so my duck-scalars aren't
  subclasses of numpy scalars and aren't in the type hierarchy
* even if scalars were subclassable, I would have to subclass each
  scalar datatype individually to make masked versions
* lots of code checks `isinstance(var, np.float64)`, which breaks
  for my duck-scalars
* it was difficult to distinguish between a duck-scalar and a duck-0d
  array. The method I used in the end seems hacky.
This has led to some daydreams about how scalars should work, and also
finally led me to read through your NEPs 40/41 with specific focus on
what you said about scalars; I was about to post there until I saw this
discussion. I agree with what you said in the NEPs about not making
scalars be dtype instances.
Here is what ducktypes led me to:
If we are able to do something like define a `np.numpy_scalar` type
covering all numpy scalars, which has a `.dtype` attribute like you
describe in the NEPs, then that would seem to solve the ducktype
problems above. Ducktype implementors would need to make a "duck-scalar"
type in parallel to their "duck-ndarray" type, but I found that to be
pretty easy using an abstract class in my MaskedArray ducktype, since
the MaskedArray and MaskedScalar share a lot of behavior.
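A minimal sketch of what such a wrapper might look like (hypothetical;
`numpy_scalar` does not exist, and a real design would need much more):

import numpy as np

# Hypothetical sketch only: a generic, immutable scalar carrying a .dtype.
class numpy_scalar:
    def __init__(self, value, dtype):
        self._value = value
        self.dtype = np.dtype(dtype)
    def __repr__(self):
        return "numpy_scalar(%r, dtype=%s)" % (self._value, self.dtype)
    def __hash__(self):
        # hashable, unlike 0-d ndarrays
        return hash((str(self.dtype), self._value))
    def __eq__(self, other):
        return (isinstance(other, numpy_scalar)
                and self.dtype == other.dtype
                and self._value == other._value)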
A numpy_scalar type would also help solve some object-array problems if
the object scalars are wrapped in the np_scalar type. A long time ago I
started to try to fix up various funny/strange behaviors of object
datatypes, but there are lots of special cases, and the main problem was
that the returned objects (e.g. from indexing) were not numpy types and
did not support numpy attributes or indexing. Wrapping the returned
object in `np.numpy_scalar` might add an extra slight annoyance to
people who want to unwrap the object, but I think it would make object
arrays less buggy and make code using object arrays easier to reason
about and debug.
Finally, a few random votes/comments based on the other emails on the list:
I think scalars have a place in numpy (rather than just reusing 0d
arrays), since there is a clear use in having hashable, immutable
scalars. Structured scalars should probably be immutable.
I agree with your suggestion that scalars should not be indexable. Thus,
my duck-scalars (and the proposed numpy_scalar) would not be indexable.
However, I think they should encode their datatype through a .dtype
attribute like ndarrays, rather than by inheritance.
Also, something to think about is that currently numpy scalars satisfy
the property `isinstance(np.float64(1), float)`, i.e they are within the
python numerical type hierarchy. 0d arrays do not have this property. My
proposal above would break this. I'm not sure what to think about
whether this is a good property to maintain or not.
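That is:

>>> isinstance(np.float64(1), float)   # scalars are in the hierarchy
True
>>> isinstance(np.array(1.0), float)   # 0-d arrays are not
False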
Cheers,
Allan


The fact that `isinstance(np.float64(1), float)` holds raises the problem that the current
implementation of np.float64 scalars breaks the Liskov substitution principle:
`sequence_or_array[round(x)]` works if `x` is a float, but breaks down if `x` is
an np.float64.
See https://github.com/numpy/numpy/issues/11810, where the issue is discussed in the
broader setting of the semantics of `np.round` vs. python3 `round`.
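Concretely, the failure mode (behavior as reported in the linked issue):

>>> lst = ['a', 'b', 'c']
>>> lst[round(1.2)]               # round(float) returns a Python int
'b'
>>> lst[round(np.float64(1.2))]   # round(np.float64) did not, at the time
Traceback (most recent call last):
  ...
TypeError: list indices must be integers or slices, not numpy.float64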
I do not have a strong opinion here, except that if np.float64's are within the python
number hierarchy they should be PEP 3141 compliant (which currently they are not).
Stefano
> On 25 Feb 2020, at 00:03, [hidden email] wrote:
>
> Also, something to think about is that currently numpy scalars satisfy
> the property `isinstance(np.float64(1), float)`, i.e they are within the
> python numerical type hierarchy. 0d arrays do not have this property. My
> proposal above would break this. I'm not sure what to think about
> whether this is a good property to maintain or not.
>
> Cheers,
> Allan


I've always found the duality of zero-d arrays and scalars confusing, and I'm sure I'm not alone.
Having both is just plain weird.
But, backward compatibility aside, could we have ONLY scalars?
When we index into an array, the dimensionality is reduced by one, so indexing into a 1D array has to get us something. But the zero-d array is a really weird object: do we really need it?
There is certainly a need for more numpy-like scalars: more than the built-in data types, and some handy attributes and methods, like .dtype, .itemsize, etc. But could we make an enhanced scalar that had everything we actually need from a zero-d array?
The key point would be mutability, but do we really need mutable scalars? I can't think of any time I've needed that, when I couldn't have used a 1d array of length 1.
Is there a use case for zero-d arrays that could not be met with an enhanced scalar?
CHB
On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane < [hidden email]> wrote: I have some thoughts on scalars from playing with ndarray ducktypes
(__array_function__), eg a MaskedArray ndarrayducktype, for which I
wanted an associated "MaskedScalar" type.
In summary, the ways scalars currently work makes ducktyping
(duckscalars) difficult:
* numpy scalar types are not subclassable, so my duckscalars aren't
subclasses of numpy scalars and aren't in the type hierarchy
* even if scalars were subclassable, I would have to subclass each
scalar datatype individually to make masked versions
* lots of code checks `np.isinstance(var, np.float64)` which breaks
for my duckscalars
* it was difficult to distinguish between a duckscalar and a duck0d
array. The method I used in the end seems hacky.
This has led to some daydreams about how scalars should work, and also
led me last to read through your NEPs 40/41 with specific focus on what
you said about scalars, and was about to post there until I saw this
discussion. I agree with what you said in the NEPs about not making
scalars be dtype instances.
Here is what ducktypes led me to:
If we are able to do something like define a `np.numpy_scalar` type
covering all numpy scalars, which has a `.dtype` attribute like you
describe in the NEPs, then that would seem to solve the ducktype
problems above. Ducktype implementors would need to make a "duckscalar"
type in parallel to their "duckndarray" type, but I found that to be
pretty easy using an abstract class in my MaskedArray ducktype, since
the MaskedArray and MaskedScalar share a lot of behavior.
A numpy_scalar type would also help solve some object-array problems if
the object scalars are wrapped in the np_scalar type. A long time ago I
started to try to fix up various funny/strange behaviors of object
datatypes, but there are lots of special cases, and the main problem was
that the returned objects (e.g. from indexing) were not numpy types and
did not support numpy attributes or indexing. Wrapping the returned
object in `np.numpy_scalar` might add a slight extra annoyance for
people who want to unwrap the object, but I think it would make object
arrays less buggy and make code using object arrays easier to reason
about and debug.
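The current behavior is easy to demonstrate:

    import numpy as np

    a = np.empty(2, dtype=object)
    a[0] = {"answer": 42}
    a[1] = [1, 2]

    print(type(a[0]))   # <class 'dict'> -- indexing returns the raw object
    # a[0].dtype  ->  AttributeError: the element has no numpy attributes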
Finally, a few random votes/comments based on the other emails on the list:
I think scalars have a place in numpy (rather than just reusing 0d
arrays), since there is a clear use in having hashable, immutable
scalars. Structured scalars should probably be immutable.
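For instance (current NumPy; the structured case shows why I say
"probably"):

    import numpy as np

    d = {np.float64(1.5): "found"}   # scalars are hashable and immutable
    print(d[1.5])                    # "found": hashes like the Python float

    # {np.array(1.5): "x"}  ->  TypeError: unhashable type: 'numpy.ndarray'

    # Structured (void) scalars are the odd ones out today: indexing a
    # structured array gives a scalar that acts as a mutable view.
    a = np.zeros(2, dtype=[("x", "f8")])
    v = a[0]
    v["x"] = 99.0                    # writes through: a["x"][0] is now 99.0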
I agree with your suggestion that scalars should not be indexable. Thus,
my duck-scalars (and the proposed numpy_scalar) would not be indexable.
However, I think they should encode their datatype through a `.dtype`
attribute like ndarrays, rather than by inheritance.
Also, something to think about is that currently numpy scalars satisfy
the property `isinstance(np.float64(1), float)`, i.e. they are within the
Python numerical type hierarchy; 0d arrays do not have this property. My
proposal above would break this. I'm not sure whether this is a good
property to maintain or not.
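Concretely:

    import numpy as np

    isinstance(np.float64(1), float)   # True: np.float64 subclasses float
    isinstance(np.array(1.0), float)   # False: 0d arrays are not numbers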
Cheers,
Allan
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception
[hidden email]


On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> I've always found the duality of zero-d arrays and scalars confusing,
> and I'm sure I'm not alone.
>
> Having both is just plain weird.
I guess so, it is a tricky situation, and I do not really have an
answer.
>
> But, backward compatibility aside, could we have ONLY scalars?
>
> When we index into an array, the dimensionality is reduced by one, so
> indexing into a 1D array has to get us something: but the zero-d array
> is a really weird object -- do we really need it?
>
Well, it is hard to write functions that work in N dimensions (where N
can be 0) if the 0D array does not exist. You can often get away with
scalars, because they pretend to be arrays in most ways (aside from
mutability).
But I am pretty sure we have a bunch of cases that need
`res = np.asarray(res)` simply because `res` is N-D but may have been
silently converted to a scalar along the way; a sketch of that pattern
is below. E.g. see
https://github.com/numpy/numpy/issues/13105 for an issue about this
(although it does not actually list any specific problems).
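A minimal sketch of that defensive pattern (the function is made up for
illustration; the behavior it guards against is standard NumPy):

    import numpy as np

    def demean(x):
        x = np.asarray(x)
        res = x - x.mean()
        # For a 0D input the subtraction returns a numpy scalar rather
        # than a 0D array, so re-wrap to keep the output an ndarray.
        return np.asarray(res)

    demean(np.array(3.0))   # array(0.) instead of np.float64(0.0)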
- Sebastian