New DTypes: Are scalars a central concept in NumPy or not?


New DTypes: Are scalars a central concept in NumPy or not?

Sebastian Berg
Hi all,

When we create new datatypes, we have the option to make new choices
for the new datatypes [0] (not the existing ones).

The question is: Should every NumPy datatype have a scalar associated
and should operations like indexing return a scalar or a 0-D array?

This is in my opinion a complex, almost philosophical, question, and we
do not have to settle anything for a long time. But if we do not
decide a direction before we have many new datatypes, the decision will
make itself...
So I am happy about any ideas, even if it's just a gut feeling :).

There are various points. I would like to mostly ignore the technical
ones, but I am listing them anyway here:

  * Scalars are faster (although that could likely be optimized)

  * Scalars have a lower memory footprint

  * The current implementation incurs a technical debt in NumPy.
    (I do not think that is a general issue, though. We could
    automatically create scalars for each new datatype probably.)

Advantages of having no scalars:

  * No need to keep track of scalars to preserve them in ufuncs or in
    libraries using `np.asarray`. Do such libraries need an
    `np.asarray_or_scalar`? (Or do they decide to always return arrays,
    although ufuncs may not?)

  * Seems simpler in many ways: you always know the output will be an
    array if it has to do with NumPy.

Advantages of having scalars:

  * Scalars are immutable and we are used to them from Python.
    A 0-D array cannot be used as a dictionary key consistently [1]
    (see the short example after this list).

    I.e. without scalars as first-class citizens `dict[arr1d[0]]`
    cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined),
    and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2]

  * Object arrays as we have them now make sense, `arr1d[0]` can
    reasonably return a Python object. I.e. arrays feel more like
    containers if you can take elements out easily.
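
For concreteness, here is the dictionary-key point with today's NumPy,
where indexing does return an immutable, hashable scalar:

    import numpy as np

    arr1d = np.arange(3)
    d = {arr1d[0]: "first"}    # works: arr1d[0] is an immutable integer scalar
    d[np.int64(0)]             # -> "first"

    zero_d = np.asarray(arr1d[0])
    # {zero_d: "first"}        # TypeError: unhashable type: 'numpy.ndarray'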

Could go both ways:

  * Scalar math: without scalars, `scalar = arr1d[0]; scalar += 1`
    modifies the array. With scalars, `arr1d[0, ...]` clarifies the
    meaning when a view is wanted. (In principle it is good to never
    use `arr2d[0]` to get a 1-D slice, probably more so if scalars
    exist.)
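
The two spellings behave as follows in current NumPy:

    arr1d = np.arange(3)
    s = arr1d[0]         # scalar today: an immutable copy of the element
    s += 1               # rebinds s; arr1d is unchanged
    v = arr1d[0, ...]    # 0-D view into the array
    v += 1               # in-place: arr1d is now [1, 1, 2]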

Note: array-scalars (the current NumPy scalars) are not useful in my
opinion [3]. A scalar should not be indexed or have a shape. I do not
believe in scalars pretending to be arrays.

I personally tend towards liking scalars. If Python were a language
where the array (array-programming) concept was ingrained into the
language itself, I would lean the other way. But users are used to
scalars, and they "put" scalars into arrays. Array objects are in some
ways strange in Python, and I feel not having scalars detaches them
further.

Having scalars, however, also means we should preserve them. I feel in
principle that is actually fairly straightforward. E.g. for ufuncs:

   * np.add(scalar, scalar) -> scalar
   * np.add.reduce(arr, axis=None) -> scalar
   * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
   * np.add.reduce(scalar, axis=()) -> array
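
As a point of comparison, current NumPy collapses reductions to scalars
whether or not `axis` is given explicitly; under the proposal only the
`axis=None` case would do so (output types shown for a typical 64-bit
build):

    arr = np.arange(3)
    type(np.add.reduce(arr, axis=None))  # np.int64 -- scalar, as proposed
    type(np.add.reduce(arr, axis=0))     # also np.int64 today; the proposal
                                         # would return a 0-D array instead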

Of course libraries that do `np.asarray` would/could basically choose
not to preserve scalars: their signature is defined as taking strictly
array input.

Cheers,

Sebastian


[0] At best this can be a vision to decide which way they may evolve.

[1] E.g. PyTorch uses `hash(tensor) == id(tensor)`, which is arguably
strange. E.g. Quantity defines hash correctly, but does not fully
ensure immutability for 0-D Quantities. Ensuring immutability in a
world where "views" are a central concept requires a read-only copy.

[2] Arguably `.item()` would always return a scalar, but it would be a
second-class citizen. (Although if it returns a scalar, at least we
already have a scalar implementation.)

[3] They are necessary due to technical debt for NumPy datatypes
though.


Re: New DTypes: Are scalars a central concept in NumPy or not?

Juan Nunez-Iglesias
I personally have always found it weird and annoying to deal with 0-D arrays, so +1 for scalars!*

Juan

*: admittedly, I have almost no grasp of the underlying NumPy implementation complexities, but I will happily take Sebastian's word that scalars can be consistent with the library.


Re: New DTypes: Are scalars a central concept in NumPy or not?

Evgeni Burovski
In reply to this post by Sebastian Berg
Hi Sebastian,

Just to clarify the difference:

>>> x = np.float64(42)
>>> y = np.array(42, dtype=float)

Here `x` is a scalar and `y` is a 0D array, correct?
If that's the case, not having the former would be very confusing for
users (at least, that would be very confusing to me, FWIW).

If anything, I think it'd be cleaner not to have the latter, and to
have only scalars and N-D arrays with N >= 1, but it is probably way
too late to even think about that anyway.
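
The two are easy to conflate; continuing from `x` and `y` above:

>>> x.ndim, y.ndim   # (0, 0) -- both report zero dimensions
>>> hash(x)          # fine: scalars are hashable
>>> y[()] = 7        # fine: the 0-D array is mutable in place
>>> hash(y)          # TypeError: unhashable type: 'numpy.ndarray'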

Cheers,

Evgeni


Re: New DTypes: Are scalars a central concept in NumPy or not?

josef.pktd
Not having a hashable tuple conversion would be a strong limitation:

    a = tuple(np.arange(5))                     # tuple of scalars: hashable
    # versus
    a = tuple([np.array(i) for i in range(5)])  # tuple of 0-D arrays
    {a: 5}                                      # works only in the scalar case

Josef


Re: New DTypes: Are scalars a central concept in NumPy or not?

josef.pktd



Also, there is the question of which scalar: `.item()` versus `[()]`.
The latter was used in the old times in scipy.stats.

Aside: AFAIR, I also use 0-dim arrays to ensure that I have a numpy
dtype and not, e.g., some equivalent python type.
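
The difference between the two spellings, for a concrete 0-D array:

    a = np.array(5.0)
    type(a.item())   # <class 'float'> -- a plain Python scalar
    type(a[()])      # <class 'numpy.float64'> -- a NumPy scalar, dtype kept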

Josef
 


Re: New DTypes: Are scalars a central concept in NumPy or not?

josef.pktd



A 0-dim array as a mutable pseudo-scalar:

    a = np.asarray(5)
    a, id(a)       # (array(5), 844574884528)
    a[()] = 1      # in-place write; the identity is preserved
    a, id(a)       # (array(1), 844574884528)

Maybe I never used that. In a recent similar case, I could use just a
1-d list or array to work around Python's mutability behavior.


Josef
 


Re: New DTypes: Are scalars a central concept in NumPy or not?

Nathaniel Smith
In reply to this post by Sebastian Berg
Off the cuff, my intuition is that dtypes will want to be able to
define how scalar indexing works, and let it return objects other than
arrays. So e.g.:

- some dtypes might just return a zero-d array
- some dtypes might want to return some arbitrary domain-appropriate
type, like a datetime dtype might want to return datetime.datetime
objects (like how dtype(object) works now)
- some dtypes might want to go to all the trouble to define immutable
duck-array "scalar" types (like how dtype(float) and friends work now)
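
The middle case is already observable with the object dtype today,
which hands back the stored Python object itself:

    import datetime
    import numpy as np

    arr = np.array([datetime.datetime(2020, 2, 21)], dtype=object)
    type(arr[0])    # <class 'datetime.datetime'>, not a NumPy type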

But I don't think we need to give that last case any special
privileges in the dtype system. For example, I don't think we need to
mandate that everyone who defines their own dtype MUST also implement
a custom duck-array type to act as the scalars, or build a whole
complex system to auto-generate such types given an arbitrary
user-defined dtype.

-n


--
Nathaniel J. Smith -- https://vorpus.org

Re: New DTypes: Are scalars a central concept in NumPy or not?

Hameer Abbasi
In reply to this post by Sebastian Berg
Hi, Sebastian,

On 22.02.20, 02:37, "NumPy-Discussion on behalf of Sebastian Berg" <numpy-discussion-bounces+hameerabbasi=[hidden email] on behalf of [hidden email]> wrote:

   
    Could go both ways:
   
      * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array
        without scalars. With scalars `arr1d[0, ...]` clarifies the
        meaning. (In principle it is good to never use `arr2d[0]` to
        get a 1D slice, probably more-so if scalars exist.)

From a usability perspective, one could argue that if the dimension of the array one is indexing into is known and the user isn't advanced, then the expected behavior is one of scalars and not 0-D arrays. If, however, the input dimension is unknown, then the behavior switch at 0-D and the need for an extra ellipsis to ensure array-ness make things confusing to regular users. I am fine with the current behavior of indexing, as anything else would likely be a large backwards-compat break.

   
    Having scalars, however also means we should preserve them. I feel in
    principle that is actually fairly straight forward. E.g. for ufuncs:
   
       * np.add(scalar, scalar) -> scalar
       * np.add.reduce(arr, axis=None) -> scalar
       * np.add.reduce(arr, axis=1) -> array (even if arr is 1d)
       * np.add.reduce(scalar, axis=()) -> array

I love this idea.
   

Re: New DTypes: Are scalars a central concept in NumPy or not?

Sebastian Berg
In reply to this post by Nathaniel Smith
On Sat, 2020-02-22 at 13:28 -0800, Nathaniel Smith wrote:

> Off the cuff, my intuition is that dtypes will want to be able to
> define how scalar indexing works, and let it return objects other
> than
> arrays. So e.g.:
>
> - some dtypes might just return a zero-d array
> - some dtypes might want to return some arbitrary domain-appropriate
> type, like a datetime dtype might want to return datetime.datetime
> objects (like how dtype(object) works now)
> - some dtypes might want to go to all the trouble to define immutable
> duck-array "scalar" types (like how dtype(float) and friends work
> now)
Right, my assumption is that whatever we suggest is going to be what
most will choose, so we have the chance to move in a certain direction
and set a standard. This is to make code which may or may not deal with
0-D arrays more reliable (more below).

>
> But I don't think we need to give that last case any special
> privileges in the dtype system. For example, I don't think we need to
> mandate that everyone who defines their own dtype MUST also implement
> a custom duck-array type to act as the scalars, or build a whole
> complex system to auto-generate such types given an arbitrary
> user-defined dtype.

(Note that "autogenerating" would be nothing more than a read-only 0-D
array which does not implement indexing.)


There are also categoricals, for which the type may just be "object" in
practice (you could define it more precisely, but that seems unlikely
to be useful). And for simple numerical types, if we go the `.item()`
path, it is arguably fine if the type is just a python type.

Maybe the crux of the problem is actually that, in general,
`np.asarray(arr1d[0])` does not roundtrip for the current object dtype,
and only partially for a categorical as above.
As such that is fine, but right now it is hard to tell when you will
have a scalar and when a 0-D array.
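
The object-dtype roundtrip failure is easy to demonstrate:

    arr = np.empty(2, dtype=object)
    arr[:] = [[1, 2], [3, 4, 5]]
    elem = arr[0]               # a plain Python list comes back
    np.asarray(elem).shape      # (2,) -- re-wrapping gives a 1-D int array,
                                # not the 0-D object array we started from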

Maybe it is better to talk about a potentially new `np.pyobject[type]`
datatype (i.e. an object datatype with all elements having the same
python type).
Currently writing generic code with the object dtype is tricky, because
we randomly return the object instead of arrays.
What would be the preference for such a specific dtype?

   * arr1d[0] -> scalar or array?
   * np.add(scalar, scalar) -> scalar or array
   * np.add.reduce(arr) -> scalar or array?

I think the `np.add` case we can decide fairly independently. The main
thing is the indexing. Would we want to force a `.item()` call or not?
Forcing `.item()` is in many ways simpler; I am unsure how often it
would be inconvenient.

And, maybe the answer is just that for datatypes that do not round-trip
easily, `.item()` is probably preferable, and for datatypes that do
round-trip scalars are fine.

- Sebastian




Re: New DTypes: Are scalars a central concept in NumPy or not?

Allan Haldane
In reply to this post by Sebastian Berg
I have some thoughts on scalars from playing with ndarray ducktypes
(__array_function__), eg a MaskedArray ndarray-ducktype, for which I
wanted an associated "MaskedScalar" type.

In summary, the way scalars currently work makes ducktyping
(duck-scalars) difficult:

  * numpy scalar types are not subclassable, so my duck-scalars aren't
    subclasses of numpy scalars and aren't in the type hierarchy
  * even if scalars were subclassable, I would have to subclass each
    scalar datatype individually to make masked versions
  * lots of code checks `isinstance(var, np.float64)`, which breaks
    for my duck-scalars
  * it was difficult to distinguish between a duck-scalar and a duck-0d
    array. The method I used in the end seems hacky.

This has led to some daydreams about how scalars should work, and also
led me at last to read through your NEPs 40/41 with specific focus on
what you said about scalars; I was about to post there until I saw this
discussion. I agree with what you said in the NEPs about not making
scalars be dtype instances.

Here is what ducktypes led me to:

If we are able to do something like define a `np.numpy_scalar` type
covering all numpy scalars, which has a `.dtype` attribute like you
describe in the NEPs, then that would seem to solve the ducktype
problems above. Ducktype implementors would need to make a "duck-scalar"
type in parallel to their "duck-ndarray" type, but I found that to be
pretty easy using an abstract class in my MaskedArray ducktype, since
the MaskedArray and MaskedScalar share a lot of behavior.
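
A minimal sketch of that pattern (hypothetical names, not the actual
MaskedArray ducktype code): the duck-scalar carries its dtype as an
attribute rather than inheriting from a concrete numpy scalar type, and
has neither indexing nor a shape.

    import numpy as np

    class MaskedScalar:
        def __init__(self, value, mask=False, dtype=None):
            # dtype as an attribute, not via inheritance
            self.dtype = np.dtype(dtype if dtype is not None else type(value))
            self._value = value
            self._mask = bool(mask)

        def __repr__(self):
            return "masked" if self._mask else repr(self._value)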

A numpy_scalar type would also help solve some object-array problems if
the object scalars are wrapped in the np_scalar type. A long time ago I
started to try to fix up various funny/strange behaviors of object
datatypes, but there are lots of special cases, and the main problem was
that the returned objects (eg from indexing) were not numpy types and
did not support numpy attributes or indexing. Wrapping the returned
object in `np.numpy_scalar` might add an extra slight annoyance to
people who want to unwrap the object, but I think it would make object
arrays less buggy and make code using object arrays easier to reason
about and debug.

Finally, a few random votes/comments based on the other emails on the list:

I think scalars have a place in numpy (rather than just reusing 0d
arrays), since there is a clear use in having hashable, immutable
scalars. Structured scalars should probably be immutable.

I agree with your suggestion that scalars should not be indexable. Thus,
my duck-scalars (and the proposed numpy_scalar) would not be indexable.
However, I think they should encode their datatype through a .dtype
attribute like ndarrays, rather than by inheritance.

Also, something to think about is that currently numpy scalars satisfy
the property `isinstance(np.float64(1), float)`, i.e they are within the
python numerical type hierarchy. 0d arrays do not have this property. My
proposal above would break this. I'm not sure what to think about
whether this is a good property to maintain or not.
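
That property is easy to check today:

    isinstance(np.float64(1), float)   # True: np.float64 subclasses float
    isinstance(np.array(1.0), float)   # False: 0-D arrays sit outside
                                       # Python's numeric hierarchy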

Cheers,
Allan




Re: New DTypes: Are scalars a central concept in NumPy or not?

Stefano Miccoli
In reply to this post by Sebastian Berg
The fact that `isinstance(np.float64(1), float)` holds raises the problem that the current
implementation of np.float64 scalars breaks the Liskov substitution principle:
`sequence_or_array[round(x)]` works if `x` is a float, but breaks down if `x` is
an np.float64.
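
A minimal sketch of the breakage (this reflects the behavior discussed in
that issue; later NumPy versions may differ):

    import numpy as np

    seq = ["a", "b", "c"]
    x = 1.6
    seq[round(x)]    # fine: round() on a Python float returns an int (2)

    y = np.float64(1.6)
    # round() on an np.float64 hands back another np.float64 rather than
    # an int (the behavior discussed in gh-11810), and list indices must
    # be integers, so this raises a TypeError:
    seq[round(y)]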

See https://github.com/numpy/numpy/issues/11810, where the issue is discussed in the
broader setting of the semantics of `np.round` vs. python3 `round`.

I do not have a strong opinion here, except that if np.float64s are within the Python
number hierarchy they should be PEP 3141 compliant (which currently they are not).

Stefano

> On 25 Feb 2020, at 00:03, [hidden email] wrote:
>
> Also, something to think about is that currently numpy scalars satisfy
> the property `isinstance(np.float64(1), float)`, i.e. they are within the
> Python numerical type hierarchy. 0d arrays do not have this property. My
> proposal above would break this. I'm not sure what to think about
> whether this is a good property to maintain or not.
>
> Cheers,
> Allan


Re: New DTypes: Are scalars a central concept in NumPy or not?

Chris Barker - NOAA Federal
In reply to this post by Allan Haldane
I've always found the duality of zero-d arrays and scalars confusing, and I'm sure I'm not alone.

Having both is just plain weird.

But, backward compatibility aside, could we have ONLY Scalars?

When we index into an array, the dimensionality is reduced by one, so indexing into a 1D array has to get us something: but the zero-d array is a really weird object -- do we really need it?

There is certainly a need for more numpy-like scalars: more than the built-in data types, and some handy attributes and methods, like `.dtype`, `.itemsize`, etc. But could we make an enhanced scalar that had everything we actually need from a zero-d array?

The key point would be mutability -- but do we really need mutable scalars? I can't think of any time I've needed that, when I couldn't have used a 1-d array of length 1. 
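
For instance, a length-1 array already covers the mutation case (plain
NumPy, just illustrating the point):

    import numpy as np

    acc = np.zeros(1)    # a length-1 array standing in for a "mutable scalar"
    acc += 3.5           # in-place update works as usual
    value = acc[0]       # extract the (immutable) scalar when needed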

Is there a use case for zero-d arrays that could not be met with an enhanced scalar?

-CHB







On Mon, Feb 24, 2020 at 12:30 PM Allan Haldane <[hidden email]> wrote:
I have some thoughts on scalars from playing with ndarray ducktypes
(__array_function__), eg a MaskedArray ndarray-ducktype, for which I
wanted an associated "MaskedScalar" type.

In summary, the ways scalars currently work make ducktyping
(duck-scalars) difficult:

  * numpy scalar types are not subclassable, so my duck-scalars aren't
    subclasses of numpy scalars and aren't in the type hierarchy
  * even if scalars were subclassable, I would have to subclass each
    scalar datatype individually to make masked versions
  * lots of code checks `isinstance(var, np.float64)`, which breaks
    for my duck-scalars
  * it was difficult to distinguish between a duck-scalar and a duck-0d
    array. The method I used in the end seems hacky.

This has led to some daydreams about how scalars should work, and also
led me at last to read through your NEPs 40/41 with specific focus on
what you said about scalars; I was about to post there until I saw this
discussion. I agree with what you said in the NEPs about not making
scalars be dtype instances.

Here is what ducktypes led me to:

If we are able to do something like define a `np.numpy_scalar` type
covering all numpy scalars, which has a `.dtype` attribute like you
describe in the NEPs, then that would seem to solve the ducktype
problems above. Ducktype implementors would need to make a "duck-scalar"
type in parallel to their "duck-ndarray" type, but I found that to be
pretty easy using an abstract class in my MaskedArray ducktype, since
the MaskedArray and MaskedScalar share a lot of behavior.
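
Something like the following (a purely hypothetical sketch --
`numpy_scalar` does not exist in NumPy, and every name and method here
is illustrative of the proposal only):

    import numpy as np

    class numpy_scalar:
        """One scalar type covering all dtypes, carrying the dtype as data."""

        def __init__(self, value, dtype):
            self._value = value
            self._dtype = np.dtype(dtype)

        @property
        def dtype(self):
            # the datatype is an attribute, not encoded via inheritance
            return self._dtype

        # hashable and (by convention) immutable, unlike a 0-d array
        def __hash__(self):
            return hash((self._value, self._dtype))

        def __eq__(self, other):
            return (isinstance(other, numpy_scalar)
                    and self._dtype == other._dtype
                    and self._value == other._value)

With something along these lines, `numpy_scalar(1.0, "float64")` could
serve as a dict key, and a duck-array library would provide one parallel
duck-scalar class instead of subclassing each concrete scalar type.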

A numpy_scalar type would also help solve some object-array problems if
the object scalars are wrapped in the np_scalar type. A long time ago I
started to try to fix up various funny/strange behaviors of object
datatypes, but there are lots of special cases, and the main problem was
that the returned objects (eg from indexing) were not numpy types and
did not support numpy attributes or indexing. Wrapping the returned
object in `np.numpy_scalar` might add an extra slight annoyance to
people who want to unwrap the object, but I think it would make object
arrays less buggy and make code using object arrays easier to reason
about and debug.
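
For example, today:

    import numpy as np

    obj = np.empty(2, dtype=object)
    obj[0], obj[1] = {"a": 1}, [1, 2, 3]
    x = obj[0]    # a plain dict: no .dtype, .shape, or other numpy
                  # attributes to lean on while debugging
    # Under the proposal, indexing would instead return something like
    # numpy_scalar({'a': 1}, dtype=object)   (hypothetical)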

Finally, a few random votes/comments based on the other emails on the list:

I think scalars have a place in numpy (rather than just reusing 0d
arrays), since there is a clear use in having hashable, immutable
scalars. Structured scalars should probably be immutable.

I agree with your suggestion that scalars should not be indexable. Thus,
my duck-scalars (and proposed numpy_scalar) would not be indexable.
However, I think they should encode their datatype through a `.dtype`
attribute like ndarrays, rather than by inheritance.

Also, something to think about is that currently numpy scalars satisfy
the property `isinstance(np.float64(1), float)`, i.e. they are within the
Python numerical type hierarchy. 0d arrays do not have this property. My
proposal above would break this. I'm not sure what to think about
whether this is a good property to maintain or not.

Cheers,
Allan



On 2/21/20 8:37 PM, Sebastian Berg wrote:
> [...]


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]


Re: New DTypes: Are scalars a central concept in NumPy or not?

Sebastian Berg
On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> I've always found the duality of zero-d arrays and scalars confusing,
> and I'm sure I'm not alone.
>
> Having both is just plain weird.

I guess so, it is a tricky situation, and I do not really have an
answer.

>
> But, backward compatibility aside, could we have ONLY Scalars?
>
> When we index into an array, the dimensionality is reduced by one, so
> indexing into a 1D array has to get us something: but the zero-d
> array is a really weird object -- do we really need it?
>

Well, it is hard to write functions that work on N dimensions (where N
can be 0) if the 0-D array does not exist. You can get away with
scalars most of the time, because they pretend to be arrays in almost
every way (aside from mutability).
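
For example (current behavior):

    import numpy as np

    s = np.float64(3.0)
    print(s.shape, s.ndim, s.dtype)   # () 0 float64 -- array-like attributes
    print(s + np.arange(3.0))         # [3. 4. 5.] -- broadcasts like 0-d
    # but unlike a 0-d array it is immutable: `s += 1` rebinds the name
    # rather than writing into shared memory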
But I am pretty sure we have a bunch of cases that need
`res = np.asarray(res)` simply because `res` is N-D in principle but
may have been silently converted to a scalar. E.g. see
https://github.com/numpy/numpy/issues/13105 for an issue about this
(although it does not actually list any specific problems).
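
A small illustration of that silent conversion:

    import numpy as np

    arr = np.arange(4.0)
    res = arr.sum()         # full reduction returns an np.float64 scalar,
                            # not a 0-d array
    print(type(res))        # <class 'numpy.float64'>
    res = np.asarray(res)   # re-wrap so N-dimensional code paths keep working
    print(res.ndim)         # 0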

- Sebastian



Re: New DTypes: Are scalars a central concept in NumPy or not?

Chris Barker - NOAA Federal
Sorry to have fallen off the numpy grid for a bit, but:

On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <[hidden email]> wrote:
> On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > But, backward compatibility aside, could we have ONLY Scalars?
>
> Well, it is hard to write functions that work on N dimensions (where
> N can be 0) if the 0-D array does not exist. You can get away with
> scalars most of the time, because they pretend to be arrays in almost
> every way (aside from mutability).
>
> But I am pretty sure we have a bunch of cases that need
> `res = np.asarray(res)` simply because `res` is N-D in principle but
> may have been silently converted to a scalar. E.g. see
> https://github.com/numpy/numpy/issues/13105 for an issue about this
> (although it does not actually list any specific problems).

I'm not sure this is insolvable (again, backwards compatibility aside). After all, one of the key issues is that the rank of `array(a_scalar)` is undetermined: 0-d is the only unambiguous answer, but then it's not really an array in the usual sense anyway. So in theory, we could disallow that conversion unless a rank is specified.
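
For reference, what NumPy picks today:

    import numpy as np

    a = np.array(5)           # NumPy resolves the ambiguity by choosing 0-d
    print(a.shape, a.ndim)    # () 0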

At the end of the day, there has to be some endpoint to how far you can reduce the rank of an array and have it work -- why not make 1 the lower limit?

-CHB





 

Re: New DTypes: Are scalars a central concept in NumPy or not?

Sebastian Berg
On Wed, 2020-04-08 at 12:37 -0700, Chris Barker wrote:

> Sorry to have fallen off the numpy grid for a bit, but:
>
> On Mon, Mar 23, 2020 at 1:37 PM Sebastian Berg <[hidden email]> wrote:
> > On Mon, 2020-03-23 at 11:45 -0700, Chris Barker wrote:
> > > But, backward compatibility aside, could we have ONLY Scalars?
> >
> > Well, it is hard to write functions that work on N dimensions
> > (where N can be 0) if the 0-D array does not exist. [...]
>
> I'm not sure this is insolvable (again, backwards compatibility
> aside). After all, one of the key issues is that the rank of
> `array(a_scalar)` is undetermined: 0-d is the only unambiguous
> answer, but then it's not really an array in the usual sense anyway.
> So in theory, we could disallow that conversion unless a rank is
> specified.
So as a (silly) example, the following does not generalize to 0d, even
though it should:

import numpy as np

def weird_normalize_by_trace_inplace(stacked_matrices):
    """Divides matrices by their trace but retains sign
    (works in-place, and thus e.g. not for integer arrays)

    Parameters
    ----------
    stacked_matrices : (..., N, N) ndarray
    """
    assert stacked_matrices.shape[-1] == stacked_matrices.shape[-2]

    trace = np.trace(stacked_matrices, axis1=-2, axis2=-1)
    trace[trace < 0] *= -1                      # breaks when trace is 0-d
    stacked_matrices /= trace[..., None, None]  # broadcast over the stack

Sure, that function does not make sense and you could rewrite it, but
the point is that inside it you want to conditionally modify `trace`
in place, and when `trace` comes back 0-d that "conditional"
modification breaks down.
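
Concretely, the failing step looks like this (a minimal sketch;
np.trace on a single matrix hands back a scalar):

    import numpy as np

    m = -2.0 * np.eye(2)
    trace = np.trace(m)       # full reduction -> np.float64 scalar, not 0-d array
    trace[trace < 0] *= -1    # TypeError: the scalar is not subscriptable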

- Sebastian



Re: New DTypes: Are scalars a central concept in NumPy or not?

Chris Barker - NOAA Federal
On Wed, Apr 8, 2020 at 1:17 PM Sebastian Berg <[hidden email]> wrote:
> > > But, backward compatibility aside, could we have ONLY Scalars?
> >
> > Well, it is hard to write functions that work on N dimensions
> > (where N can be 0) if the 0-D array does not exist.
>
> So as a (silly) example, the following does not generalize to 0d,
> even though it should:
>
> [...]
>
> Sure, that function does not make sense and you could rewrite it, but
> the point is that inside it you want to conditionally modify `trace`
> in place, and when `trace` comes back 0-d that "conditional"
> modification breaks down.

I guess that's what I'm getting at -- there is always an endpoint to reducing the rank. A function that's designed to work on a "stack" of something doesn't have to work on a single something when it can, instead, work on a stack of height one.

Isn't the trace of a matrix always a scalar? And thus the traces of a stack of matrices would always be 1-D?

So that function should do something like:

    stacked_matrices.shape = (-1, N, N)

yes? And then it would always work.

Again, backwards compatibility aside, there is a reason the `np.atleast_*()` functions exist -- you often need to make sure your inputs have the expected dimensionality.
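
A sketch of that idea (one reading of the suggestion; copy-based rather
than in-place, and not the function from the earlier message):

    import numpy as np

    def normalize_by_trace(matrices):
        """Force a stack of height >= 1 before reducing."""
        matrices = np.asarray(matrices, dtype=float)
        shape = matrices.shape                         # (..., N, N)
        stacked = matrices.reshape(-1, *shape[-2:])    # stack of height >= 1
        trace = np.trace(stacked, axis1=-2, axis2=-1)  # always 1-d now
        trace[trace < 0] *= -1                         # 0-d case never arises
        return (stacked / trace[:, None, None]).reshape(shape)

Flattening the leading axes into a single stack axis guarantees `trace`
is 1-d, so the conditional in-place step never sees a scalar.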

-CHB
