NEP: array API standard adoption (NEP 47)


ralfgommers
Hi all,

Here is a NEP, written together with Stephan Hoyer and Aaron Meurer, for discussion on adoption of the array API standard (https://data-apis.github.io/array-api/latest/). This will add a new numpy.array_api submodule containing that standardized API. The main purpose of this API is to be able to write code that is portable to other array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and MXNet.

We expect this NEP to remain in draft state for quite a while, while we gain experience with using it in downstream libraries, discuss adding it to other array libraries, and finish some of the loose ends (e.g., specifications for linear algebra functions that aren't merged yet, see https://github.com/data-apis/array-api/pulls) in the API standard itself.

See https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html for an initial discussion about this topic.

Please keep high-level discussion here and detailed comments on https://github.com/numpy/numpy/pull/18456. Also, you can access a rendered version of the NEP from that PR (see PR description for how), which may be helpful.

Cheers,
Ralf


Abstract
--------

We propose to adopt the `Python array API standard`_, developed by the
`Consortium for Python Data API Standards`_. Implementing this as a separate
new namespace in NumPy will allow authors of libraries which depend on NumPy
as well as end users to write code that is portable between NumPy and all
other array/tensor libraries that adopt this standard.

.. note::

    We expect that this NEP will remain in a draft state for quite a while.
    Given the large scope we don't expect to propose it for acceptance any
    time soon; instead, we want to solicit feedback on both the high-level
    design and implementation, and learn what needs describing better in this
    NEP or changing in either the implementation or the array API standard
    itself.


Motivation and Scope
--------------------

Python users have a wealth of choice for libraries and frameworks for
numerical computing, data science, machine learning, and deep learning. New
frameworks pushing forward the state of the art in these fields are appearing
every year. One unintended consequence of all this activity and creativity
has been fragmentation in multidimensional array (a.k.a. tensor) libraries,
which provide the fundamental data structure for these fields. Choices include
NumPy, TensorFlow, PyTorch, Dask, JAX, CuPy, MXNet, and others.

The APIs of each of these libraries are largely similar, but with enough
differences that it’s quite difficult to write code that works with multiple
(or all) of these libraries. The array API standard aims to address that
issue, by specifying an API for the most common ways arrays are constructed
and used. The proposed API is quite similar to NumPy's API, and deviates mainly
in places where (a) NumPy made design choices that are inherently not portable
to other implementations, and (b) other libraries consistently deviated
from NumPy on purpose because NumPy's design turned out to have issues or
unnecessary complexity.

For a longer discussion on the purpose of the array API standard we refer to
the `Purpose and Scope section of the array API standard <https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`__
and the two blog posts announcing the formation of the Consortium [1]_ and
the release of the first draft version of the standard for community review [2]_.

The scope of this NEP includes:

- Adopting the 2021 version of the array API standard
- Adding a separate namespace, tentatively named ``numpy.array_api``
- Changes needed/desired outside of the new namespace, for example new dunder
  methods on the ``ndarray`` object
- Implementation choices, and differences between functions in the new
  namespace and those in the main ``numpy`` namespace
- A new array object conforming to the array API standard
- Maintenance effort and testing strategy
- Impact on NumPy's total exposed API surface and on other future and
  under-discussion design choices
- Relation to existing and proposed NumPy array protocols
  (``__array_ufunc__``, ``__array_function__``, ``__array_module__``)
- Required improvements to existing NumPy functionality

Out of scope for this NEP are:

- Changes in the array API standard itself. Those are likely to come up
  during review of this NEP, but should be upstreamed as needed and this NEP
  subsequently updated.


Usage and Impact
----------------

*This section will be fleshed out later. For now we refer to the use cases given
in* `the array API standard Use Cases section <https://data-apis.github.io/array-api/latest/use_cases.html>`__.

In addition to those use cases, the new namespace contains functionality that
is widely used and supported by many array libraries. As such, it is a good
set of functions to teach to newcomers to NumPy and recommend as "best
practice". That contrasts with NumPy's main namespace, which contains many
functions and objects that have been superseded or that we consider mistakes,
but that we can't remove for backwards compatibility reasons.

The usage of the ``numpy.array_api`` namespace by downstream libraries is
intended to enable them to consume multiple kinds of arrays, *without having
to have a hard dependency on all of those array libraries*:

.. image:: _static/nep-0047-library-dependencies.png

Adoption in downstream libraries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The prototype implementation of the ``array_api`` namespace will be used with
SciPy, scikit-learn and other libraries of interest that depend on NumPy, in
order to get more experience with the design and find out if any important
parts are missing.

The pattern to support multiple array libraries is intended to be something
like::

    def somefunc(x, y):
        # Retrieves standard namespace. Raises if x and y have different
        # namespaces.  See Appendix for possible get_namespace implementation
        xp = get_namespace(x, y)
        out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
        return out

The ``get_namespace`` call is effectively the library author opting in to
using the standard API namespace, and thereby explicitly supporting
all conforming array libraries.
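
A minimal sketch of what such a ``get_namespace`` helper could look like,
written independently of whatever the Appendix specifies. It assumes only that
conforming arrays expose an ``__array_namespace__()`` method, as the standard
requires. The ``FakeArray`` class and the ``fake_xp`` module are hypothetical
stand-ins, defined here only so the example is self-contained.

```python
import types


def get_namespace(*xs):
    """Return the one array API namespace shared by all arrays in ``xs``."""
    if not xs:
        raise ValueError("at least one array is required")
    namespaces = []
    for x in xs:
        if not hasattr(x, "__array_namespace__"):
            raise TypeError(f"{type(x).__name__!r} does not support the array API")
        namespaces.append(x.__array_namespace__())
    first = namespaces[0]
    # Compare by identity so that namespaces need not be hashable.
    if any(ns is not first for ns in namespaces[1:]):
        raise ValueError("inputs come from different array libraries")
    return first


# Hypothetical conforming library, defined only to make the sketch runnable.
fake_xp = types.ModuleType("fake_xp")


class FakeArray:
    def __array_namespace__(self, /, *, api_version=None):
        return fake_xp


xp = get_namespace(FakeArray(), FakeArray())
print(xp.__name__)
```

Raising on mixed inputs, rather than picking one namespace, keeps the
library author's opt-in explicit: a function never silently converts between
array libraries.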


The ``asarray`` / ``asanyarray`` pattern
````````````````````````````````````````

Many existing libraries use the same ``asarray`` (or ``asanyarray``) pattern
as NumPy itself does: accepting any object that can be coerced into a ``np.ndarray``.
We consider this design pattern problematic - keeping in mind the Zen of
Python, *"explicit is better than implicit"*, as well as the pattern being
historically problematic in the SciPy ecosystem for ``ndarray`` subclasses
and with over-eager object creation. All other array/tensor libraries are
more strict, and that works out fine in practice. We would advise authors of
new libraries to avoid the ``asarray`` pattern. Instead they should either
accept just NumPy arrays or, if they want to support multiple kinds of
arrays, check if the incoming array object supports the array API standard
by checking for ``__array_namespace__`` as shown in the example above.

Existing libraries can do such a check as well, and only call ``asarray`` if
the check fails. This is very similar to the ``__duckarray__`` idea in
:ref:`NEP30`.
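
The "check first, coerce only as a fallback" approach for existing libraries
could be sketched as follows. The helper name ``as_supported_array`` and the
``FakeArray`` class are hypothetical; only ``np.asarray`` and the
``__array_namespace__`` attribute name come from NumPy and the standard.

```python
import numpy as np


def as_supported_array(x):
    """Pass array API objects through untouched; coerce everything else."""
    if hasattr(x, "__array_namespace__"):
        return x
    # Fallback for legacy inputs such as lists: classic NumPy coercion.
    return np.asarray(x)


class FakeArray:
    """Hypothetical array type that opts in to the standard."""

    def __array_namespace__(self, /, *, api_version=None):
        return None  # a real library would return its namespace here


arr = FakeArray()
same = as_supported_array(arr)           # returned as-is, no conversion
coerced = as_supported_array([1, 2, 3])  # list falls back to np.asarray
print(same is arr, type(coerced).__name__)
```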


.. _adoption-application-code:

Adoption in application code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The new namespace can be seen by end users as a cleaned up and slimmed down
version of NumPy's main namespace. Encouraging end users to use this
namespace like::

    import numpy.array_api as xp

    x = xp.linspace(0, 2*xp.pi, num=100)
    y = xp.cos(x)

seems perfectly reasonable, and potentially beneficial - users get offered only
one function for each purpose (the one we consider best-practice), and they
then write code that is more easily portable to other libraries.


Backward compatibility
----------------------

No deprecations or removals of existing NumPy APIs or other backwards
incompatible changes are proposed.


High-level design
-----------------

The array API standard consists of approximately 120 objects, all of which
have a direct NumPy equivalent. This figure shows what is included at a high level:

.. image:: _static/nep-0047-scope-of-array-API.png

The most important changes compared to what NumPy currently offers are:

- A new array object which:

    - conforms to the casting rules and indexing behaviour specified by the
      standard,
    - does not have methods other than dunder methods,
    - does not support the full range of NumPy indexing behaviour. Advanced
      indexing with integers is not supported. Only boolean indexing
      with a single (possibly multi-dimensional) boolean array is supported.
      An indexing expression that selects a single element returns a 0-D array
      rather than a scalar.

- Functions in the ``array_api`` namespace:

    - do not accept ``array_like`` inputs, only NumPy arrays and Python scalars,
    - do not support ``__array_ufunc__`` and ``__array_function__``,
    - use positional-only and keyword-only parameters in their signatures,
    - have inline type annotations,
    - may have minor changes to signatures and semantics of individual
      functions compared to their equivalents already present in NumPy,
    - only support dtype literals, not format strings or other ways of
      specifying dtypes

- DLPack_ support will be added to NumPy,
- New syntax for "device support" will be added, through a ``.device``
  attribute on the new array object, and ``device=`` keywords in array creation
  functions in the ``array_api`` namespace,
- Casting rules that differ from those NumPy currently has. Output dtypes can
  be derived from input dtypes (i.e. no value-based casting), and 0-D arrays
  are treated like >=1-D arrays.
- Not all dtypes NumPy has are part of the standard. Only boolean, signed and
  unsigned integers, and floating-point dtypes up to ``float64`` are supported.
  Complex dtypes are expected to be added in the next version of the standard.
  Extended precision, string, void, object and datetime dtypes, as well as
  structured dtypes, are not included.

Improvements to existing NumPy functionality that are needed include:

- Add support for stacks of matrices to some functions in ``numpy.linalg``
  that are currently missing such support.
- Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
- Add a "never copy" mode to ``np.asarray``.


Functions in the ``array_api`` namespace
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's start with an example of a function implementation that shows the most
important differences with the equivalent function in the main namespace::

    def max(x: array, /, *,
            axis: Optional[Union[int, Tuple[int, ...]]] = None,
            keepdims: bool = False
        ) -> array:
        """
        Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
        """
        return np.max._implementation(x, axis=axis, keepdims=keepdims)

This function does not accept ``array_like`` inputs, only ``ndarray``. There
are multiple reasons for this. Other array libraries all work like this.
Letting the user do coercion of lists, generators, or other foreign objects
separately results in a cleaner design with less unexpected behaviour.
It's higher-performance - less overhead from ``asarray`` calls. Static typing
is easier. Subclasses will work as expected. And the slight increase in verbosity
because users have to explicitly coerce to ``ndarray`` on rare occasions
seems like a small price to pay.

This function supports neither ``__array_ufunc__`` nor ``__array_function__``.
These protocols serve a similar purpose to the array API standard module itself,
but through different mechanisms. Because only ``ndarray`` instances are accepted,
dispatching via one of these protocols isn't useful anymore.

This function uses positional-only parameters in its signature. This makes code
more portable - writing ``max(x=x, ...)`` is no longer valid, hence if other
libraries call the first parameter ``input`` rather than ``x``, that is fine.
The rationale for keyword-only parameters (``axis`` and ``keepdims`` in the
above example) is two-fold: clarity of end user code, and it being easier to
extend the signature in the future with keywords in the desired order.
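
The effect of the ``/`` and ``*`` markers can be illustrated with a toy
function. ``max_value`` is a hypothetical stand-in that operates on plain
Python lists; it is not part of the proposed namespace.

```python
def max_value(x, /, *, axis=None, keepdims=False):
    """Toy stand-in for an array API style reduction over a flat list."""
    if axis not in (None, 0):
        raise ValueError("this toy example only handles 1-D input")
    result = max(x)
    return [result] if keepdims else result


data = [3, 1, 4]
print(max_value(data))                 # the array is passed positionally
print(max_value(data, keepdims=True))  # options are passed by keyword

# Naming the array parameter, or passing an option positionally, is rejected.
for bad_call in (lambda: max_value(x=data), lambda: max_value(data, 0)):
    try:
        bad_call()
    except TypeError as exc:
        print("rejected:", exc)
```

Because callers can never write ``x=...``, a library is free to name its first
parameter differently without breaking portable code.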

This function has inline type annotations. Inline annotations are far easier to
maintain than separate stub files. And because the types are simple, this will
not result in a large amount of clutter with type aliases or unions like in the
current stub files NumPy has.


DLPack support for zero-copy data interchange
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ability to convert one kind of array into another kind is valuable, and
indeed necessary when downstream libraries want to support multiple kinds of
arrays. This requires a well-specified data exchange protocol. NumPy already
supports two of these, namely the buffer protocol (i.e., PEP 3118), and
the ``__array_interface__`` (Python side) / ``__array_struct__`` (C side)
protocol. Both work similarly, letting the "producer" describe how the data
is laid out in memory so the "consumer" can construct its own kind of array
with a view on that data.

DLPack works in a very similar way. The main reasons to prefer DLPack over
the options already present in NumPy are:

1. DLPack is the only protocol with device support (e.g., GPUs using CUDA or
   ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other array
   libraries are not. Having one protocol per device isn't tenable, hence
   device support is a must.
2. Widespread support. DLPack has the widest adoption of all protocols; only
   NumPy is missing support. And the experiences of other libraries with it
   are positive. This contrasts with the protocols NumPy does support, which
   are used very little - when other libraries want to interoperate with
   NumPy, they typically use the (more limited, and NumPy-specific)
   ``__array__`` protocol.

Adding support for DLPack to NumPy entails:

- Adding an ``ndarray.__dlpack__`` method
- Adding a ``from_dlpack`` function, which takes as input an object
  supporting ``__dlpack__``, and returns an ``ndarray``.

DLPack is currently a ~200 LoC header, and is meant to be included directly, so
no external dependency is needed. Implementation should be straightforward.
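
From the user's side, the exchange could look like the sketch below. It
assumes a NumPy version in which the proposed ``from_dlpack`` function is
exposed as ``np.from_dlpack`` and ``ndarray`` implements ``__dlpack__``;
neither was available in the prototype at the time of writing. A NumPy array
acts as both producer and consumer here, purely for illustration.

```python
import numpy as np

producer = np.arange(5, dtype=np.float64)

# The consumer calls ``producer.__dlpack__()`` under the hood and wraps the
# described memory without copying it.
consumer = np.from_dlpack(producer)

shares = np.shares_memory(producer, consumer)
producer[0] = 42.0  # visible through ``consumer`` because memory is shared
print(shares, consumer[0])
```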


Syntax for device support
~~~~~~~~~~~~~~~~~~~~~~~~~

NumPy itself is CPU-only, so it clearly doesn't have a need for device support.
However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet) support
multiple types of devices: CPU, GPU, TPU, and more exotic hardware.
To write portable code on systems with multiple devices, it's often necessary
to create new arrays on the same device as some other array, or check that
two arrays live on the same device. Hence syntax for that is needed.

The array object will have a ``.device`` attribute which enables comparing
devices of different arrays (they should only compare equal if both arrays are
from the same library and it's the same hardware device). Furthermore,
``device=`` keywords in array creation functions are needed. For example::

    def empty(shape: Union[int, Tuple[int, ...]], /, *,
              dtype: Optional[dtype] = None,
              device: Optional[device] = None) -> array:
        """
        Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
        """
        return np.empty(shape, dtype=dtype, device=device)

The implementation for NumPy may be as simple as setting the device attribute to
the string ``'cpu'`` and raising an exception if array creation functions
encounter any other value.
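
A minimal sketch of that "accept only ``'cpu'``" behaviour, assuming the
device is represented by a plain string. The choice of ``ValueError`` is an
assumption; the NEP text only says that an exception is raised.

```python
import numpy as np


def empty(shape, /, *, dtype=None, device=None):
    """CPU-only creation function that validates the ``device`` keyword."""
    if device not in (None, "cpu"):
        raise ValueError(f"unsupported device: {device!r}")
    # NumPy arrays always live in host memory, so nothing else is needed.
    return np.empty(shape, dtype=dtype)


a = empty((2, 3), dtype=np.float32, device="cpu")
print(a.shape, a.dtype)

try:
    empty((2, 3), device="gpu")
except ValueError as exc:
    print("rejected:", exc)
```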


Dtypes and casting rules
~~~~~~~~~~~~~~~~~~~~~~~~

The supported dtypes in this namespace are boolean, 8/16/32/64-bit signed and
unsigned integer, and 32/64-bit floating-point dtypes. These will be added to
the namespace as dtype literals with the expected names (e.g., ``bool``,
``uint16``, ``float64``).

The most obvious omissions are the complex dtypes. The rationale for the lack
of complex support in the first version of the array API standard is that several
libraries (PyTorch, MXNet) are still in the process of adding support for
complex dtypes. The next version of the standard is expected to include ``complex64``
and ``complex128`` (see `this issue <https://github.com/data-apis/array-api/issues/102>`__
for more details).

Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is expected
to only use the dtype literals. Format strings, Python builtin dtypes, or
string representations of the dtype literals are not accepted - this will
improve readability and portability of code at little cost.

Casting rules are only defined between different dtypes of the same kind. The
rationale for this is that mixed-kind (e.g., integer to floating-point)
casting behavior differs between libraries. NumPy's mixed-kind casting
behavior doesn't need to be changed or restricted, it only needs to be
documented that if users use mixed-kind casting, their code may not be
portable.

.. image:: _static/nep-0047-casting-rules-lattice.png

*Type promotion diagram. Promotion between any two types is given by their
join on this lattice. Only the types of participating arrays matter, not
their values. Dashed lines indicate that behaviour for Python scalars is
undefined on overflow. Boolean, integer and floating-point dtypes are not
connected, indicating mixed-kind promotion is undefined.*

The most important difference between the casting rules in NumPy and in the
array API standard is how scalars and 0-dimensional arrays are handled. In
the standard, array scalars do not exist and 0-dimensional arrays follow the
same casting rules as higher-dimensional arrays.

See the `Type Promotion Rules section of the array API standard <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__
for more details.
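
The promotion rules summarized by the lattice can be sketched in pure Python.
This is an illustration under stated assumptions, not the proposed
implementation: dtypes are represented by their names as strings, ``promote``
is a hypothetical helper, and the linked section of the standard remains the
authoritative table. Only the dtypes of the operands are inspected, never
their values.

```python
# (kind, bit width) for every dtype name used in this sketch.
_DTYPES = {"bool": ("b", 1)}
for _bits in (8, 16, 32, 64):
    _DTYPES[f"int{_bits}"] = ("i", _bits)
    _DTYPES[f"uint{_bits}"] = ("u", _bits)
for _bits in (32, 64):
    _DTYPES[f"float{_bits}"] = ("f", _bits)

_PREFIX = {"i": "int", "u": "uint", "f": "float"}


def promote(a, b):
    """Return the name of the dtype that ``a`` and ``b`` promote to."""
    (kind_a, bits_a), (kind_b, bits_b) = _DTYPES[a], _DTYPES[b]
    if kind_a == kind_b:
        if kind_a == "b":
            return "bool"
        return f"{_PREFIX[kind_a]}{max(bits_a, bits_b)}"
    if {kind_a, kind_b} == {"i", "u"}:
        # A signed type needs twice the unsigned width to hold every value.
        signed_bits = bits_a if kind_a == "i" else bits_b
        unsigned_bits = bits_b if kind_a == "i" else bits_a
        needed = max(signed_bits, 2 * unsigned_bits)
        if needed <= 64:
            return f"int{needed}"
    # Mixed kinds, or uint64 combined with a signed integer: undefined.
    raise TypeError(f"promotion of {a} and {b} is not defined")


print(promote("int8", "int16"))
print(promote("uint8", "int8"))
print(promote("float32", "float64"))
```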

.. note::

    It is not clear what the best way is to support the different casting rules
    for 0-dimensional arrays and no value-based casting. One option may be to
    implement this second set of casting rules, keep them private, mark the
    array API functions with a private attribute that says they adhere to
    these different rules, and let the casting machinery check for
    that attribute.

    This needs discussion.


Indexing
~~~~~~~~

An indexing expression that would return a scalar with ``ndarray``, e.g.
``arr_2d[0, 0]``, will return a 0-D array with the new array object. There are
several reasons for that: array scalars are largely considered a design mistake
which no other array library copied; it works better for non-CPU libraries
(typically arrays can live on the device, scalars live on the host); and it's
simply a consistent design. To get a Python scalar out of a 0-D array, one can
simply use the builtin for the type, e.g. ``float(arr_0d)``.

The other `indexing modes in the standard <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__
do work largely the same as they do for ``numpy.ndarray``. One noteworthy
difference is that clipping in slice indexing (e.g., ``a[:n]`` where ``n`` is
larger than the size of the first axis) is unspecified behaviour, because
that kind of check can be expensive on accelerators.

The lack of advanced indexing, and boolean indexing being limited to a single
n-D boolean array, is due to those indexing modes not being suitable for all
types of arrays or JIT compilation. Their absence does not seem to be
problematic; if a user or library author wants to use them, they can do so
through zero-copy conversion to ``numpy.ndarray``. This will signal correctly
to whomever reads the code that it is then NumPy-specific rather than portable
to all conforming array types.



The array object
~~~~~~~~~~~~~~~~

The array object in the standard does not have methods other than dunder
methods. The rationale for that is that not all array libraries have methods
on their array object (e.g., TensorFlow does not). It also provides only a
single way of doing something, rather than having functions and methods that
are effectively duplicates.

Mixing operations that may produce views (e.g., indexing, ``nonzero``)
in combination with mutation (e.g., item or slice assignment) is
`explicitly documented in the standard to not be supported <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__.
This cannot easily be prohibited in the array object itself; instead this will
be guidance to the user via documentation.

The standard currently does not prescribe a name for the array object itself.
We propose to simply name it ``ndarray``. This is the most obvious name, and
because of the separate namespace should not clash with ``numpy.ndarray``.


Implementation
--------------

.. note::

    This section needs a lot more detail, which will gradually be added when
    the implementation progresses.

A prototype of the ``array_api`` namespace can be found in
https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
The docstring in its ``__init__.py`` has notes on completeness of the
implementation. The code for the wrapper functions also contains ``# Note:``
comments everywhere there is a difference with the NumPy API.
Two important parts that are not implemented yet are the new array object and
DLPack support. Functions may need changes to ensure the changed casting rules
are respected.

The array object
~~~~~~~~~~~~~~~~

Regarding the array object implementation, we plan to start with a regular
Python class that wraps a ``numpy.ndarray`` instance. Attributes and methods
can forward to that wrapped instance, applying input validation and
implementing changed behaviour as needed.
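
A rough sketch of such a wrapper class is given below. The class name
``Array``, the private ``_array`` attribute and the set of supported
operations are assumptions chosen for illustration; the prototype may look
quite different. The sketch shows three of the behaviours described earlier:
indexing returns 0-D arrays instead of scalars, integer-array indexing is
rejected, and operators do not coerce arbitrary ``array_like`` operands.

```python
import numpy as np


class Array:
    """Hypothetical wrapper exposing a restricted view of ``numpy.ndarray``."""

    def __init__(self, data):
        # ``np.asarray`` also turns NumPy scalars back into 0-D arrays.
        self._array = np.asarray(data)

    @property
    def dtype(self):
        return self._array.dtype

    @property
    def shape(self):
        return self._array.shape

    @property
    def device(self):
        return "cpu"

    def __getitem__(self, key):
        if isinstance(key, (list, np.ndarray)):
            raise IndexError("integer-array indexing is not supported")
        return Array(self._array[key])

    def __add__(self, other):
        if isinstance(other, Array):
            return Array(self._array + other._array)
        if isinstance(other, (bool, int, float)):
            return Array(self._array + other)
        return NotImplemented  # no implicit coercion of lists and the like

    def __float__(self):
        if self._array.ndim != 0:
            raise TypeError("only 0-D arrays can be converted to float")
        return float(self._array)


x = Array([[1.0, 2.0], [3.0, 4.0]])
elem = x[0, 1]                  # a 0-D Array, not a scalar
print(type(elem).__name__, elem.shape, float(elem))
```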

The casting rules are probably the most challenging part. The in-progress
dtype system refactor (NEPs 40-43) should make implementing the correct casting
behaviour easier - it is already moving away from value-based casting for
example.


The dtype objects
~~~~~~~~~~~~~~~~~

We must be able to compare dtypes for equality, and expressions like these must
be possible::

    np.array_api.some_func(..., dtype=x.dtype)

The above implies it would be nice to have ``np.array_api.float32 ==
np.array_api.ndarray(...).dtype``.

Users should not assume that dtypes have a class hierarchy; however, we are
free to implement them with a class hierarchy if that's convenient. We considered
the following options to implement dtype objects:

1. Alias dtypes to those in the main namespace. E.g., ``np.array_api.float32 =
   np.float32``.
2. Make the dtypes instances of ``np.dtype``. E.g., ``np.array_api.float32 =
   np.dtype(np.float32)``.
3. Create new singleton classes with only the required methods/attributes
   (currently just ``__eq__``).

It seems like (2) would be easiest from the perspective of interacting with
functions outside the main namespace. And (3) would adhere best to the
standard.
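
The difference between options (1) and (2) can be seen with existing NumPy
objects. This only illustrates current ``numpy`` behaviour; the variable names
``option_1`` and ``option_2`` are hypothetical labels, and nothing here is
part of the proposed namespace.

```python
import numpy as np

x = np.zeros(3, dtype=np.float32)

option_1 = np.float32            # option (1): alias of the scalar type
option_2 = np.dtype(np.float32)  # option (2): an ``np.dtype`` instance

# Both compare equal to the dtype of an existing array ...
print(x.dtype == option_1, x.dtype == option_2)

# ... but they are different kinds of objects: a class versus an instance.
print(isinstance(option_1, type), isinstance(option_2, np.dtype))
```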

TBD: the standard does not yet have a good way to inspect properties of a
dtype, to ask questions like "is this an integer dtype?". Perhaps this is easy
enough to do for users, like so::

    def _get_dtype(dt_or_arr):
        return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr

    def is_floating(dtype_or_array):
        dtype = _get_dtype(dtype_or_array)
        return dtype in (float32, float64)

    def is_integer(dtype_or_array):
        dtype = _get_dtype(dtype_or_array)
        return dtype in (uint8, uint16, uint32, uint64, int8, int16, int32, int64)

However it could make sense to add this to the standard. Note that NumPy itself
currently does not have a great way of asking such questions, see
`gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.



_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: NEP: array API standard adoption (NEP 47)

Sebastian Berg
On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:

> Hi all,
>
> Here is a NEP, written together with Stephan Hoyer and Aaron Meurer,
> for
> discussion on adoption of the array API standard (
> https://data-apis.github.io/array-api/latest/). This will add a new
> numpy.array_api submodule containing that standardized API. The main
> purpose of this API is to be able to write code that is portable to
> other
> array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and
> MXNet.
>
> We expect this NEP to remain in draft state for quite a while, while
> we're
> gaining experience with using it in downstream libraries, discuss
> adding it
> to other array libraries, and finishing some of the loose ends (e.g.,
> specifications for linear algebra functions that aren't merged yet,
> see
> https://github.com/data-apis/array-api/pulls) in the API standard
> itself.

There is too much to unpack in a day, I hope I did not miss something
particularly important while reading.
Do you have plans to try some of this outside of NumPy, or maybe make a
repo in the numpy org for it?


Some thoughts:

The DLPack integration: I honestly think we can split that out, maybe
even without a NEP, or at least with a very short one.  It seems like a
good addition to me.  And a simple one, especially if we don't need to
integrate it into `np.array(...)`.

---

It seems the current idea is to create a new NumPy array subclass. That
sounds good, but I am a bit worried how that is going to interact with
actual NumPy arrays.
Most SciPy users will still use NumPy proper.  So SciPy must not
return this "random" subclass, when the user did pass in a NumPy array
(or even by default).

At that point, what will the SciPy dev do to juggle the fact that you
would like to use the new API internally, but the interface must still
default to NumPy (and may depend on the input)?
This is probably not very tricky, but I am slightly worried about what
happens when things get mixed up.  Also if a user passes this "minimal
array" into a current NumPy API function, it will often not work if it
is a subclass.

---

Related to that: how important is it to keep that namespace a "minimal"
implementation, rather than a "conforming" one?  For example, would you
want to reject `numpy_api.array([1, 2, 3, 4], dtype="i,i")`? Or just
`dtype="complex128"`?
Maybe I got the wrong impression though.  Is the aim for a minimal
implementation, but you are OK as long as it is a conforming one?

---

The implementation mentions bypassing `__array_function__` in the
current implementation.  That requires a semi-formalization of how
`__array_function__` should be "bypassed".  I think that is useful, but
I have also been wondering about going the pytorch route of in-lining
the check to avoid the current overheads. That avoids a bit of jumping
between C and python and multiple function calls, but might mean that
`func._implementation` is a bit in the way.

Besides, `_implementation` usually does support array subclasses and may
still dispatch again internally at this time!

---

I am somewhat worried that getting the promotion (and other quirks) to
where you want could be very tricky, unless we are patient enough to
wait for NumPy proper to evolve.
Hopefully I am just too pessimistic and e.g. a mild form of code
duplication can solve all of that.  Probably time and trial-and-error
will be the judge on that...


Cheers,

Sebastian


>
> See
> https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> for an initial discussion about this topic.
>
> Please keep high-level discussion here and detailed comments on
> https://github.com/numpy/numpy/pull/18456. Also, you can access a
> rendered
> version of the NEP from that PR (see PR description for how), which
> may be
> helpful.
> Cheers,
> Ralf
>
>
> Abstract
> --------
>
> We propose to adopt the `Python array API standard`_, developed by
> the
> `Consortium for Python Data API Standards`_. Implementing this as a
> separate
> new namespace in NumPy will allow authors of libraries which depend
> on NumPy
> as well as end users to write code that is portable between NumPy and
> all
> other array/tensor libraries that adopt this standard.
>
> .. note::
>
>     We expect that this NEP will remain in a draft state for quite a
> while.
>     Given the large scope we don't expect to propose it for
> acceptance any
>     time soon; instead, we want to solicit feedback on both the high-
> level
>     design and implementation, and learn what needs describing better
> in
> this
>     NEP or changing in either the implementation or the array API
> standard
>     itself.
>
>
> Motivation and Scope
> --------------------
>
> Python users have a wealth of choice for libraries and frameworks for
> numerical computing, data science, machine learning, and deep
> learning. New
> frameworks pushing forward the state of the art in these fields are
> appearing
> every year. One unintended consequence of all this activity and
> creativity
> has been fragmentation in multidimensional array (a.k.a. tensor)
> libraries -
> which are the fundamental data structure for these fields. Choices
> include
> NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
>
> The APIs of each of these libraries are largely similar, but with
> enough
> differences that it’s quite difficult to write code that works with
> multiple
> (or all) of these libraries. The array API standard aims to address
> that
> issue, by specifying an API for the most common ways arrays are
> constructed
> and used. The proposed API is quite similar to NumPy's API, and
> deviates
> mainly
> in places where (a) NumPy made design choices that are inherently not
> portable
> to other implementations, and (b) where other libraries consistently
> deviated
> from NumPy on purpose because NumPy's design turned out to have
> issues or
> unnecessary complexity.
>
> For a longer discussion on the purpose of the array API standard we
> refer to
> the `Purpose and Scope section of the array API standard <
> https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`
> __
> and the two blog posts announcing the formation of the Consortium
> [1]_ and
> the release of the first draft version of the standard for community
> review
> [2]_.
>
> The scope of this NEP includes:
>
> - Adopting the 2021 version of the array API standard
> - Adding a separate namespace, tentatively named ``numpy.array_api``
> - Changes needed/desired outside of the new namespace, for example
> new
> dunder
>   methods on the ``ndarray`` object
> - Implementation choices, and differences between functions in the
> new
>   namespace with those in the main ``numpy`` namespace
> - A new array object conforming to the array API standard
> - Maintenance effort and testing strategy
> - Impact on NumPy's total exposed API surface and on other future and
>   under-discussion design choices
> - Relation to existing and proposed NumPy array protocols
>   (``__array_ufunc__``, ``__array_function__``,
> ``__array_module__``).
> - Required improvements to existing NumPy functionality
>
> Out of scope for this NEP are:
>
> - Changes in the array API standard itself. Those are likely to come
> up
>   during review of this NEP, but should be upstreamed as needed and
> this NEP
>   subsequently updated.
>
>
> Usage and Impact
> ----------------
>
> *This section will be fleshed out later; for now we refer to the use cases
> given in* `the array API standard Use Cases section
> <https://data-apis.github.io/array-api/latest/use_cases.html>`__
>
> In addition to those use cases, the new namespace contains functionality that
> is widely used and supported by many array libraries. As such, it is a good
> set of functions to teach to newcomers to NumPy and recommend as "best
> practice". That contrasts with NumPy's main namespace, which contains many
> functions and objects that have been superseded or that we consider mistakes,
> but that we can't remove for backwards compatibility reasons.
>
> The usage of the ``numpy.array_api`` namespace by downstream
> libraries is
> intended to enable them to consume multiple kinds of arrays, *without
> having
> to have a hard dependency on all of those array libraries*:
>
> .. image:: _static/nep-0047-library-dependencies.png
>
> Adoption in downstream libraries
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The prototype implementation of the ``array_api`` namespace will be
> used
> with
> SciPy, scikit-learn and other libraries of interest that depend on
> NumPy, in
> order to get more experience with the design and find out if any
> important
> parts are missing.
>
> The pattern to support multiple array libraries is intended to be something
> like::
>
>     def somefunc(x, y):
>         # Retrieves standard namespace. Raises if x and y have different
>         # namespaces.  See Appendix for possible get_namespace implementation
>         xp = get_namespace(x, y)
>         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
>         return out
>
> The ``get_namespace`` call is effectively the library author opting
> in to
> using the standard API namespace, and thereby explicitly supporting
> all conforming array libraries.
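
For readers who want something concrete, here is a minimal sketch of what a ``get_namespace`` helper could look like. It is not the implementation from the NEP's appendix. It only assumes that conforming arrays expose ``__array_namespace__()`` and that Python scalars are allowed alongside arrays. The ``ArrayA``/``ArrayB`` classes and the ``lib_a``/``lib_b`` modules are hypothetical stand-ins for two conforming libraries.

```python
import types

_PY_SCALARS = (bool, int, float, complex)


def get_namespace(*xs):
    """Return the one array API namespace shared by all array arguments."""
    namespaces = set()
    for x in xs:
        if isinstance(x, _PY_SCALARS):
            continue  # Python scalars carry no namespace of their own
        if not hasattr(x, "__array_namespace__"):
            raise TypeError(
                f"{type(x).__name__} does not support the array API standard"
            )
        namespaces.add(x.__array_namespace__())
    if not namespaces:
        raise ValueError("no array arguments given")
    if len(namespaces) > 1:
        raise ValueError("array arguments come from different namespaces")
    (xp,) = namespaces
    return xp


# Hypothetical stand-ins for two conforming libraries (illustration only).
lib_a = types.ModuleType("lib_a")
lib_b = types.ModuleType("lib_b")


class ArrayA:
    def __array_namespace__(self, api_version=None):
        return lib_a


class ArrayB:
    def __array_namespace__(self, api_version=None):
        return lib_b
```

With these stand-ins, ``get_namespace(ArrayA(), ArrayA(), 2.0)`` returns ``lib_a``, while mixing ``ArrayA`` and ``ArrayB`` raises ``ValueError``.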
>
>
> The ``asarray`` / ``asanyarray`` pattern
> ````````````````````````````````````````
>
> Many existing libraries use the same ``asarray`` (or ``asanyarray``)
> pattern
> as NumPy itself does; accepting any object that can be coerced into a
> ``np.ndarray``.
> We consider this design pattern problematic - keeping in mind the Zen
> of
> Python, *"explicit is better than implicit"*, as well as the pattern
> being
> historically problematic in the SciPy ecosystem for ``ndarray``
> subclasses
> and with over-eager object creation. All other array/tensor libraries
> are
> more strict, and that works out fine in practice. We would advise
> authors of
> new libraries to avoid the ``asarray`` pattern. Instead they should
> either
> accept just NumPy arrays or, if they want to support multiple kinds
> of
> arrays, check if the incoming array object supports the array API
> standard
> by checking for ``__array_namespace__`` as shown in the example
> above.
>
> Existing libraries can do such a check as well, and only call
> ``asarray`` if
> the check fails. This is very similar to the ``__duckarray__`` idea
> in
> :ref:`NEP30`.
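
The "check first, coerce only as a fallback" idea for existing libraries could be sketched as below. The helper name ``as_supported_array`` is made up for this illustration; the only assumption about conforming arrays is that they expose ``__array_namespace__``.

```python
import numpy as np


def as_supported_array(x):
    """Pass conforming arrays through untouched; coerce everything else."""
    if hasattr(x, "__array_namespace__"):
        # The object advertises the array API standard: do not convert it.
        return x
    # Legacy behaviour: lists, tuples, scalars, ... become numpy.ndarray.
    return np.asarray(x)
```

A library would call this at its public entry points and then fetch the namespace from the result, so arrays from other conforming libraries are never silently converted to ``numpy.ndarray``.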
>
>
> .. _adoption-application-code:
>
> Adoption in application code
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The new namespace can be seen by end users as a cleaned up and
> slimmed down
> version of NumPy's main namespace. Encouraging end users to use this
> namespace like::
>
>     import numpy.array_api as xp
>
>     x = xp.linspace(0, 2*xp.pi, num=100)
>     y = xp.cos(x)
>
> seems perfectly reasonable, and potentially beneficial - users get
> offered
> only
> one function for each purpose (the one we consider best-practice),
> and they
> then write code that is more easily portable to other libraries.
>
>
> Backward compatibility
> ----------------------
>
> No deprecations or removals of existing NumPy APIs or other backwards
> incompatible changes are proposed.
>
>
> High-level design
> -----------------
>
> The array API standard consists of approximately 120 objects, all of
> which
> have a direct NumPy equivalent. This figure shows what is included at
> a
> high level:
>
> .. image:: _static/nep-0047-scope-of-array-API.png
>
> The most important changes compared to what NumPy currently offers
> are:
>
> - A new array object which:
>
>     - conforms to the casting rules and indexing behaviour specified by the
>       standard,
>     - does not have methods other than dunder methods,
>     - does not support the full range of NumPy indexing behaviour. Advanced
>       indexing with integers is not supported. Only boolean indexing
>       with a single (possibly multi-dimensional) boolean array is supported.
>       An indexing expression that selects a single element returns a 0-D array
>       rather than a scalar.
>
> - Functions in the ``array_api`` namespace:
>
>     - do not accept ``array_like`` inputs, only NumPy arrays and Python scalars
>     - do not support ``__array_ufunc__`` and ``__array_function__``,
>     - use positional-only and keyword-only parameters in their signatures,
>     - have inline type annotations,
>     - may have minor changes to signatures and semantics of individual
>       functions compared to their equivalents already present in NumPy,
>     - only support dtype literals, not format strings or other ways of
>       specifying dtypes
>
> - DLPack_ support will be added to NumPy,
> - New syntax for "device support" will be added, through a ``.device``
>   attribute on the new array object, and ``device=`` keywords in array creation
>   functions in the ``array_api`` namespace,
> - Casting rules that differ from those NumPy currently has. Output dtypes can
>   be derived from input dtypes (i.e. no value-based casting), and 0-D arrays
>   are treated like >=1-D arrays.
> - Not all dtypes NumPy has are part of the standard. Only boolean, signed and
>   unsigned integers, and floating-point dtypes up to ``float64`` are supported.
>   Complex dtypes are expected to be added in the next version of the standard.
>   Extended precision, string, void, object and datetime dtypes, as well as
>   structured dtypes, are not included.
>
> Improvements to existing NumPy functionality that are needed include:
>
> - Add support for stacks of matrices to some functions in ``numpy.linalg``
>   that are currently missing such support.
> - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> - Add a "never copy" mode to ``np.asarray``.
>
>
> Functions in the ``array_api`` namespace
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Let's start with an example of a function implementation that shows the most
> important differences with the equivalent function in the main namespace::
>
>     def max(x: array, /, *,
>             axis: Optional[Union[int, Tuple[int, ...]]] = None,
>             keepdims: bool = False
>         ) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
>         """
>         return np.max._implementation(x, axis=axis, keepdims=keepdims)
>
> This function does not accept ``array_like`` inputs, only
> ``ndarray``. There
> are multiple reasons for this. Other array libraries all work like
> this.
> Letting the user do coercion of lists, generators, or other foreign
> objects
> separately results in a cleaner design with less unexpected
> behaviour.
> It's higher-performance - less overhead from ``asarray`` calls.
> Static
> typing
> is easier. Subclasses will work as expected. And the slight increase
> in
> verbosity
> because users have to explicitly coerce to ``ndarray`` on rare
> occasions
> seems like a small price to pay.
>
> This function does not support ``__array_ufunc__`` nor ``__array_function__``.
> These protocols serve a similar purpose to the array API standard module
> itself, but through a different mechanism. Because only ``ndarray`` instances
> are accepted, dispatching via one of these protocols isn't useful anymore.
>
> This function uses positional-only parameters in its signature. This makes
> code more portable - writing ``max(x=x, ...)`` is no longer valid, hence if
> other libraries call the first parameter ``input`` rather than ``x``, that is
> fine. The rationale for keyword-only parameters (``axis`` and ``keepdims`` in
> the above example) is two-fold: clarity of end user code, and it being easier
> to extend the signature in the future with keywords in the desired order.
>
> This function has inline type annotations. Inline annotations are far
> easier to
> maintain than separate stub files. And because the types are simple,
> this
> will
> not result in a large amount of clutter with type aliases or unions
> like in
> the
> current stub files NumPy has.
>
>
> DLPack support for zero-copy data interchange
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The ability to convert one kind of array into another kind is
> valuable, and
> indeed necessary when downstream libraries want to support multiple
> kinds of
> arrays. This requires a well-specified data exchange protocol. NumPy
> already
> supports two of these, namely the buffer protocol (i.e., PEP 3118),
> and
> the ``__array_interface__`` (Python side) / ``__array_struct__`` (C
> side)
> protocol. Both work similarly, letting the "producer" describe how
> the data
> is laid out in memory so the "consumer" can construct its own kind of
> array
> with a view on that data.
>
> DLPack works in a very similar way. The main reasons to prefer DLPack
> over
> the options already present in NumPy are:
>
> 1. DLPack is the only protocol with device support (e.g., GPUs using CUDA or
>    ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other array
>    libraries are not. Having one protocol per device isn't tenable, hence
>    device support is a must.
> 2. Widespread support. DLPack has the widest adoption of all protocols, only
>    NumPy is missing support. And the experiences of other libraries with it
>    are positive. This contrasts with the protocols NumPy does support, which
>    are used very little - when other libraries want to interoperate with
>    NumPy, they typically use the (more limited, and NumPy-specific)
>    ``__array__`` protocol.
>
> Adding support for DLPack to NumPy entails:
>
> - Adding a ``ndarray.__dlpack__`` method
> - Adding a ``from_dlpack`` function, which takes as input an object
>   supporting ``__dlpack__``, and returns an ``ndarray``.
>
> DLPack is currently a ~200 LoC header, and is meant to be included
> directly, so
> no external dependency is needed. Implementation should be
> straightforward.
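
As an illustration of the producer/consumer round trip described above, here is a small sketch. It assumes a NumPy release that ships ``np.from_dlpack`` and ``ndarray.__dlpack__`` (added in NumPy 1.22, after this draft was posted). The draft itself only proposes these names, so treat the snippet as a sketch and not as part of the proposal text.

```python
import numpy as np

# Producer: any object exposing __dlpack__ (here simply a NumPy array).
x = np.arange(6, dtype=np.float64)

# Consumer: builds its own array object as a view on the producer's memory.
y = np.from_dlpack(x)

# No data was copied; both arrays refer to the same buffer.
same_buffer = np.shares_memory(x, y)
```

In a multi-library setting the producer would typically be an array from another library (for example a CPU tensor), with the consumer calling its own ``from_dlpack`` on it.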
>
>
> Syntax for device support
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NumPy itself is CPU-only, so it clearly doesn't have a need for
> device
> support.
> However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> support
> multiple types of devices: CPU, GPU, TPU, and more exotic hardware.
> To write portable code on systems with multiple devices, it's often
> necessary
> to create new arrays on the same device as some other array, or check
> that
> two arrays live on the same device. Hence syntax for that is needed.
>
> The array object will have a ``.device`` attribute which enables comparing
> devices of different arrays (they should only compare equal if both arrays are
> from the same library and it's the same hardware device). Furthermore,
> ``device=`` keywords in array creation functions are needed. For example::
>
>     def empty(shape: Union[int, Tuple[int, ...]], /, *,
>               dtype: Optional[dtype] = None,
>               device: Optional[device] = None) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
>         """
>         return np.empty(shape, dtype=dtype, device=device)
>
> The implementation for NumPy may be as simple as setting the device
> attribute to
> the string ``'cpu'`` and raising an exception if array creation
> functions
> encounter any other value.
>
>
> Dtypes and casting rules
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> The supported dtypes in this namespace are boolean, 8/16/32/64-bit
> signed
> and
> unsigned integer, and 32/64-bit floating-point dtypes. These will be
> added
> to
> the namespace as dtype literals with the expected names (e.g.,
> ``bool``,
> ``uint16``, ``float64``).
>
> The most obvious omissions are the complex dtypes. The rationale for the lack
> of complex support in the first version of the array API standard is that
> several libraries (PyTorch, MXNet) are still in the process of adding support
> for complex dtypes. The next version of the standard is expected to include
> ``complex64`` and ``complex128`` (see `this issue
> <https://github.com/data-apis/array-api/issues/102>`__
> for more details).
>
> Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is
> expected
> to only use the dtype literals. Format strings, Python builtin
> dtypes, or
> string representations of the dtype literals are not accepted - this
> will
> improve readability and portability of code at little cost.
>
> Casting rules are only defined between different dtypes of the same kind. The
> rationale for this is that mixed-kind (e.g., integer to floating-point)
> casting behavior differs between libraries. NumPy's mixed-kind casting
> behavior doesn't need to be changed or restricted; it only needs to be
> documented that if users use mixed-kind casting, their code may not be
> portable.
>
> .. image:: _static/nep-0047-casting-rules-lattice.png
>
> *Type promotion diagram. Promotion between any two types is given by
> their
> join on this lattice. Only the types of participating arrays matter,
> not
> their values. Dashed lines indicate that behaviour for Python scalars
> is
> undefined on overflow. Boolean, integer and floating-point dtypes are
> not
> connected, indicating mixed-kind promotion is undefined.*
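
To make the "join on the lattice" rule concrete, here is a small pure-Python sketch that works on dtype names only. The ``promote`` helper and the ``_DTYPES`` table are invented for this illustration. The signed/unsigned rule reflects my reading of the standard's promotion table (the smallest signed type that can hold both, with ``uint64`` combined with any signed integer left undefined), so please check it against the specification before relying on it.

```python
# dtype name -> (kind, bit width), for the dtypes the standard supports
_DTYPES = {"bool": ("bool", 1)}
_DTYPES.update({f"int{b}": ("int", b) for b in (8, 16, 32, 64)})
_DTYPES.update({f"uint{b}": ("uint", b) for b in (8, 16, 32, 64)})
_DTYPES.update({f"float{b}": ("float", b) for b in (32, 64)})


def promote(a, b):
    """Return the promoted dtype name; depends only on dtypes, never on values."""
    (kind_a, bits_a), (kind_b, bits_b) = _DTYPES[a], _DTYPES[b]
    if kind_a == kind_b:
        # Same kind: the wider of the two wins.
        return a if bits_a >= bits_b else b
    if {kind_a, kind_b} == {"int", "uint"}:
        signed, unsigned = (bits_a, bits_b) if kind_a == "int" else (bits_b, bits_a)
        if unsigned < signed:
            return f"int{signed}"
        if 2 * unsigned <= 64:
            return f"int{2 * unsigned}"
    # Everything else (bool/int/float mixes, uint64 with signed ints) has no join.
    raise TypeError(f"promotion of {a} and {b} is not defined by the standard")
```

For example, ``promote("uint8", "int8")`` gives ``"int16"``, while ``promote("int32", "float32")`` raises because the integer and floating-point parts of the diagram are not connected.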
>
> The most important difference between the casting rules in NumPy and
> in the
> array API standard is how scalars and 0-dimensional arrays are
> handled. In
> the standard, array scalars do not exist and 0-dimensional arrays
> follow the
> same casting rules as higher-dimensional arrays.
>
> See the `Type Promotion Rules section of the array API standard
> <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__
> for more details.
>
> .. note::
>
>     It is not clear what the best way is to support the different casting rules
>     for 0-dimensional arrays and no value-based casting. One option may be to
>     implement this second set of casting rules, keep them private, mark the
>     array API functions with a private attribute that says they adhere to
>     these different rules, and let the casting machinery check for
>     that attribute.
>
>     This needs discussion.
>
>
> Indexing
> ~~~~~~~~
>
> An indexing expression that would return a scalar with ``ndarray``,
> e.g.
> ``arr_2d[0, 0]``, will return a 0-D array with the new array object.
> There
> are
> several reasons for that: array scalars are largely considered a
> design
> mistake
> which no other array library copied; it works better for non-CPU
> libraries
> (typically arrays can live on the device, scalars live on the host);
> and
> it's
> simply a consistent design. To get a Python scalar out of a 0-D
> array, one
> can
> simply use the builtin for the type, e.g. ``float(arr_0d)``.
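
The difference can be seen with plain ``numpy.ndarray`` today. The sketch below uses a trailing ``Ellipsis`` to get a 0-D array out of ordinary NumPy indexing. That is only a way to mimic the proposed behaviour for illustration, not how the new array object would be implemented.

```python
import numpy as np

arr_2d = np.arange(6.0).reshape(2, 3)

# numpy.ndarray: selecting one element yields an array scalar (np.float64).
scalar = arr_2d[1, 2]

# A trailing Ellipsis keeps the result an array, here of dimension 0,
# which is what the new array object would return for arr_2d[1, 2].
arr_0d = arr_2d[1, 2, ...]

# Getting a Python scalar back out is an explicit conversion.
value = float(arr_0d)
```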
>
> The other `indexing modes in the standard
> <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__
> do work largely the same as they do for ``numpy.ndarray``. One noteworthy
> difference is that clipping in slice indexing (e.g., ``a[:n]`` where ``n`` is
> larger than the size of the first axis) is unspecified behaviour, because
> that kind of check can be expensive on accelerators.
>
> The lack of advanced indexing, and boolean indexing being limited to
> a
> single
> n-D boolean array, is due to those indexing modes not being suitable
> for all
> types of arrays or JIT compilation. Their absence does not seem to be
> problematic; if a user or library author wants to use them, they can
> do so
> through zero-copy conversion to ``numpy.ndarray``. This will signal correctly
> to whoever reads the code that it is then NumPy-specific rather than portable
> to all conforming array types.
>
>
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> The array object in the standard does not have methods other than
> dunder
> methods. The rationale for that is that not all array libraries have
> methods
> on their array object (e.g., TensorFlow does not). It also provides
> only a
> single way of doing something, rather than have functions and methods
> that
> are effectively duplicate.
>
> Mixing operations that may produce views (e.g., indexing, ``nonzero``)
> in combination with mutation (e.g., item or slice assignment) is
> `explicitly documented in the standard to not be supported
> <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__.
> This cannot easily be prohibited in the array object itself; instead this will
> be guidance to the user via documentation.
>
> The standard currently does not prescribe a name for the array object itself.
> We propose to simply name it ``ndarray``. This is the most obvious name, and
> because of the separate namespace should not clash with ``numpy.ndarray``.
>
>
> Implementation
> --------------
>
> .. note::
>
>     This section needs a lot more detail, which will gradually be
> added when
>     the implementation progresses.
>
> A prototype of the ``array_api`` namespace can be found in
> https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> The docstring in its ``__init__.py`` has notes on completeness of the
> implementation. The code for the wrapper functions also contains ``# Note:``
> comments everywhere there is a difference with the NumPy API.
> Two important parts that are not implemented yet are the new array object and
> DLPack support. Functions may need changes to ensure the changed casting rules
> are respected.
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> Regarding the array object implementation, we plan to start with a
> regular
> Python class that wraps a ``numpy.ndarray`` instance. Attributes and
> methods
> can forward to that wrapped instance, applying input validation and
> implementing changed behaviour as needed.
>
> The casting rules are probably the most challenging part. The in-progress
> dtype system refactor (NEPs 40-43) should make implementing the correct casting
> behaviour easier - it is already moving away from value-based casting for
> example.
>
>
> The dtype objects
> ~~~~~~~~~~~~~~~~~
>
> We must be able to compare dtypes for equality, and expressions like these must
> be possible::
>
>     np.array_api.some_func(..., dtype=x.dtype)
>
> The above implies it would be nice to have
> ``np.array_api.float32 == np.array_api.ndarray(...).dtype``.
>
> Dtypes should not be assumed to have a class hierarchy by users, however we are
> free to implement it with a class hierarchy if that's convenient. We considered
> the following options to implement dtype objects:
>
> 1. Alias dtypes to those in the main namespace. E.g.,
>    ``np.array_api.float32 = np.float32``.
> 2. Make the dtypes instances of ``np.dtype``. E.g.,
>    ``np.array_api.float32 = np.dtype(np.float32)``.
> 3. Create new singleton classes with only the required methods/attributes
>    (currently just ``__eq__``).
>
> It seems like (2) would be easiest from the perspective of
> interacting with
> functions outside the main namespace. And (3) would adhere best to
> the
> standard.
>
> TBD: the standard does not yet have a good way to inspect properties of a
> dtype, to ask questions like "is this an integer dtype?". Perhaps this is easy
> enough to do for users, like so::
>
>     def _get_dtype(dt_or_arr):
>         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr
>
>     def is_floating(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (float32, float64)
>
>     def is_integer(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (uint8, uint16, uint32, uint64, int8, int16, int32,
>                          int64)
>
> However it could make sense to add to the standard. Note that NumPy itself
> currently does not have a great way of asking such questions, see
> `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion


Re: NEP: array API standard adoption (NEP 47)

ralfgommers
Thanks for the feedback Sebastian!


On Mon, Feb 22, 2021 at 7:49 PM Sebastian Berg <[hidden email]> wrote:
On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> Hi all,
>
> Here is a NEP, written together with Stephan Hoyer and Aaron Meurer,
> for
> discussion on adoption of the array API standard (
> https://data-apis.github.io/array-api/latest/). This will add a new
> numpy.array_api submodule containing that standardized API. The main
> purpose of this API is to be able to write code that is portable to
> other
> array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and
> MXNet.
>
> We expect this NEP to remain in draft state for quite a while, while
> we're
> gaining experience with using it in downstream libraries, discuss
> adding it
> to other array libraries, and finishing some of the loose ends (e.g.,
> specifications for linear algebra functions that aren't merged yet,
> see
> https://github.com/data-apis/array-api/pulls) in the API standard
> itself.


There is too much to unpack in a day, I hope I did not miss something
particularly important while reading.
Do you have plans to try some of this outside of NumPy, or maybe make a
repo in the numpy org for it?


Some thoughts:

The DLPack integration: I honestly think we can split that out, maybe
even without a NEP, or at least with a very short one.  It seems like a
good addition to me.  And a simple one, especially if we don't need to
integrate it into `np.array(...)`.

I agree. I included it in the NEP because it's the standard exchange mechanism in the API standard, but it makes perfect sense to implement as a standalone feature.


---

It seems the current idea is to create a new NumPy array subclass. That
sounds good, but I am a bit worried how that is going to interact with
actual NumPy arrays.

Not a subclass! If you got that impression, I should clarify the text. The idea is a standalone class that doesn't inherit from anything, and has only the methods, attributes and semantics described in the API standard. It just uses np.asarray under the hood. This will be clearer when we have a prototype for it, probably within two weeks.
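
Until the prototype exists, a rough sketch of the idea might help the discussion. The class below is hypothetical (including its name ``Array`` and the handful of dunder methods shown). It only illustrates "a standalone class that wraps ``np.asarray`` output and forwards to it", not the actual planned implementation.

```python
import numpy as np


class Array:
    """Standalone wrapper around an ndarray; deliberately not a subclass."""

    def __init__(self, data):
        self._array = np.asarray(data)

    @property
    def dtype(self):
        return self._array.dtype

    @property
    def shape(self):
        return self._array.shape

    def __add__(self, other):
        # Only wrappers and Python scalars are accepted, no array_like inputs.
        if isinstance(other, Array):
            return Array(self._array + other._array)
        if isinstance(other, (bool, int, float)):
            return Array(self._array + other)
        return NotImplemented

    def __getitem__(self, key):
        # Re-wrapping means a single selected element comes back as a
        # 0-D Array instead of a NumPy scalar.
        return Array(self._array[key])

    def __float__(self):
        return float(self._array)
```

Because ``Array`` does not inherit from ``numpy.ndarray``, ``isinstance(Array([1.0]), np.ndarray)`` is ``False``, and none of ``ndarray``'s methods leak into its interface.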

Most SciPy users will still use NumPy proper.  So SciPy must not
return this "random" subclass, when the user did pass in a NumPy array
(or even by default).

Agreed. ndarray in = ndarray out; new array object in = new array object out.


At that point, what will the SciPy dev do to juggle the fact that you
would like to use the new API internally, but the interface must still
default to NumPy (and may depend on the input)?
This is probably not very tricky, but I am slightly worried about what
happens when things get mixed up.  Also if a user passes this "minimal
array" into a current NumPy API function, it will often not work if it
is a subclass.

---

Related to that: how important is it to keep that namespace a "minimal"
implementation, rather than a "conforming" one?  For example, would you
want to reject `numpy_api.array([1, 2, 3, 4], dtype="i,i")`? Or just
`dtype="complex128"`?

Yes, I'd definitely want to reject that. Format strings are terrible.
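
To sketch what "rejecting" could mean in practice: the helper below accepts only ``np.dtype`` instances from the set the standard supports. That follows option (2) from the dtype section of the NEP and is an assumption about how the dtype literals end up being implemented. The name ``validate_dtype`` is made up for this example.

```python
import numpy as np

# Dtypes in the first version of the standard (no complex, no structured).
_SUPPORTED_DTYPES = tuple(
    np.dtype(t)
    for t in (
        np.bool_,
        np.int8, np.int16, np.int32, np.int64,
        np.uint8, np.uint16, np.uint32, np.uint64,
        np.float32, np.float64,
    )
)


def validate_dtype(dtype):
    """Return dtype unchanged if it is a supported dtype literal, else raise."""
    # The isinstance check comes first on purpose: np.dtype.__eq__ coerces
    # strings, so "float64" would otherwise compare equal to a dtype object.
    if not isinstance(dtype, np.dtype):
        raise TypeError(f"expected a dtype literal, got {dtype!r}")
    if dtype not in _SUPPORTED_DTYPES:
        raise TypeError(f"dtype {dtype} is not part of the array API standard")
    return dtype
```

With this, both ``"i,i"`` and ``"complex128"`` are rejected because they are strings, and ``np.dtype(np.complex128)`` is rejected because complex dtypes are not in the first version of the standard.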

Maybe I got the wrong impression though.  Is the aim for a minimal
implementation, but you are OK as long as it is a conforming one?

In principle we're okay with a conforming one that's a superset of what is in the standard. But I think we'd only want to do that if creating a minimal one turns out to be difficult. Having the minimal required set is much nicer when one wants to write portable code. Because then you can do so without checking the docs whether any object/method is in "minimal" or in "extended".


---

The implementation mentions bypassing `__array_function__` in the
current implementation.  That requires a semi-formalization of how
`__array_function__` should be "bypassed".

I think simply:

def somefunc(x):
    # do whatever checks needed here for, e.g., input validation
    # then call the native numpy implementation:
    return np.somefunc._implementation(x)


  I think that is useful, but
I have also been wondering about going the pytorch route of in-lining
the check to avoid the current overheads. That avoids a bit of jumping
between C and python and multiple function calls, but might mean that
`func._implementation` is a bit in the way.

Besides `_implementation` usually does support array subclasses and may
still dispatch again internally at this time!

What, it dispatches again? That seems very suboptimal. If there's no clean way to avoid a dispatch, it may make sense to just check array inputs for the presence of __array_function__ and raise an exception if it's present.
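
A sketch of that fallback check is below. One wrinkle is that ``numpy.ndarray`` itself defines ``__array_function__``, so the test has to exempt exact ``ndarray`` instances. The helper name and the exact rule (reject anything that is not exactly ``np.ndarray`` but defines the protocol) are assumptions made for illustration.

```python
import numpy as np


def reject_array_function_overrides(*arrays):
    """Raise if any input could trigger __array_function__ dispatch."""
    for x in arrays:
        # Exact ndarray instances are fine; subclasses and duck arrays that
        # define the protocol are refused so no dispatch can happen.
        if type(x) is not np.ndarray and hasattr(type(x), "__array_function__"):
            raise TypeError(
                f"{type(x).__name__} implements __array_function__, which is "
                "not supported in the array_api namespace"
            )
```

A wrapper function in the namespace would call this before handing its inputs to the native NumPy implementation.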

It's not just about overhead (that's a minor thing), it's that the feature does not make much sense in combination with the array_api namespace. The "get a hold of a new namespace" approach is like __array_module__, which was an alternative to __array_function__ not an addition to it.



---

I am somewhat worried that getting the promotion (and other quirks) to
where you want could be very tricky, unless we are patient enough to
wait for NumPy proper to evolve.
Hopefully I am just too pessimistic and e.g. a mild form of code
duplication can solve all of that.  Probably time and trial-and-error
will be the judge on that...

I do agree that the different casting rules are the single most tricky issue implementation-wise.

Cheers,
Ralf




Cheers,

Sebastian


>
> The most important changes compared to what NumPy currently offers
> are:
>
> - A new array object which:
>
>     - conforms to the casting rules and indexing behaviour specified by the
>       standard,
>     - does not have methods other than dunder methods,
>     - does not support the full range of NumPy indexing behaviour. Advanced
>       indexing with integers is not supported. Only boolean indexing
>       with a single (possibly multi-dimensional) boolean array is supported.
>       An indexing expression that selects a single element returns a 0-D array
>       rather than a scalar.
>
> - Functions in the ``array_api`` namespace:
>
>     - do not accept ``array_like`` inputs, only NumPy arrays and Python scalars
>     - do not support ``__array_ufunc__`` and ``__array_function__``,
>     - use positional-only and keyword-only parameters in their signatures,
>     - have inline type annotations,
>     - may have minor changes to signatures and semantics of individual
>       functions compared to their equivalents already present in NumPy,
>     - only support dtype literals, not format strings or other ways of
>       specifying dtypes
>
> - DLPack_ support will be added to NumPy,
> - New syntax for "device support" will be added, through a ``.device``
>   attribute on the new array object, and ``device=`` keywords in array creation
>   functions in the ``array_api`` namespace,
> - Casting rules that differ from those NumPy currently has. Output dtypes can
>   be derived from input dtypes (i.e. no value-based casting), and 0-D arrays
>   are treated like >=1-D arrays.
> - Not all dtypes NumPy has are part of the standard. Only boolean, signed and
>   unsigned integers, and floating-point dtypes up to ``float64`` are supported.
>   Complex dtypes are expected to be added in the next version of the standard.
>   Extended precision, string, void, object and datetime dtypes, as well as
>   structured dtypes, are not included.
>
> Improvements to existing NumPy functionality that are needed include:
>
> - Add support for stacks of matrices to some functions in ``numpy.linalg``
>   that are currently missing such support.
> - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> - Add a "never copy" mode to ``np.asarray``.
>
>
> Functions in the ``array_api`` namespace
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Let's start with an example of a function implementation that shows
> the most
> important differences with the equivalent function in the main
> namespace::
>
>     def max(x: array, /, *,
>             axis: Optional[Union[int, Tuple[int, ...]]] = None,
>             keepdims: bool = False
>         ) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
>         """
>         return np.max._implementation(x, axis=axis, keepdims=keepdims)
>
> This function does not accept ``array_like`` inputs, only
> ``ndarray``. There
> are multiple reasons for this. Other array libraries all work like
> this.
> Letting the user do coercion of lists, generators, or other foreign
> objects
> separately results in a cleaner design with less unexpected
> behaviour.
> It's higher-performance - less overhead from ``asarray`` calls.
> Static
> typing
> is easier. Subclasses will work as expected. And the slight increase
> in
> verbosity
> because users have to explicitly coerce to ``ndarray`` on rare
> occasions
> seems like a small price to pay.
>
> This function does not support ``__array_ufunc__`` nor ``__array_function__``.
> These protocols serve a similar purpose as the array API standard module
> itself, but through a different mechanism. Because only ``ndarray``
> instances are accepted, dispatching via one of these protocols isn't useful
> anymore.
>
> This function uses positional-only parameters in its signature. This
> makes
> code
> more portable - writing ``max(x=x, ...)`` is no longer valid, hence
> if other
> libraries call the first parameter ``input`` rather than ``x``, that
> is
> fine.
> The rationale for keyword-only parameters (not shown in the above
> example)
> is
> two-fold: clarity of end user code, and it being easier to extend the
> signature
> in the future with keywords in the desired order.
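
To make the effect of ``/`` and ``*`` in a signature concrete, here is a small sketch in plain Python (3.8 or later). ``max_`` is a toy stand-in that only echoes its arguments, not the wrapper proposed in the NEP.

```python
def max_(x, /, *, axis=None, keepdims=False):
    """Toy stand-in with an array-API-style signature (not NumPy's max)."""
    return (x, axis, keepdims)


# Positional array argument, keyword options: accepted.
ok = max_([1, 2, 3], axis=0)

# Passing the array by keyword is rejected, so the parameter name ``x``
# never becomes part of the public API.
try:
    max_(x=[1, 2, 3])
    by_keyword_error = None
except TypeError as exc:
    by_keyword_error = exc

# Passing ``axis`` positionally is rejected as well, which leaves room to
# add or reorder keywords later without breaking callers.
try:
    max_([1, 2, 3], 0)
    positional_axis_error = None
except TypeError as exc:
    positional_axis_error = exc
```

Both failing calls raise ``TypeError`` at call time, before any array code runs.
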
>
> This function has inline type annotations. Inline annotations are far
> easier to
> maintain than separate stub files. And because the types are simple,
> this
> will
> not result in a large amount of clutter with type aliases or unions
> like in
> the
> current stub files NumPy has.
>
>
> DLPack support for zero-copy data interchange
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The ability to convert one kind of array into another kind is
> valuable, and
> indeed necessary when downstream libraries want to support multiple
> kinds of
> arrays. This requires a well-specified data exchange protocol. NumPy
> already
> supports two of these, namely the buffer protocol (i.e., PEP 3118),
> and
> the ``__array_interface__`` (Python side) / ``__array_struct__`` (C
> side)
> protocol. Both work similarly, letting the "producer" describe how
> the data
> is laid out in memory so the "consumer" can construct its own kind of
> array
> with a view on that data.
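
The producer/consumer idea can be tried out with the buffer protocol alone, using only the standard library. This is a sketch of the general mechanism, not of NumPy's or DLPack's implementation: ``array.array`` plays the producer and ``memoryview`` the consumer.

```python
from array import array

producer = array("d", [1.0, 2.0, 3.0])  # owns the memory
view = memoryview(producer)             # consumer: a view, not a copy

# The producer describes its memory layout through the buffer protocol.
layout = (view.format, view.itemsize, view.shape, view.strides)

# Writing through the view changes the producer's data: no copy was made.
view[0] = 10.0
first_element = producer[0]
```

The buffer protocol has no notion of a device, which is the gap the first point in the list below is about.
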
>
> DLPack works in a very similar way. The main reasons to prefer DLPack
> over
> the options already present in NumPy are:
>
> 1. DLPack is the only protocol with device support (e.g., GPUs using CUDA or
>    ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other array
>    libraries are not. Having one protocol per device isn't tenable, hence
>    device support is a must.
> 2. Widespread support. DLPack has the widest adoption of all protocols, only
>    NumPy is missing support. And the experiences of other libraries with it
>    are positive. This contrasts with the protocols NumPy does support, which
>    are used very little - when other libraries want to interoperate with
>    NumPy, they typically use the (more limited, and NumPy-specific)
>    ``__array__`` protocol.
>
> Adding support for DLPack to NumPy entails:
>
> - Adding a ``ndarray.__dlpack__`` method
> - Adding a ``from_dlpack`` function, which takes as input an object
>   supporting ``__dlpack__``, and returns an ``ndarray``.
>
> DLPack is currently a ~200 LoC header, and is meant to be included
> directly, so
> no external dependency is needed. Implementation should be
> straightforward.
>
>
> Syntax for device support
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NumPy itself is CPU-only, so it clearly doesn't have a need for
> device
> support.
> However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> support
> multiple types of devices: CPU, GPU, TPU, and more exotic hardware.
> To write portable code on systems with multiple devices, it's often
> necessary
> to create new arrays on the same device as some other array, or check
> that
> two arrays live on the same device. Hence syntax for that is needed.
>
> The array object will have a ``.device`` attribute which enables
> comparing
> devices of different arrays (they only should compare equal if both
> arrays
> are
> from the same library and it's the same hardware device).
> Furthermore,
> ``device=`` keywords in array creation functions are needed. For
> example::
>
>     def empty(shape: Union[int, Tuple[int, ...]], /, *,
>               dtype: Optional[dtype] = None,
>               device: Optional[device] = None) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
>         """
>         return np.empty(shape, dtype=dtype, device=device)
>
> The implementation for NumPy may be as simple as setting the device
> attribute to
> the string ``'cpu'`` and raising an exception if array creation
> functions
> encounter any other value.
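
A rough sketch of that "as simple as" approach, with a toy list-backed ``Array`` class standing in for the real array object. All names here (``Array``, ``_check_device``, ``zeros``, ``zeros_on_same_device``) are hypothetical and only illustrate the idea of a CPU-only ``device`` attribute plus a ``device=`` keyword.

```python
class Array:
    """Hypothetical minimal CPU-only array object, for illustration only."""

    def __init__(self, data, device="cpu"):
        self._data = list(data)
        self.device = device


def _check_device(device):
    # None means "use the default device", which here is the CPU.
    if device not in (None, "cpu"):
        raise ValueError(f"Unsupported device {device!r}; only 'cpu' is available")
    return "cpu"


def zeros(n, /, *, device=None):
    """Create a 1-D array of ``n`` zeros on the requested device."""
    return Array([0.0] * n, device=_check_device(device))


def zeros_on_same_device(x, n):
    """Portable pattern: create a new array on the same device as ``x``."""
    return zeros(n, device=x.device)
```

Library code written against ``x.device`` and ``device=`` then runs unchanged on NumPy, where the device is always ``'cpu'``, and on libraries with several devices.
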
>
>
> Dtypes and casting rules
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> The supported dtypes in this namespace are boolean, 8/16/32/64-bit
> signed
> and
> unsigned integer, and 32/64-bit floating-point dtypes. These will be
> added
> to
> the namespace as dtype literals with the expected names (e.g.,
> ``bool``,
> ``uint16``, ``float64``).
>
> The most obvious omissions are the complex dtypes. The rationale for the lack
> of complex support in the first version of the array API standard is that
> several libraries (PyTorch, MXNet) are still in the process of adding support
> for complex dtypes. The next version of the standard is expected to include
> ``complex64`` and ``complex128`` (see `this issue
> <https://github.com/data-apis/array-api/issues/102>`__ for more details).
>
> Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is
> expected
> to only use the dtype literals. Format strings, Python builtin
> dtypes, or
> string representations of the dtype literals are not accepted - this
> will
> improve readability and portability of code at little cost.
>
> Casting rules are only defined between different dtypes of the same kind. The
> rationale for this is that mixed-kind (e.g., integer to floating-point)
> casting behavior differs between libraries. NumPy's mixed-kind casting
> behavior doesn't need to be changed or restricted, it only needs to be
> documented that if users use mixed-kind casting, their code may not be
> portable.
>
> .. image:: _static/nep-0047-casting-rules-lattice.png
>
> *Type promotion diagram. Promotion between any two types is given by
> their
> join on this lattice. Only the types of participating arrays matter,
> not
> their values. Dashed lines indicate that behaviour for Python scalars
> is
> undefined on overflow. Boolean, integer and floating-point dtypes are
> not
> connected, indicating mixed-kind promotion is undefined.*
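
Since the diagram does not survive in plain text, here is a small sketch of the "join on a lattice" rule using dtype names as strings. The table encoded below is my reading of the standard's promotion rules (in particular, that ``uint64`` combined with a signed integer is left undefined); treat it as an assumption and check the standard before relying on it. ``result_type`` here is a toy function, not the standard's or NumPy's.

```python
# Dtypes are represented by their names here; this is an illustration only.
_SIGNED = ["int8", "int16", "int32", "int64"]
_UNSIGNED = ["uint8", "uint16", "uint32", "uint64"]
_FLOAT = ["float32", "float64"]
_ALL = ["bool"] + _SIGNED + _UNSIGNED + _FLOAT


def result_type(a, b):
    """Promote two dtype names; raise where promotion is left undefined."""
    if a not in _ALL or b not in _ALL:
        raise TypeError(f"unsupported dtype: {a!r} or {b!r}")
    if a == b:
        return a
    # Same chain: the join is simply the wider of the two.
    for chain in (_SIGNED, _UNSIGNED, _FLOAT):
        if a in chain and b in chain:
            return chain[max(chain.index(a), chain.index(b))]
    # Signed with unsigned: the smallest signed type that can hold both.
    if a in _UNSIGNED and b in _SIGNED:
        a, b = b, a
    if a in _SIGNED and b in _UNSIGNED:
        u = _UNSIGNED.index(b)
        if u + 1 < len(_SIGNED):
            return _SIGNED[max(_SIGNED.index(a), u + 1)]
    # Mixed kinds (bool/int/float) and uint64 with a signed integer end up here.
    raise TypeError(f"promotion of {a!r} and {b!r} is undefined")
```

Note that only dtype names enter the function: there is no way for the values in an array to influence the result, which is the "no value-based casting" point.
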
>
> The most important difference between the casting rules in NumPy and
> in the
> array API standard is how scalars and 0-dimensional arrays are
> handled. In
> the standard, array scalars do not exist and 0-dimensional arrays
> follow the
> same casting rules as higher-dimensional arrays.
>
> See the `Type Promotion Rules section of the array API standard
> <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__
> for more details.
>
> .. note::
>
>     It is not clear what the best way is to support the different casting
>     rules for 0-dimensional arrays and no value-based casting. One option
>     may be to implement this second set of casting rules, keep them private,
>     mark the array API functions with a private attribute that says they
>     adhere to these different rules, and let the casting machinery check
>     for that attribute.
>
>     This needs discussion.
>
>
> Indexing
> ~~~~~~~~
>
> An indexing expression that would return a scalar with ``ndarray``,
> e.g.
> ``arr_2d[0, 0]``, will return a 0-D array with the new array object.
> There
> are
> several reasons for that: array scalars are largely considered a
> design
> mistake
> which no other array library copied; it works better for non-CPU
> libraries
> (typically arrays can live on the device, scalars live on the host);
> and
> it's
> simply a consistent design. To get a Python scalar out of a 0-D
> array, one
> can
> simply use the builtin for the type, e.g. ``float(arr_0d)``.
>
> The other `indexing modes in the standard
> <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__
> do work largely the same as they do for ``numpy.ndarray``. One noteworthy
> difference is that clipping in slice indexing (e.g., ``a[:n]`` where ``n`` is
> larger than the size of the first axis) is unspecified behaviour, because
> that kind of check can be expensive on accelerators.
>
> The lack of advanced indexing, and boolean indexing being limited to
> a
> single
> n-D boolean array, is due to those indexing modes not being suitable
> for all
> types of arrays or JIT compilation. Their absence does not seem to be
> problematic; if a user or library author wants to use them, they can
> do so
> through zero-copy conversion to ``numpy.ndarray``. This will signal
> correctly
> to whomever reads the code that it is then NumPy-specific rather than
> portable
> to all conforming array types.
>
>
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> The array object in the standard does not have methods other than
> dunder
> methods. The rationale for that is that not all array libraries have
> methods
> on their array object (e.g., TensorFlow does not). It also provides
> only a
> single way of doing something, rather than have functions and methods
> that
> are effectively duplicate.
>
> Mixing operations that may produce views (e.g., indexing, ``nonzero``)
> in combination with mutation (e.g., item or slice assignment) is
> `explicitly documented in the standard to not be supported
> <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__.
> This cannot easily be prohibited in the array object itself; instead this will
> be guidance to the user via documentation.
>
> The standard currently does not prescribe a name for the array object itself.
> We propose to simply name it ``ndarray``. This is the most obvious name, and
> because of the separate namespace should not clash with ``numpy.ndarray``.
>
>
> Implementation
> --------------
>
> .. note::
>
>     This section needs a lot more detail, which will gradually be
> added when
>     the implementation progresses.
>
> A prototype of the ``array_api`` namespace can be found in
> https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> The docstring in its ``__init__.py`` has notes on completeness of the
> implementation. The code for the wrapper functions also contains ``# Note:``
> comments everywhere there is a difference with the NumPy API.
> Two important parts that are not implemented yet are the new array object and
> DLPack support. Functions may need changes to ensure the changed casting rules
> are respected.
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> Regarding the array object implementation, we plan to start with a
> regular
> Python class that wraps a ``numpy.ndarray`` instance. Attributes and
> methods
> can forward to that wrapped instance, applying input validation and
> implementing changed behaviour as needed.
>
> The casting rules are probably the most challenging part. The in-progress
> dtype system refactor (NEPs 40-43) should make implementing the correct casting
> behaviour easier - it is already moving away from value-based casting for
> example.
>
>
> The dtype objects
> ~~~~~~~~~~~~~~~~~
>
> We must be able to compare dtypes for equality, and expressions like
> these
> must
> be possible::
>
>     np.array_api.some_func(..., dtype=x.dtype)
>
> The above implies it would be nice to have
> ``np.array_api.float32 == np.array_api.ndarray(...).dtype``.
>
> Dtypes should not be assumed to have a class hierarchy by users,
> however we
> are
> free to implement it with a class hierarchy if that's convenient. We
> considered
> the following options to implement dtype objects:
>
> 1. Alias dtypes to those in the main namespace. E.g.,
>    ``np.array_api.float32 = np.float32``.
> 2. Make the dtypes instances of ``np.dtype``. E.g.,
>    ``np.array_api.float32 = np.dtype(np.float32)``.
> 3. Create new singleton classes with only the required methods/attributes
>    (currently just ``__eq__``).
>
> It seems like (2) would be easiest from the perspective of
> interacting with
> functions outside the main namespace. And (3) would adhere best to
> the
> standard.
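
Option (3) could look roughly like the sketch below. The ``_DType`` class and the three module-level instances are hypothetical names chosen for illustration; the only behaviour taken from the text is that dtype objects need nothing beyond equality comparison.

```python
class _DType:
    """Hypothetical minimal dtype object: a name plus equality, nothing else."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return self._name

    def __eq__(self, other):
        # There is exactly one instance per dtype, so identity is sufficient.
        return self is other

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one.
        return id(self)


float32 = _DType("float32")
float64 = _DType("float64")
int32 = _DType("int32")


def is_floating(dtype):
    """User-level helper of the kind discussed in the TBD below."""
    return dtype in (float32, float64)
```

Because these objects have no class hierarchy and no extra attributes, users cannot accidentally depend on anything that other array libraries do not provide.
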
>
> TBD: the standard does not yet have a good way to inspect properties
> of a
> dtype, to ask questions like "is this an integer dtype?". Perhaps
> this is
> easy
> enough to do for users, like so::
>
>     def _get_dtype(dt_or_arr):
>         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr
>
>     def is_floating(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (float32, float64)
>
>     def is_integer(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (uint8, uint16, uint32, uint64, int8, int16, int32,
>                          int64)
>
> However it could make sense to add to the standard. Note that NumPy itself
> currently does not have a great way of asking such questions, see
> `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion

Re: NEP: array API standard adoption (NEP 47)

Sebastian Berg
On Mon, 2021-02-22 at 20:16 +0100, Ralf Gommers wrote:

<snip>

> >
> > It seems the current idea is to create a new NumPy array subclass.
> > That
> > sounds good, but I am a bit worried how that is going to interact
> > with
> > actual NumPy arrays.
> >
>
> Not a subclass! If you got that impression, I should clarify the
> text. The
Sorry, you do write "wraps a numpy.ndarray", I am not sure why I got
the subclass idea when reading it yesterday.

But, in that case maybe you should just implement it as:

    def somefunc(x):
        # do whatever checks needed here for, e.g., input validation
        # then call the native numpy implementation:
        x = as_numpy_array(x)
        result = np.somefunc(x)
        return as_minimal(result)

and not even use the `._implementation`?  I guess a small issue is that
we don't have a way to call `as_numpy_array` on all relevant inputs
conveniently.  In most cases it will be straightforward though.

I assume you need your own `as_numpy_array` call, to reject ndarray
subclasses that `_implementation` would allow to pass through.


> idea is a standalone class that doesn't inherit from anything, and
> has only
> the methods, attributes and semantics described in the API standard.
> It
> just uses np.asarray under the hood. This will be clearer when we
> have a
> prototype for it, probably within two weeks.
>

Then I should wait for the prototype, for more discussion :).

> > Most SciPy users, will still use NumPy proper.  So SciPy must not
> > return this "random" subclass, when the user did pass in a NumPy
> > array
> > (or even by default).
> >
>
> Agreed. ndarray in = ndarray out; new array object in = new array
> object
> out.
>
>
> > At that point, what will the SciPy dev do to juggle the fact that
> > you
> > would like to use the new API internally, but the interface must
> > still
> > default to NumPy (and may depend on the input)?
> > This is probably not very tricky, but I am slightly worried about
> > what
> > happens when things get mixed up.  Also if a user passes this
> > "minimal
> > array" into a current NumPy API function, it will often not work if
> > it
> > is a subclass.
> >
> > ---
> >
> > Related to that: how important is it to keep that namespace a
> > "minimal"
> > implementation, rather than a "conforming" one?  For example, would
> > you
> > want to reject `numpy_api.array([1, 2, 3, 4], dtype="i,i")`? Or
> > just
> > `dtype="complex128"`?
> >
>
> Yes, I'd definitely want to reject that. Format strings are terrible.

Agreed, I guess I am wondering whether we can find a good solution that
does not involve writing stubs around 140 functions with more strict
input validation.
But maybe it is also not particularly difficult or churn to do... Or
even automate, e.g. from the typing stubs.


>
> > Maybe I got the wrong impression though.  Is the aim for a minimal
> > implementation, but you are OK as long as it is a conforming one?
> >
>
> In principle we're okay with a conforming one that's a superset of
> what is
> in the standard. But I think we'd only want to do that if creating a
> minimal one turns out to be difficult. Having the minimal required
> set is
> much nicer when one wants to write portable code. Because then you
> can do
> so without checking the docs whether any object/method is in
> "minimal" or
> in "extended".
>
Right, you would like to have a minimal implementation somewhere.
Having it in NumPy could be convenient, although not strictly
necessary.


<snip>

>
> > Besides `_implementation` usually does support array subclasses and
> > may
> > still dispatch again internally at this time!
> >
>
> What, it dispatches again? That seems very suboptimal. If there's no
> clean
> way to avoid a dispatch, it may make sense to just check array inputs
> for
> the presence of __array_function__ and raise an exception if it's
> present.
>

I do not think our functions were ever rewritten to only use e.g.
`._implementation()` internally. I am not even quite certain that would
be correct for subclasses.

It is annoying that you may have to struggle with it here to do
something that is different from the implicit dispatchers.  But on the
up-side a clear solution would be helpful in any case.

Cheers,

Sebastian


> It's not just about overhead (that's a minor thing), it's that the
> feature
> does not make much sense in combination with the array_api namespace.
> The
> "get a hold of a new namespace" approach is like __array_module__,
> which
> was an alternative to __array_function__ not an addition to it.
>
>
>
> > ---
> >
> > I am somewhat worried that getting the promotion (and other quirks)
> > to
> > where you want could be very tricky, unless we are patient enough
> > to
> > wait for NumPy proper to evolve.
> > Hopefully I am just too pessimistic and e.g. a mild form of code
> > duplication can solve all of that.  Probably time and trial-and-
> > error
> > will be the judge on that...
> >
>
> I do agree that the different casting rules are the single most
> tricky
> issue implementation wise.
>
> Cheers,
> Ralf
>
>
>
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> > >
> > > See
> > >
> >    
> > https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> > > for an initial discussion about this topic.
> > >
> > > Please keep high-level discussion here and detailed comments on
> > > https://github.com/numpy/numpy/pull/18456. Also, you can access a
> > > rendered
> > > version of the NEP from that PR (see PR description for how),
> > > which
> > > may be
> > > helpful.
> > > Cheers,
> > > Ralf
> > >
> > >
> > > Abstract
> > > --------
> > >
> > > We propose to adopt the `Python array API standard`_, developed
> > > by
> > > the
> > > `Consortium for Python Data API Standards`_. Implementing this as
> > > a
> > > separate
> > > new namespace in NumPy will allow authors of libraries which
> > > depend
> > > on NumPy
> > > as well as end users to write code that is portable between NumPy
> > > and
> > > all
> > > other array/tensor libraries that adopt this standard.
> > >
> > > .. note::
> > >
> > >     We expect that this NEP will remain in a draft state for
> > > quite a
> > > while.
> > >     Given the large scope we don't expect to propose it for
> > > acceptance any
> > >     time soon; instead, we want to solicit feedback on both the
> > > high-
> > > level
> > >     design and implementation, and learn what needs describing
> > > better
> > > in
> > > this
> > >     NEP or changing in either the implementation or the array API
> > > standard
> > >     itself.
> > >
> > >
> > > Motivation and Scope
> > > --------------------
> > >
> > > Python users have a wealth of choice for libraries and frameworks
> > > for
> > > numerical computing, data science, machine learning, and deep
> > > learning. New
> > > frameworks pushing forward the state of the art in these fields
> > > are
> > > appearing
> > > every year. One unintended consequence of all this activity and
> > > creativity
> > > has been fragmentation in multidimensional array (a.k.a. tensor)
> > > libraries -
> > > which are the fundamental data structure for these fields.
> > > Choices
> > > include
> > > NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
> > >
> > > The APIs of each of these libraries are largely similar, but with
> > > enough
> > > differences that it’s quite difficult to write code that works
> > > with
> > > multiple
> > > (or all) of these libraries. The array API standard aims to
> > > address
> > > that
> > > issue, by specifying an API for the most common ways arrays are
> > > constructed
> > > and used. The proposed API is quite similar to NumPy's API, and
> > > deviates
> > > mainly
> > > in places where (a) NumPy made design choices that are inherently
> > > not
> > > portable
> > > to other implementations, and (b) where other libraries
> > > consistently
> > > deviated
> > > from NumPy on purpose because NumPy's design turned out to have
> > > issues or
> > > unnecessary complexity.
> > >
> > > For a longer discussion on the purpose of the array API standard
> > > we
> > > refer to
> > > the `Purpose and Scope section of the array API standard <
> > >    
> > > https://data-apis.github.io/array-api/latest/purpose_and_scope.html
> > > >`
> > > __
> > > and the two blog posts announcing the formation of the Consortium
> > > [1]_ and
> > > the release of the first draft version of the standard for
> > > community
> > > review
> > > [2]_.
> > >
> > > The scope of this NEP includes:
> > >
> > > - Adopting the 2021 version of the array API standard
> > > - Adding a separate namespace, tentatively named
> > > ``numpy.array_api``
> > > - Changes needed/desired outside of the new namespace, for
> > > example
> > > new
> > > dunder
> > >   methods on the ``ndarray`` object
> > > - Implementation choices, and differences between functions in
> > > the
> > > new
> > >   namespace with those in the main ``numpy`` namespace
> > > - A new array object conforming to the array API standard
> > > - Maintenance effort and testing strategy
> > > - Impact on NumPy's total exposed API surface and on other future
> > > and
> > >   under-discussion design choices
> > > - Relation to existing and proposed NumPy array protocols
> > >   (``__array_ufunc__``, ``__array_function__``,
> > > ``__array_module__``).
> > > - Required improvements to existing NumPy functionality
> > >
> > > Out of scope for this NEP are:
> > >
> > > - Changes in the array API standard itself. Those are likely to
> > > come
> > > up
> > >   during review of this NEP, but should be upstreamed as needed
> > > and
> > > this NEP
> > >   subsequently updated.
> > >
> > >
> > > Usage and Impact
> > > ----------------
> > >
> > > *This section will be fleshed out later, for now we refer to the
> > > use
> > > cases
> > > given
> > > in* `the array API standard Use Cases section <
> > > https://data-apis.github.io/array-api/latest/use_cases.html>`__
> > >
> > > In addition to those use cases, the new namespace contains
> > > functionality
> > > that
> > > is widely used and supported by many array libraries. As such, it
> > > is
> > > a good
> > > set of functions to teach to newcomers to NumPy and recommend as
> > > "best
> > > practice". That contrasts with NumPy's main namespace, which
> > > contains
> > > many
> > > functions and objects that have been superceded or we consider
> > > mistakes -
> > > but
> > > that we can't remove because of backwards compatibility reasons.
> > >
> > > The usage of the ``numpy.array_api`` namespace by downstream
> > > libraries is
> > > intended to enable them to consume multiple kinds of arrays,
> > > *without
> > > having
> > > to have a hard dependency on all of those array libraries*:
> > >
> > > .. image:: _static/nep-0047-library-dependencies.png
> > >
> > > Adoption in downstream libraries
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The prototype implementation of the ``array_api`` namespace will
> > > be
> > > used
> > > with
> > > SciPy, scikit-learn and other libraries of interest that depend
> > > on
> > > NumPy, in
> > > order to get more experience with the design and find out if any
> > > important
> > > parts are missing.
> > >
> > > The pattern to support multiple array libraries is intended to be
> > > something
> > > like::
> > >
> > >     def somefunc(x, y):
> > >         # Retrieves standard namespace. Raises if x and y have
> > > different
> > >         # namespaces.  See Appendix for possible get_namespace
> > > implementation
> > >         xp = get_namespace(x, y)
> > >         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
> > >         return out
> > >
> > > The ``get_namespace`` call is effectively the library author
> > > opting
> > > in to
> > > using the standard API namespace, and thereby explicitly
> > > supporting
> > > all conforming array libraries.
> > >
> > >
> > > The ``asarray`` / ``asanyarray`` pattern
> > > ````````````````````````````````````````
> > >
> > > Many existing libraries use the same ``asarray`` (or ``asanyarray``)
> > > pattern as NumPy itself does; accepting any object that can be coerced
> > > into a ``np.ndarray``.
> > > We consider this design pattern problematic - keeping in mind the Zen of
> > > Python, *"explicit is better than implicit"*, as well as the pattern being
> > > historically problematic in the SciPy ecosystem for ``ndarray`` subclasses
> > > and with over-eager object creation. All other array/tensor libraries are
> > > more strict, and that works out fine in practice. We would advise authors
> > > of new libraries to avoid the ``asarray`` pattern. Instead they should
> > > either accept just NumPy arrays or, if they want to support multiple kinds
> > > of arrays, check if the incoming array object supports the array API
> > > standard by checking for ``__array_namespace__`` as shown in the example
> > > above.
> > >
> > > Existing libraries can do such a check as well, and only call ``asarray``
> > > if the check fails. This is very similar to the ``__duckarray__`` idea in
> > > :ref:`NEP30`.
> > >
> > >
> > > .. _adoption-application-code:
> > >
> > > Adoption in application code
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The new namespace can be seen by end users as a cleaned up and slimmed down
> > > version of NumPy's main namespace. Encouraging end users to use this
> > > namespace like::
> > >
> > >     import numpy.array_api as xp
> > >
> > >     x = xp.linspace(0, 2*xp.pi, num=100)
> > >     y = xp.cos(x)
> > >
> > > seems perfectly reasonable, and potentially beneficial - users get offered
> > > only one function for each purpose (the one we consider best-practice),
> > > and they then write code that is more easily portable to other libraries.
> > >
> > >
> > > Backward compatibility
> > > ----------------------
> > >
> > > No deprecations or removals of existing NumPy APIs or other backwards
> > > incompatible changes are proposed.
> > >
> > >
> > > High-level design
> > > -----------------
> > >
> > > The array API standard consists of approximately 120 objects, all of which
> > > have a direct NumPy equivalent. This figure shows what is included at a
> > > high level:
> > >
> > > .. image:: _static/nep-0047-scope-of-array-API.png
> > >
> > > The most important changes compared to what NumPy currently offers are:
> > >
> > > - A new array object which:
> > >
> > >     - conforms to the casting rules and indexing behaviour specified by the
> > >       standard,
> > >     - does not have methods other than dunder methods,
> > >     - does not support the full range of NumPy indexing behaviour. Advanced
> > >       indexing with integers is not supported. Only boolean indexing
> > >       with a single (possibly multi-dimensional) boolean array is supported.
> > >       An indexing expression that selects a single element returns a 0-D
> > >       array rather than a scalar.
> > >
> > > - Functions in the ``array_api`` namespace:
> > >
> > >     - do not accept ``array_like`` inputs, only NumPy arrays and Python
> > >       scalars
> > >     - do not support ``__array_ufunc__`` and ``__array_function__``,
> > >     - use positional-only and keyword-only parameters in their signatures,
> > >     - have inline type annotations,
> > >     - may have minor changes to signatures and semantics of individual
> > >       functions compared to their equivalents already present in NumPy,
> > >     - only support dtype literals, not format strings or other ways of
> > >       specifying dtypes
> > >
> > > - DLPack_ support will be added to NumPy,
> > > - New syntax for "device support" will be added, through a ``.device``
> > >   attribute on the new array object, and ``device=`` keywords in array
> > >   creation functions in the ``array_api`` namespace,
> > > - Casting rules that differ from those NumPy currently has. Output dtypes
> > >   can be derived from input dtypes (i.e. no value-based casting), and 0-D
> > >   arrays are treated like >=1-D arrays.
> > > - Not all dtypes NumPy has are part of the standard. Only boolean, signed
> > >   and unsigned integers, and floating-point dtypes up to ``float64`` are
> > >   supported. Complex dtypes are expected to be added in the next version of
> > >   the standard. Extended precision, string, void, object and datetime
> > >   dtypes, as well as structured dtypes, are not included.
> > >
> > > Improvements to existing NumPy functionality that are needed include:
> > >
> > > - Add support for stacks of matrices to some functions in ``numpy.linalg``
> > >   that are currently missing such support.
> > > - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> > > - Add a "never copy" mode to ``np.asarray``.
> > >
> > >
> > > Functions in the ``array_api`` namespace
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > Let's start with an example of a function implementation that shows the
> > > most important differences with the equivalent function in the main
> > > namespace::
> > >
> > >     def max(x: array, /, *,
> > >             axis: Optional[Union[int, Tuple[int, ...]]] = None,
> > >             keepdims: bool = False
> > >         ) -> array:
> > >         """
> > >         Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
> > >         """
> > >         return np.max._implementation(x, axis=axis, keepdims=keepdims)
> > >
> > > This function does not accept ``array_like`` inputs, only ``ndarray``.
> > > There are multiple reasons for this. Other array libraries all work like
> > > this. Letting the user do coercion of lists, generators, or other foreign
> > > objects separately results in a cleaner design with less unexpected
> > > behaviour. It's higher-performance - less overhead from ``asarray`` calls.
> > > Static typing is easier. Subclasses will work as expected. And the slight
> > > increase in verbosity because users have to explicitly coerce to
> > > ``ndarray`` on rare occasions seems like a small price to pay.
> > >
> > > This function does not support ``__array_ufunc__`` nor
> > > ``__array_function__``. These protocols serve a similar purpose as the
> > > array API standard module itself, but through a different mechanism.
> > > Because only ``ndarray`` instances are accepted, dispatching via one of
> > > these protocols isn't useful anymore.
> > >
> > > This function uses positional-only parameters in its signature. This makes
> > > code more portable - writing ``max(x=x, ...)`` is no longer valid, hence
> > > if other libraries call the first parameter ``input`` rather than ``x``,
> > > that is fine. The rationale for keyword-only parameters (not shown in the
> > > above example) is two-fold: clarity of end user code, and it being easier
> > > to extend the signature in the future with keywords in the desired order.
> > >
> > > This function has inline type annotations. Inline annotations are far
> > > easier to maintain than separate stub files. And because the types are
> > > simple, this will not result in a large amount of clutter with type
> > > aliases or unions like in the current stub files NumPy has.
> > >
> > >
> > > DLPack support for zero-copy data interchange
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The ability to convert one kind of array into another kind is valuable,
> > > and indeed necessary when downstream libraries want to support multiple
> > > kinds of arrays. This requires a well-specified data exchange protocol.
> > > NumPy already supports two of these, namely the buffer protocol (i.e.,
> > > PEP 3118), and the ``__array_interface__`` (Python side) /
> > > ``__array_struct__`` (C side) protocol. Both work similarly, letting the
> > > "producer" describe how the data is laid out in memory so the "consumer"
> > > can construct its own kind of array with a view on that data.
> > >
> > > DLPack works in a very similar way. The main reasons to prefer DLPack over
> > > the options already present in NumPy are:
> > >
> > > 1. DLPack is the only protocol with device support (e.g., GPUs using CUDA
> > >    or ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other array
> > >    libraries are not. Having one protocol per device isn't tenable, hence
> > >    device support is a must.
> > > 2. Widespread support. DLPack has the widest adoption of all protocols,
> > >    only NumPy is missing support. And the experiences of other libraries
> > >    with it are positive. This contrasts with the protocols NumPy does
> > >    support, which are used very little - when other libraries want to
> > >    interoperate with NumPy, they typically use the (more limited, and
> > >    NumPy-specific) ``__array__`` protocol.
> > >
> > > Adding support for DLPack to NumPy entails:
> > >
> > > - Adding a ``ndarray.__dlpack__`` method
> > > - Adding a ``from_dlpack`` function, which takes as input an object
> > >   supporting ``__dlpack__``, and returns an ``ndarray``.
> > >
> > > DLPack is currently a ~200 LoC header, and is meant to be included
> > > directly, so no external dependency is needed. Implementation should be
> > > straightforward.
> > >
> > >
> > > Syntax for device support
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > NumPy itself is CPU-only, so it clearly doesn't have a need for device
> > > support. However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> > > support multiple types of devices: CPU, GPU, TPU, and more exotic
> > > hardware. To write portable code on systems with multiple devices, it's
> > > often necessary to create new arrays on the same device as some other
> > > array, or check that two arrays live on the same device. Hence syntax for
> > > that is needed.
> > >
> > > The array object will have a ``.device`` attribute which enables comparing
> > > devices of different arrays (they only should compare equal if both arrays
> > > are from the same library and it's the same hardware device). Furthermore,
> > > ``device=`` keywords in array creation functions are needed. For example::
> > >
> > >     def empty(shape: Union[int, Tuple[int, ...]], /, *,
> > >               dtype: Optional[dtype] = None,
> > >               device: Optional[device] = None) -> array:
> > >         """
> > >         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
> > >         """
> > >         return np.empty(shape, dtype=dtype, device=device)
> > >
> > > The implementation for NumPy may be as simple as setting the device
> > > attribute to the string ``'cpu'`` and raising an exception if array
> > > creation functions encounter any other value.
> > >
> > >
> > > Dtypes and casting rules
> > > ~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The supported dtypes in this namespace are boolean, 8/16/32/64-bit signed
> > > and unsigned integer, and 32/64-bit floating-point dtypes. These will be
> > > added to the namespace as dtype literals with the expected names (e.g.,
> > > ``bool``, ``uint16``, ``float64``).
> > >
> > > The most obvious omissions are the complex dtypes. The rationale for the
> > > lack of complex support in the first version of the array API standard is
> > > that several libraries (PyTorch, MXNet) are still in the process of adding
> > > support for complex dtypes. The next version of the standard is expected
> > > to include ``complex64`` and ``complex128`` (see `this issue
> > > <https://github.com/data-apis/array-api/issues/102>`__
> > > for more details).
> > >
> > > Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is
> > > expected to only use the dtype literals. Format strings, Python builtin
> > > dtypes, or string representations of the dtype literals are not accepted -
> > > this will improve readability and portability of code at little cost.
> > >
> > > Casting rules are only defined between different dtypes of the same kind.
> > > The rationale for this is that mixed-kind (e.g., integer to
> > > floating-point) casting behavior differs between libraries. NumPy's
> > > mixed-kind casting behavior doesn't need to be changed or restricted, it
> > > only needs to be documented that if users use mixed-kind casting, their
> > > code may not be portable.
> > >
> > > .. image:: _static/nep-0047-casting-rules-lattice.png
> > >
> > > *Type promotion diagram. Promotion between any two types is given by their
> > > join on this lattice. Only the types of participating arrays matter, not
> > > their values. Dashed lines indicate that behaviour for Python scalars is
> > > undefined on overflow. Boolean, integer and floating-point dtypes are not
> > > connected, indicating mixed-kind promotion is undefined.*
> > >
> > > The most important difference between the casting rules in NumPy and in
> > > the array API standard is how scalars and 0-dimensional arrays are
> > > handled. In the standard, array scalars do not exist and 0-dimensional
> > > arrays follow the same casting rules as higher-dimensional arrays.
> > >
> > > See the `Type Promotion Rules section of the array API standard
> > > <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__
> > > for more details.
> > >
> > > .. note::
> > >
> > >     It is not clear what the best way is to support the different casting
> > >     rules for 0-dimensional arrays and no value-based casting. One option
> > >     may be to implement this second set of casting rules, keep them
> > >     private, mark the array API functions with a private attribute that
> > >     says they adhere to these different rules, and let the casting
> > >     machinery check for that attribute.
> > >
> > >     This needs discussion.
> > >
> > >
> > > Indexing
> > > ~~~~~~~~
> > >
> > > An indexing expression that would return a scalar with ``ndarray``, e.g.
> > > ``arr_2d[0, 0]``, will return a 0-D array with the new array object. There
> > > are several reasons for that: array scalars are largely considered a
> > > design mistake which no other array library copied; it works better for
> > > non-CPU libraries (typically arrays can live on the device, scalars live
> > > on the host); and it's simply a consistent design. To get a Python scalar
> > > out of a 0-D array, one can simply use the builtin for the type, e.g.
> > > ``float(arr_0d)``.
> > >
> > > The other `indexing modes in the standard
> > > <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__
> > > do work largely the same as they do for ``numpy.ndarray``. One noteworthy
> > > difference is that clipping in slice indexing (e.g., ``a[:n]`` where ``n``
> > > is larger than the size of the first axis) is unspecified behaviour,
> > > because that kind of check can be expensive on accelerators.
> > >
> > > The lack of advanced indexing, and boolean indexing being limited to a
> > > single n-D boolean array, is due to those indexing modes not being
> > > suitable for all types of arrays or JIT compilation. Their absence does
> > > not seem to be problematic; if a user or library author wants to use them,
> > > they can do so through zero-copy conversion to ``numpy.ndarray``. This
> > > will signal correctly to whomever reads the code that it is then
> > > NumPy-specific rather than portable to all conforming array types.
> > >
> > >
> > >
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > >
> > > The array object in the standard does not have methods other than dunder
> > > methods. The rationale for that is that not all array libraries have
> > > methods on their array object (e.g., TensorFlow does not). It also
> > > provides only a single way of doing something, rather than have functions
> > > and methods that are effectively duplicate.
> > >
> > > Mixing operations that may produce views (e.g., indexing, ``nonzero``)
> > > in combination with mutation (e.g., item or slice assignment) is
> > > `explicitly documented in the standard to not be supported
> > > <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__.
> > > This cannot easily be prohibited in the array object itself; instead this
> > > will be guidance to the user via documentation.
> > >
> > > The standard currently does not prescribe a name for the array object
> > > itself. We propose to simply name it ``ndarray``. This is the most obvious
> > > name, and because of the separate namespace should not clash with
> > > ``numpy.ndarray``.
> > >
> > >
> > > Implementation
> > > --------------
> > >
> > > .. note::
> > >
> > >     This section needs a lot more detail, which will gradually be added
> > >     when the implementation progresses.
> > >
> > > A prototype of the ``array_api`` namespace can be found in
> > > https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> > > The docstring in its ``__init__.py`` has notes on completeness of the
> > > implementation. The code for the wrapper functions also contains
> > > ``# Note:`` comments everywhere there is a difference with the NumPy API.
> > > Two important parts that are not implemented yet are the new array object
> > > and DLPack support. Functions may need changes to ensure the changed
> > > casting rules are respected.
> > >
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > >
> > > Regarding the array object implementation, we plan to start with a regular
> > > Python class that wraps a ``numpy.ndarray`` instance. Attributes and
> > > methods can forward to that wrapped instance, applying input validation
> > > and implementing changed behaviour as needed.
> > >
> > > The casting rules are probably the most challenging part. The in-progress
> > > dtype system refactor (NEPs 40-43) should make implementing the correct
> > > casting behaviour easier - it is already moving away from value-based
> > > casting for example.
> > >
> > >
> > > The dtype objects
> > > ~~~~~~~~~~~~~~~~~
> > >
> > > We must be able to compare dtypes for equality, and expressions like these
> > > must be possible::
> > >
> > >     np.array_api.some_func(..., dtype=x.dtype)
> > >
> > > The above implies it would be nice to have ``np.array_api.float32 ==
> > > np.array_api.ndarray(...).dtype``.
> > >
> > > Dtypes should not be assumed to have a class hierarchy by users, however
> > > we are free to implement it with a class hierarchy if that's convenient.
> > > We considered the following options to implement dtype objects:
> > >
> > > 1. Alias dtypes to those in the main namespace. E.g.,
> > >    ``np.array_api.float32 = np.float32``.
> > > 2. Make the dtypes instances of ``np.dtype``. E.g.,
> > >    ``np.array_api.float32 = np.dtype(np.float32)``.
> > > 3. Create new singleton classes with only the required methods/attributes
> > >    (currently just ``__eq__``).
> > >
> > > It seems like (2) would be easiest from the perspective of interacting
> > > with functions outside the main namespace. And (3) would adhere best to
> > > the standard.
> > >
> > > TBD: the standard does not yet have a good way to inspect properties of a
> > > dtype, to ask questions like "is this an integer dtype?". Perhaps this is
> > > easy enough to do for users, like so::
> > >
> > >     def _get_dtype(dt_or_arr):
> > >         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr
> > >
> > >     def is_floating(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (float32, float64)
> > >
> > >     def is_integer(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (uint8, uint16, uint32, uint64, int8, int16, int32,
> > >                          int64)
> > >
> > > However it could make sense to add to the standard. Note that NumPy itself
> > > currently does not have a great way for asking such questions, see
> > > `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.

Re: NEP: array API standard adoption (NEP 47)

ralfgommers


On Mon, Feb 22, 2021 at 8:52 PM Sebastian Berg <[hidden email]> wrote:
On Mon, 2021-02-22 at 20:16 +0100, Ralf Gommers wrote:

<snip>

> >
> > It seems the current idea is to create a new NumPy array subclass. That
> > sounds good, but I am a bit worried how that is going to interact with
> > actual NumPy arrays.
> >
>
> Not a subclass! If you got that impression, I should clarify the
> text. The

Sorry, you do write "wraps a numpy.ndarray", I am not sure why I got
the subclass idea when reading it yesterday.

But, in that case maybe you should just implement it as:

    def somefunc(x):
        # do whatever checks needed here for, e.g., input validation
        # then call the native numpy implementation:
        x = as_numpy_array(x)
        result = np.somefunc(x)
        return as_minimal(result)

and not even use the `._implementation`? 

That makes sense. It's like doing a double asarray, but you get the fast path in __array_function__

I guess a small issue is that
we don't have a way to call `as_ndarray` on all relevant inputs
conveniently.

Why not? The input should be very well-defined, basically just instances of the new array object. Note that you cannot pass lists, generators, or other such types.

  In most cases it will be straightforward though.

I assume you need your own `as_numpy_array` call to reject ndarray
subclasses, which `_implementation` would let pass through.
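
A rough sketch of the unwrap / call / re-wrap pattern discussed above may help here. `MinimalArray` and the bodies of `as_numpy_array` and `as_minimal` are hypothetical stand-ins written for illustration, not code from the prototype, and `np.sum` merely stands in for any native NumPy function:

```python
import numpy as np


class MinimalArray:
    """Hypothetical stand-in for the new array object; wraps an exact ndarray."""

    def __init__(self, data):
        # Exact type check: ndarray subclasses are rejected, not passed through.
        if type(data) is not np.ndarray:
            raise TypeError(f"expected numpy.ndarray, got {type(data).__name__}")
        self._data = data


def as_numpy_array(x):
    # Only the wrapper type is accepted; lists, plain ndarrays and ndarray
    # subclasses raise instead of being coerced.
    if type(x) is not MinimalArray:
        raise TypeError(f"expected MinimalArray, got {type(x).__name__}")
    return x._data


def as_minimal(result):
    # np.asarray turns NumPy scalars (e.g. from a full reduction) into 0-D
    # arrays, so the caller always gets the wrapper back.
    return MinimalArray(np.asarray(result))


def somefunc(x):
    x = as_numpy_array(x)
    result = np.sum(x)  # stand-in for the native NumPy implementation
    return as_minimal(result)
```

With this shape, all of the strictness lives in the two small conversion helpers, and the wrapped functions themselves stay one-liners.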

> >
> > Related to that: how important is it to keep that namespace a "minimal"
> > implementation, rather than a "conforming" one?  For example, would you
> > want to reject `numpy_api.array([1, 2, 3, 4], dtype="i,i")`? Or just
> > `dtype="complex128"`?
> >
>
> Yes, I'd definitely want to reject that. Format strings are terrible.


Agreed, I guess I am wondering whether we can find a good solution that
does not involve writing stubs around 140 functions with more strict
input validation.
But maybe it is also not particularly difficult or churn to do... Or
even automate, e.g. from the typing stubs.

I think it's easy to do, and better than something "smart". Also note that there are no typing stubs, the type annotations are clean enough that they can be added inline, which is much nicer than stubs.
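
To give a sense of how small such a strict check could be, here is a sketch of dtype validation that rejects format strings and anything else that is not one of the supported dtype literals. It assumes the literals are `np.dtype` instances (option 2 in the NEP text); `check_dtype` and `_ALLOWED_DTYPES` are hypothetical names, not part of the prototype:

```python
import numpy as np

# Assumption for illustration: the boolean, integer and floating-point
# dtypes that the NEP lists as supported, stored as np.dtype instances.
_ALLOWED_DTYPES = tuple(np.dtype(t) for t in (
    np.bool_,
    np.int8, np.int16, np.int32, np.int64,
    np.uint8, np.uint16, np.uint32, np.uint64,
    np.float32, np.float64,
))


def check_dtype(dtype):
    """Return dtype unchanged if it is an allowed literal, else raise."""
    if dtype is None:
        return None
    # The isinstance test runs first, so strings such as "i,i" or
    # "complex128" and Python builtins such as float never reach the
    # equality comparison.
    if not isinstance(dtype, np.dtype) or dtype not in _ALLOWED_DTYPES:
        raise TypeError(f"unsupported dtype specifier: {dtype!r}")
    return dtype
```

Each wrapper function would call a helper like this once on its `dtype=` argument before forwarding to NumPy.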



> >
> > Maybe I got the wrong impression though.  Is the aim for a minimal
> > implementation, but you are OK as long as it is a conforming one?
> >
>
> In principle we're okay with a conforming one that's a superset of what is
> in the standard. But I think we'd only want to do that if creating a
> minimal one turns out to be difficult. Having the minimal required set is
> much nicer when one wants to write portable code. Because then you can do
> so without checking the docs whether any object/method is in "minimal" or
> in "extended".
>

Right, you would like to have a minimal implementation somewhere.
Having it in NumPy could be convenient, although not strictly
necessary.

That was my original thinking - just reuse `np.ndarray`, and have the "minimal" thing as a standalone implementation in a new package. But that's more work, and less nice. After getting used to the idea of a second array object, I'm actually much happier with having it in numpy.



<snip>

>
> > Besides `_implementation` usually does support array subclasses and may
> > still dispatch again internally at this time!
> >
>
> What, it dispatches again? That seems very suboptimal. If there's no clean
> way to avoid a dispatch, it may make sense to just check array inputs for
> the presence of __array_function__ and raise an exception if it's present.
>


I do not think our functions were ever rewritten to only use e.g.
`._implementation()` internally. I am not even quite certain that would
be correct for subclasses.

It is annoying that you may have to struggle with it here to do
something that is different from the implicit dispatchers.  But on the
up-side a clear solution would be helpful in any case.

Agreed.

Cheers,
Ralf
 


Re: NEP: array API standard adoption (NEP 47)

ralfgommers
In reply to this post by Sebastian Berg


On Mon, Feb 22, 2021 at 7:49 PM Sebastian Berg <[hidden email]> wrote:
On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> Hi all,
>
> Here is a NEP, written together with Stephan Hoyer and Aaron Meurer, for
> discussion on adoption of the array API standard
> (https://data-apis.github.io/array-api/latest/). This will add a new
> numpy.array_api submodule containing that standardized API. The main
> purpose of this API is to be able to write code that is portable to other
> array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and MXNet.
>
> We expect this NEP to remain in draft state for quite a while, while we're
> gaining experience with using it in downstream libraries, discuss adding it
> to other array libraries, and finishing some of the loose ends (e.g.,
> specifications for linear algebra functions that aren't merged yet, see
> https://github.com/data-apis/array-api/pulls) in the API standard itself.


There is too much to unpack in a day, I hope I did not miss something
particularly important while reading.
Do you have plans to try some of this outside of NumPy, or maybe make a
repo in the numpy org for it?

Sorry, I forgot to answer this question. That is what we're doing now; the current prototype is at https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api. I do expect that as soon as we need any changes in C code, that becomes impractical. I think merging as a private submodule (numpy._array_api) makes sense. That will help with WIP PRs to other libraries - then we can use the "test against master" CI for that, rather than having to make a mess injecting things inside CI.

Also, there are a few parts of the NEP that are improvements outside of the new submodule. Not only DLPack, but also consistency in "stacks of matrices" in linalg functions, adding a missing keepdims keyword, the never-copy mode for asarray, and improving the API for inspecting dtype families (https://github.com/numpy/numpy/issues/17325). Those things can all be pushed forward.

Cheers,
Ralf



Re: NEP: array API standard adoption (NEP 47)

Sebastian Berg
In reply to this post by ralfgommers
Top Posting, to discuss post specific questions about NEP 47 and
partially the start on implementing it in:

    https://github.com/numpy/numpy/pull/18585

There are probably many more that will crop up. But for me, each of
these is a pretty major difficulty without a clear answer as of now.

1. I still need clarity how a library is supposed to use this namespace
when the user passes in a NumPy array (mentioned before).  The user
must get back a NumPy array after all.  Maybe that is just a decorator,
but it seems important.

2. `np.result_type` special cases array-scalars (the current PR), NEP
47 promises it will not.  The PR could attempt to work around that
using `arr.dtype` in `result_type`, I expect there are more details to
fight with there, but I am not sure.

3. For all other functions, the same problem applies. You don't actually
have anything to fix NumPy promotion rules.  You could bake your own
cake here for numeric types, but I am not sure, you might also need NEP
43 in all its promotion power to pull it off.

4. Now that I looked at the above, I do not feel it's reasonable to
limit this functionality to numeric dtypes.  If someone uses a NumPy
rational-dtype, why should a SciPy function currently implemented in
pure NumPy reject that?  In other words, I think this is the point
where trying to be "minimal" is counterproductive.

5. The PR makes no attempt at handling binary operators in any way
aside from greedily coercing the other operand.

6. What happens with a mix of array-likes or even array subclasses like
`astropy.quantity`?

7. Is there any provision on how to deal with mixed array-like inputs?
CuPy+numpy, etc.?
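
To make point 1 concrete, this is roughly what the "maybe that is just a decorator" idea could look like. It is only a sketch under assumptions: `MinimalArray`, `accepts_ndarray` and `double` are hypothetical names, and the questions about subclasses and mixed inputs raised in the list are deliberately ignored here:

```python
import functools

import numpy as np


class MinimalArray:
    """Hypothetical stand-in for the new array object."""

    def __init__(self, data):
        self._data = np.asarray(data)


def accepts_ndarray(func):
    """Wrap ndarray arguments on the way in, unwrap the result on the way out."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        saw_ndarray = any(isinstance(a, np.ndarray) for a in args)
        wrapped = [MinimalArray(a) if isinstance(a, np.ndarray) else a
                   for a in args]
        out = func(*wrapped, **kwargs)
        # A user who passed NumPy arrays gets a NumPy array back.
        if saw_ndarray and isinstance(out, MinimalArray):
            return out._data
        return out

    return wrapper


@accepts_ndarray
def double(x):
    # Library code written against the wrapper object only.
    return MinimalArray(x._data * 2)
```

Calling `double` with a plain `ndarray` returns an `ndarray`, while calling it with a `MinimalArray` returns a `MinimalArray`.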


I don't think we have to figure out everything up-front, but I do think
there are a few very fundamental questions still open, at least for me
personally.
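
As an illustration of the workaround mentioned in point 2, the sketch below replaces every array (including 0-D arrays and NumPy scalars) by its dtype before calling `np.result_type`, so that values cannot influence the result. `dtype_only_result_type` is a hypothetical helper, not something in the PR, and it does nothing about the other promotion differences from point 3:

```python
import numpy as np


def dtype_only_result_type(*arrays_or_dtypes):
    """Promote using dtypes only, never the values of the operands."""
    # np.dtype instances pass through; anything else is assumed to expose
    # a .dtype attribute (arrays, 0-D arrays, NumPy scalars).
    dtypes = [a if isinstance(a, np.dtype) else a.dtype
              for a in arrays_or_dtypes]
    return np.result_type(*dtypes)
```

For example, an `int8` array combined with a 0-D `int64` array promotes to `int64` here, regardless of the value stored in the 0-D array.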

Cheers,

Sebastian



On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:

> Hi all,
>
> Here is a NEP, written together with Stephan Hoyer and Aaron Meurer, for
> discussion on adoption of the array API standard
> (https://data-apis.github.io/array-api/latest/). This will add a new
> numpy.array_api submodule containing that standardized API. The main
> purpose of this API is to be able to write code that is portable to other
> array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and MXNet.
>
> We expect this NEP to remain in draft state for quite a while, while we're
> gaining experience with using it in downstream libraries, discuss adding it
> to other array libraries, and finishing some of the loose ends (e.g.,
> specifications for linear algebra functions that aren't merged yet, see
> https://github.com/data-apis/array-api/pulls) in the API standard itself.
>
> See
> https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> for an initial discussion about this topic.
>
> Please keep high-level discussion here and detailed comments on
> https://github.com/numpy/numpy/pull/18456. Also, you can access a rendered
> version of the NEP from that PR (see PR description for how), which may be
> helpful.
>
> Cheers,
> Ralf
>
>
> Abstract
> --------
>
> We propose to adopt the `Python array API standard`_, developed by the
> `Consortium for Python Data API Standards`_. Implementing this as a separate
> new namespace in NumPy will allow authors of libraries which depend on NumPy
> as well as end users to write code that is portable between NumPy and all
> other array/tensor libraries that adopt this standard.
>
> .. note::
>
>     We expect that this NEP will remain in a draft state for quite a while.
>     Given the large scope we don't expect to propose it for acceptance any
>     time soon; instead, we want to solicit feedback on both the high-level
>     design and implementation, and learn what needs describing better in this
>     NEP or changing in either the implementation or the array API standard
>     itself.
>
>
> Motivation and Scope
> --------------------
>
> Python users have a wealth of choice for libraries and frameworks for
> numerical computing, data science, machine learning, and deep
> learning. New
> frameworks pushing forward the state of the art in these fields are
> appearing
> every year. One unintended consequence of all this activity and
> creativity
> has been fragmentation in multidimensional array (a.k.a. tensor)
> libraries -
> which are the fundamental data structure for these fields. Choices
> include
> NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
>
> The APIs of each of these libraries are largely similar, but with
> enough
> differences that it’s quite difficult to write code that works with
> multiple
> (or all) of these libraries. The array API standard aims to address
> that
> issue, by specifying an API for the most common ways arrays are
> constructed
> and used. The proposed API is quite similar to NumPy's API, and
> deviates
> mainly
> in places where (a) NumPy made design choices that are inherently not
> portable
> to other implementations, and (b) where other libraries consistently
> deviated
> from NumPy on purpose because NumPy's design turned out to have
> issues or
> unnecessary complexity.
>
> For a longer discussion on the purpose of the array API standard we
> refer to
> the `Purpose and Scope section of the array API standard <
> https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`
> __
> and the two blog posts announcing the formation of the Consortium
> [1]_ and
> the release of the first draft version of the standard for community
> review
> [2]_.
>
> The scope of this NEP includes:
>
> - Adopting the 2021 version of the array API standard
> - Adding a separate namespace, tentatively named ``numpy.array_api``
> - Changes needed/desired outside of the new namespace, for example
> new
> dunder
>   methods on the ``ndarray`` object
> - Implementation choices, and differences between functions in the
> new
>   namespace with those in the main ``numpy`` namespace
> - A new array object conforming to the array API standard
> - Maintenance effort and testing strategy
> - Impact on NumPy's total exposed API surface and on other future and
>   under-discussion design choices
> - Relation to existing and proposed NumPy array protocols
>   (``__array_ufunc__``, ``__array_function__``,
> ``__array_module__``).
> - Required improvements to existing NumPy functionality
>
> Out of scope for this NEP are:
>
> - Changes in the array API standard itself. Those are likely to come
> up
>   during review of this NEP, but should be upstreamed as needed and
> this NEP
>   subsequently updated.
>
>
> Usage and Impact
> ----------------
>
> *This section will be fleshed out later, for now we refer to the use
> cases
> given
> in* `the array API standard Use Cases section <
> https://data-apis.github.io/array-api/latest/use_cases.html>`__
>
> In addition to those use cases, the new namespace contains
> functionality
> that
> is widely used and supported by many array libraries. As such, it is
> a good
> set of functions to teach to newcomers to NumPy and recommend as
> "best
> practice". That contrasts with NumPy's main namespace, which contains
> many
> functions and objects that have been superseded or we consider
> mistakes -
> but
> that we can't remove because of backwards compatibility reasons.
>
> The usage of the ``numpy.array_api`` namespace by downstream
> libraries is
> intended to enable them to consume multiple kinds of arrays, *without
> having
> to have a hard dependency on all of those array libraries*:
>
> .. image:: _static/nep-0047-library-dependencies.png
>
> Adoption in downstream libraries
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The prototype implementation of the ``array_api`` namespace will be
> used
> with
> SciPy, scikit-learn and other libraries of interest that depend on
> NumPy, in
> order to get more experience with the design and find out if any
> important
> parts are missing.
>
> The pattern to support multiple array libraries is intended to be
> something
> like::
>
>     def somefunc(x, y):
>         # Retrieves standard namespace. Raises if x and y have different
>         # namespaces.  See Appendix for possible get_namespace implementation
>         xp = get_namespace(x, y)
>         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
>         return out
>
> The ``get_namespace`` call is effectively the library author opting
> in to
> using the standard API namespace, and thereby explicitly supporting
> all conforming array libraries.
>
>
> The ``asarray`` / ``asanyarray`` pattern
> ````````````````````````````````````````
>
> Many existing libraries use the same ``asarray`` (or ``asanyarray``)
> pattern
> as NumPy itself does; accepting any object that can be coerced into a
> ``np.ndarray``.
> We consider this design pattern problematic - keeping in mind the Zen
> of
> Python, *"explicit is better than implicit"*, as well as the pattern
> being
> historically problematic in the SciPy ecosystem for ``ndarray``
> subclasses
> and with over-eager object creation. All other array/tensor libraries
> are
> more strict, and that works out fine in practice. We would advise
> authors of
> new libraries to avoid the ``asarray`` pattern. Instead they should
> either
> accept just NumPy arrays or, if they want to support multiple kinds
> of
> arrays, check if the incoming array object supports the array API
> standard
> by checking for ``__array_namespace__`` as shown in the example
> above.
>
> Existing libraries can do such a check as well, and only call
> ``asarray`` if
> the check fails. This is very similar to the ``__duckarray__`` idea
> in
> :ref:`NEP30`.
>
>
> .. _adoption-application-code:
>
> Adoption in application code
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The new namespace can be seen by end users as a cleaned up and
> slimmed down
> version of NumPy's main namespace. Encouraging end users to use this
> namespace like::
>
>     import numpy.array_api as xp
>
>     x = xp.linspace(0, 2*xp.pi, num=100)
>     y = xp.cos(x)
>
> seems perfectly reasonable, and potentially beneficial - users get
> offered
> only
> one function for each purpose (the one we consider best-practice),
> and they
> then write code that is more easily portable to other libraries.
>
>
> Backward compatibility
> ----------------------
>
> No deprecations or removals of existing NumPy APIs or other backwards
> incompatible changes are proposed.
>
>
> High-level design
> -----------------
>
> The array API standard consists of approximately 120 objects, all of
> which
> have a direct NumPy equivalent. This figure shows what is included at
> a
> high level:
>
> .. image:: _static/nep-0047-scope-of-array-API.png
>
> The most important changes compared to what NumPy currently offers
> are:
>
> - A new array object which:
>
>     - conforms to the casting rules and indexing behaviour specified
> by the
>       standard,
>     - does not have methods other than dunder methods,
>     - does not support the full range of NumPy indexing behaviour.
> Advanced
>       indexing with integers is not supported. Only boolean indexing
>       with a single (possibly multi-dimensional) boolean array is
> supported.
>       An indexing expression that selects a single element returns a
> 0-D
> array
>       rather than a scalar.
>
> - Functions in the ``array_api`` namespace:
>
>     - do not accept ``array_like`` inputs, only NumPy arrays and
> Python
> scalars
>     - do not support ``__array_ufunc__`` and ``__array_function__``,
>     - use positional-only and keyword-only parameters in their
> signatures,
>     - have inline type annotations,
>     - may have minor changes to signatures and semantics of
> individual
>       functions compared to their equivalents already present in
> NumPy,
>     - only support dtype literals, not format strings or other ways
> of
>       specifying dtypes
>
> - DLPack_ support will be added to NumPy,
> - New syntax for "device support" will be added, through a
> ``.device``
>   attribute on the new array object, and ``device=`` keywords in
> array
> creation
>   functions in the ``array_api`` namespace,
> - Casting rules that differ from those NumPy currently has. Output
> dtypes
> can
>   be derived from input dtypes (i.e. no value-based casting), and 0-D
> arrays
>   are treated like >=1-D arrays.
> - Not all dtypes NumPy has are part of the standard. Only boolean,
> signed
> and
>   unsigned integers, and floating-point dtypes up to ``float64`` are
> supported.
>   Complex dtypes are expected to be added in the next version of the
> standard.
>   Extended precision, string, void, object and datetime dtypes, as
> well as
>   structured dtypes, are not included.
>
> Improvements to existing NumPy functionality that are needed include:
>
> - Add support for stacks of matrices to some functions in
> ``numpy.linalg``
>   that are currently missing such support.
> - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> - Add a "never copy" mode to ``np.asarray``.
>
>
> Functions in the ``array_api`` namespace
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Let's start with an example of a function implementation that shows
> the most
> important differences with the equivalent function in the main
> namespace::
>
>     def max(x: array, /, *,
>             axis: Optional[Union[int, Tuple[int, ...]]] = None,
>             keepdims: bool = False
>         ) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
>         """
>         return np.max._implementation(x, axis=axis, keepdims=keepdims)
>
> This function does not accept ``array_like`` inputs, only
> ``ndarray``. There
> are multiple reasons for this. Other array libraries all work like
> this.
> Letting the user do coercion of lists, generators, or other foreign
> objects
> separately results in a cleaner design with less unexpected
> behaviour.
> It's higher-performance - less overhead from ``asarray`` calls.
> Static
> typing
> is easier. Subclasses will work as expected. And the slight increase
> in
> verbosity
> because users have to explicitly coerce to ``ndarray`` on rare
> occasions
> seems like a small price to pay.
>
> This function does not support ``__array_ufunc__`` nor
> ``__array_function__``.
> These protocols serve a similar purpose as the array API standard
> module
> itself,
> but through a different mechanism. Because only ``ndarray``
> instances are
> accepted,
> dispatching via one of these protocols isn't useful anymore.
>
> This function uses positional-only parameters in its signature. This
> makes
> code
> more portable - writing ``max(x=x, ...)`` is no longer valid, hence
> if other
> libraries call the first parameter ``input`` rather than ``x``, that
> is
> fine.
> The rationale for keyword-only parameters (not shown in the above
> example)
> is
> two-fold: clarity of end user code, and it being easier to extend the
> signature
> in the future with keywords in the desired order.
>
> This function has inline type annotations. Inline annotations are far
> easier to
> maintain than separate stub files. And because the types are simple,
> this
> will
> not result in a large amount of clutter with type aliases or unions
> like in
> the
> current stub files NumPy has.
>
>
> DLPack support for zero-copy data interchange
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The ability to convert one kind of array into another kind is
> valuable, and
> indeed necessary when downstream libraries want to support multiple
> kinds of
> arrays. This requires a well-specified data exchange protocol. NumPy
> already
> supports two of these, namely the buffer protocol (i.e., PEP 3118),
> and
> the ``__array_interface__`` (Python side) / ``__array_struct__`` (C
> side)
> protocol. Both work similarly, letting the "producer" describe how
> the data
> is laid out in memory so the "consumer" can construct its own kind of
> array
> with a view on that data.
>
> DLPack works in a very similar way. The main reasons to prefer DLPack
> over
> the options already present in NumPy are:
>
> 1. DLPack is the only protocol with device support (e.g., GPUs using
> CUDA or
>    ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other
> array
>    libraries are not. Having one protocol per device isn't tenable,
> hence
>    device support is a must.
> 2. Widespread support. DLPack has the widest adoption of all
> protocols, only
>    NumPy is missing support. And the experiences of other libraries
> with it
>    are positive. This contrasts with the protocols NumPy does
> support, which
>    are used very little - when other libraries want to interoperate
> with
>    NumPy, they typically use the (more limited, and NumPy-specific)
>    ``__array__`` protocol.
>
> Adding support for DLPack to NumPy entails:
>
> - Adding a ``ndarray.__dlpack__`` method
> - Adding a ``from_dlpack`` function, which takes as input an object
>   supporting ``__dlpack__``, and returns an ``ndarray``.
>
> DLPack is currently a ~200 LoC header, and is meant to be included
> directly, so
> no external dependency is needed. Implementation should be
> straightforward.
>
>
> Syntax for device support
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NumPy itself is CPU-only, so it clearly doesn't have a need for
> device
> support.
> However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> support
> multiple types of devices: CPU, GPU, TPU, and more exotic hardware.
> To write portable code on systems with multiple devices, it's often
> necessary
> to create new arrays on the same device as some other array, or check
> that
> two arrays live on the same device. Hence syntax for that is needed.
>
> The array object will have a ``.device`` attribute which enables
> comparing
> devices of different arrays (they only should compare equal if both
> arrays
> are
> from the same library and it's the same hardware device).
> Furthermore,
> ``device=`` keywords in array creation functions are needed. For
> example::
>
>     def empty(shape: Union[int, Tuple[int, ...]], /, *,
>               dtype: Optional[dtype] = None,
>               device: Optional[device] = None) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
>         """
>         return np.empty(shape, dtype=dtype, device=device)
>
> The implementation for NumPy may be as simple as setting the device
> attribute to
> the string ``'cpu'`` and raising an exception if array creation
> functions
> encounter any other value.
>
>
> Dtypes and casting rules
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> The supported dtypes in this namespace are boolean, 8/16/32/64-bit
> signed
> and
> unsigned integer, and 32/64-bit floating-point dtypes. These will be
> added
> to
> the namespace as dtype literals with the expected names (e.g.,
> ``bool``,
> ``uint16``, ``float64``).
>
> The most obvious omissions are the complex dtypes. The rationale for
> the
> lack
> of complex support in the first version of the array API standard is
> that
> several
> libraries (PyTorch, MXNet) are still in the process of adding support
> for
> complex dtypes. The next version of the standard is expected to
> include
> ``complex64``
> and ``complex128`` (see `this issue <
> https://github.com/data-apis/array-api/issues/102>`__
> for more details).
>
> Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is
> expected
> to only use the dtype literals. Format strings, Python builtin
> dtypes, or
> string representations of the dtype literals are not accepted - this
> will
> improve readability and portability of code at little cost.
>
> Casting rules are only defined between different dtypes of the same
> kind.
> The
> rationale for this is that mixed-kind (e.g., integer to floating-
> point)
> casting behavior differs between libraries. NumPy's mixed-kind
> casting
> behavior doesn't need to be changed or restricted, it only needs to
> be
> documented that if users use mixed-kind casting, their code may not
> be
> portable.
>
> .. image:: _static/nep-0047-casting-rules-lattice.png
>
> *Type promotion diagram. Promotion between any two types is given by
> their
> join on this lattice. Only the types of participating arrays matter,
> not
> their values. Dashed lines indicate that behaviour for Python scalars
> is
> undefined on overflow. Boolean, integer and floating-point dtypes are
> not
> connected, indicating mixed-kind promotion is undefined.*
>
> The most important difference between the casting rules in NumPy and
> in the
> array API standard is how scalars and 0-dimensional arrays are
> handled. In
> the standard, array scalars do not exist and 0-dimensional arrays
> follow the
> same casting rules as higher-dimensional arrays.
>
> See the `Type Promotion Rules section of the array API standard
> <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__
> for more details.
>
> .. note::
>
>     It is not clear what the best way is to support the different
> casting
> rules
>     for 0-dimensional arrays and no value-based casting. One option
> may be
> to
>     implement this second set of casting rules, keep them private,
> mark the
>     array API functions with a private attribute that says they
> adhere to
>     these different rules, and let the casting machinery check for
>     that attribute.
>
>     This needs discussion.
>
>
> Indexing
> ~~~~~~~~
>
> An indexing expression that would return a scalar with ``ndarray``,
> e.g.
> ``arr_2d[0, 0]``, will return a 0-D array with the new array object.
> There
> are
> several reasons for that: array scalars are largely considered a
> design
> mistake
> which no other array library copied; it works better for non-CPU
> libraries
> (typically arrays can live on the device, scalars live on the host);
> and
> it's
> simply a consistent design. To get a Python scalar out of a 0-D
> array, one
> can
> simply use the builtin for the type, e.g. ``float(arr_0d)``.
>
> The other `indexing modes in the standard
> <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__
> do work largely the same as they do for ``numpy.ndarray``. One
> noteworthy
> difference is that clipping in slice indexing (e.g., ``a[:n]`` where
> ``n``
> is
> larger than the size of the first axis) is unspecified behaviour,
> because
> that kind of check can be expensive on accelerators.
>
> The lack of advanced indexing, and boolean indexing being limited to
> a
> single
> n-D boolean array, is due to those indexing modes not being suitable
> for all
> types of arrays or JIT compilation. Their absence does not seem to be
> problematic; if a user or library author wants to use them, they can
> do so
> through zero-copy conversion to ``numpy.ndarray``. This will signal
> correctly
> to whomever reads the code that it is then NumPy-specific rather than
> portable
> to all conforming array types.
>
>
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> The array object in the standard does not have methods other than
> dunder
> methods. The rationale for that is that not all array libraries have
> methods
> on their array object (e.g., TensorFlow does not). It also provides
> only a
> single way of doing something, rather than have functions and methods
> that
> are effectively duplicate.
>
> Mixing operations that may produce views (e.g., indexing,
> ``nonzero``)
> in combination with mutation (e.g., item or slice assignment) is
> `explicitly documented in the standard to not be supported
> <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__.
> This cannot easily be prohibited in the array object itself; instead
> this
> will
> be guidance to the user via documentation.
>
> The standard currently does not prescribe a name for the array object
> itself.
> We propose to simply name it ``ndarray``. This is the most obvious
> name, and
> because of the separate namespace should not clash with
> ``numpy.ndarray``.
>
>
> Implementation
> --------------
>
> .. note::
>
>     This section needs a lot more detail, which will gradually be
> added when
>     the implementation progresses.
>
> A prototype of the ``array_api`` namespace can be found in
> https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> The docstring in its ``__init__.py`` has notes on completeness of the
> implementation. The code for the wrapper functions also contains ``#
> Note:``
> comments everywhere there is a difference with the NumPy API.
> Two important parts that are not implemented yet are the new array
> object
> and
> DLPack support. Functions may need changes to ensure the changed
> casting
> rules
> are respected.
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> Regarding the array object implementation, we plan to start with a
> regular
> Python class that wraps a ``numpy.ndarray`` instance. Attributes and
> methods
> can forward to that wrapped instance, applying input validation and
> implementing changed behaviour as needed.
>
> The casting rules are probably the most challenging part. The in-
> progress
> dtype system refactor (NEPs 40-43) should make implementing the
> correct
> casting
> behaviour easier - it is already moving away from value-based casting
> for
> example.
>
>
> The dtype objects
> ~~~~~~~~~~~~~~~~~
>
> We must be able to compare dtypes for equality, and expressions like
> these
> must
> be possible::
>
>     np.array_api.some_func(..., dtype=x.dtype)
>
> The above implies it would be nice to have ``np.array_api.float32 ==
> np.array_api.ndarray(...).dtype``.
>
> Dtypes should not be assumed to have a class hierarchy by users,
> however we
> are
> free to implement it with a class hierarchy if that's convenient. We
> considered
> the following options to implement dtype objects:
>
> 1. Alias dtypes to those in the main namespace. E.g.,
> ``np.array_api.float32 =
>    np.float32``.
> 2. Make the dtypes instances of ``np.dtype``. E.g.,
> ``np.array_api.float32 =
>    np.dtype(np.float32)``.
> 3. Create new singleton classes with only the required
> methods/attributes
>    (currently just ``__eq__``).
>
> It seems like (2) would be easiest from the perspective of
> interacting with
> functions outside the main namespace. And (3) would adhere best to
> the
> standard.
>
> TBD: the standard does not yet have a good way to inspect properties
> of a
> dtype, to ask questions like "is this an integer dtype?". Perhaps
> this is
> easy
> enough to do for users, like so::
>
>     def _get_dtype(dt_or_arr):
>         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr
>
>     def is_floating(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (float32, float64)
>
>     def is_integer(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (uint8, uint16, uint32, uint64, int8, int16,
>                          int32, int64)
>
> However it could make sense to add to the standard. Note that NumPy
> itself
> currently does not have a great way of asking such questions, see
> `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.



Re: NEP: array API standard adoption (NEP 47)

Aaron Meurer
On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg
<[hidden email]> wrote:

>
> Top Posting, to discuss post specific questions about NEP 47 and
> partially the start on implementing it in:
>
>     https://github.com/numpy/numpy/pull/18585
>
> There are probably many more that will crop up. But for me, each of
> these is a pretty major difficulty without a clear answer as of now.
>
> 1. I still need clarity how a library is supposed to use this namespace
> when the user passes in a NumPy array (mentioned before).  The user
> must get back a NumPy array after all.  Maybe that is just a decorator,
> but it seems important.
>
> 2. `np.result_type` special cases array-scalars (the current PR), NEP
> 47 promises it will not.  The PR could attempt to work around that
> using `arr.dtype` in `result_type`, I expect there are more details to
> fight with there, but I am not sure.

The idea is to work around it everywhere, so that it follows the rules
in the spec (no array scalars, no value-based casting). I haven't
started it yet, though, so I don't know yet how hard it will be. If it
ends up being too hard we could put it in the same camp as device
support and dlpack support where it needs some basic implementation in
numpy itself first before we can properly do it in the array API
namespace.
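
To make "no value-based casting" concrete, here is a rough pure-Python sketch of promotion that looks only at dtypes. It is not the code in the PR: the function name ``promote`` and the use of plain strings for dtypes are assumptions made for illustration, and the mixed signed/unsigned rule reflects my reading of the promotion table in the standard.

```python
# Hypothetical sketch: dtypes are plain strings here, not NumPy dtype objects.
_SIGNED = ["int8", "int16", "int32", "int64"]
_UNSIGNED = ["uint8", "uint16", "uint32", "uint64"]
_FLOAT = ["float32", "float64"]
_KNOWN = set(_SIGNED + _UNSIGNED + _FLOAT + ["bool"])


def promote(d1, d2):
    """Promote two dtype names using only the dtypes, never the values."""
    if d1 not in _KNOWN or d2 not in _KNOWN:
        raise TypeError(f"dtype outside the standard: {d1!r}, {d2!r}")
    if d1 == d2:
        return d1
    # Same kind: the wider of the two wins.
    for family in (_SIGNED, _UNSIGNED, _FLOAT):
        if d1 in family and d2 in family:
            return family[max(family.index(d1), family.index(d2))]
    # Mixed signed/unsigned: normalise so that d1 is signed, d2 unsigned.
    if d1 in _UNSIGNED and d2 in _SIGNED:
        d1, d2 = d2, d1
    if d1 in _SIGNED and d2 in _UNSIGNED:
        u = _UNSIGNED.index(d2)
        if u == len(_UNSIGNED) - 1:
            raise TypeError("uint64 with a signed integer is not defined")
        # The signed result must be one step wider than the unsigned input.
        return _SIGNED[max(_SIGNED.index(d1), u + 1)]
    # Anything else is mixed-kind (bool/int/float), which is left undefined.
    raise TypeError(f"mixed-kind promotion is not defined: {d1!r}, {d2!r}")
```

Because only dtype names go in, a 0-D array and an N-D array with the same dtype necessarily promote the same way.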

>
> 3. For all other functions, the same problem applies. You don't actually
> have anything to fix NumPy promotion rules.  You could bake your own
> cake here for numeric types, but I am not sure, you might also need NEP
> 43 in all its promotion power to pull it off.
>
> 4. Now that I looked at the above, I do not feel it's reasonable to
> limit this functionality to numeric dtypes.  If someone uses a NumPy
> rational-dtype, why should a SciPy function currently implemented in
> pure NumPy reject that?  In other words, I think this is the point
> where trying to be "minimal" is counterproductive.

The idea of minimality is to make it so users can be sure they will be
able to use other libraries, once they also have array API compliant
namespaces. A rational-dtype wouldn't ever be implemented in those
other libraries, because it isn't part of the standard, so if a user
is using those, that is a sign they are using things that aren't in
the array API, so they can't expect to be able to swap out their
dtypes. If a user wants to use something that's only in NumPy, then
they should just use NumPy.

>
> 4. The PR makes no attempt at handling binary operators in any way
> aside from greedily coercing the other operand.
>
> 5. What happens with a mix of array-likes or even array subclasses like
> `astropy.quantity`?
>
> 6. Is there any provision on how to deal with mixed array-like inputs?
> CuPy+numpy, etc.?

Neither of these are defined in the spec. The spec only deals with
staying inside of the compliant namespace. It doesn't require any
behavior mixing things from other namespaces. That's generally
considered a much harder problem, and there is the data interchange
protocol to deal with it
(https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html).

Aaron Meurer

>
>
> I don't think we have to figure out everything up-front, but I do think
> there are a few very fundamental questions still open, at least for me
> personally.
>
> Cheers,
>
> Sebastian
>
>
>
> On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> > Hi all,
> >
> > Here is a NEP, written together with Stephan Hoyer and Aaron Meurer,
> > for
> > discussion on adoption of the array API standard (
> > https://data-apis.github.io/array-api/latest/). This will add a new
> > numpy.array_api submodule containing that standardized API. The main
> > purpose of this API is to be able to write code that is portable to
> > other
> > array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and
> > MXNet.
> >
> > We expect this NEP to remain in draft state for quite a while, while
> > we're
> > gaining experience with using it in downstream libraries, discuss
> > adding it
> > to other array libraries, and finishing some of the loose ends (e.g.,
> > specifications for linear algebra functions that aren't merged yet,
> > see
> > https://github.com/data-apis/array-api/pulls) in the API standard
> > itself.
> >
> > See
> > https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> > for an initial discussion about this topic.
> >
> > Please keep high-level discussion here and detailed comments on
> > https://github.com/numpy/numpy/pull/18456. Also, you can access a
> > rendered
> > version of the NEP from that PR (see PR description for how), which
> > may be
> > helpful.
> > Cheers,
> > Ralf
> >
> >
> > Abstract
> > --------
> >
> > We propose to adopt the `Python array API standard`_, developed by
> > the
> > `Consortium for Python Data API Standards`_. Implementing this as a
> > separate
> > new namespace in NumPy will allow authors of libraries which depend
> > on NumPy
> > as well as end users to write code that is portable between NumPy and
> > all
> > other array/tensor libraries that adopt this standard.
> >
> > .. note::
> >
> >     We expect that this NEP will remain in a draft state for quite a
> > while.
> >     Given the large scope we don't expect to propose it for
> > acceptance any
> >     time soon; instead, we want to solicit feedback on both the high-
> > level
> >     design and implementation, and learn what needs describing better
> > in
> > this
> >     NEP or changing in either the implementation or the array API
> > standard
> >     itself.
> >
> >
> > Motivation and Scope
> > --------------------
> >
> > Python users have a wealth of choice for libraries and frameworks for
> > numerical computing, data science, machine learning, and deep
> > learning. New
> > frameworks pushing forward the state of the art in these fields are
> > appearing
> > every year. One unintended consequence of all this activity and
> > creativity
> > has been fragmentation in multidimensional array (a.k.a. tensor)
> > libraries -
> > which are the fundamental data structure for these fields. Choices
> > include
> > NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
> >
> > The APIs of each of these libraries are largely similar, but with
> > enough
> > differences that it’s quite difficult to write code that works with
> > multiple
> > (or all) of these libraries. The array API standard aims to address
> > that
> > issue, by specifying an API for the most common ways arrays are
> > constructed
> > and used. The proposed API is quite similar to NumPy's API, and
> > deviates
> > mainly
> > in places where (a) NumPy made design choices that are inherently not
> > portable
> > to other implementations, and (b) where other libraries consistently
> > deviated
> > from NumPy on purpose because NumPy's design turned out to have
> > issues or
> > unnecessary complexity.
> >
> > For a longer discussion on the purpose of the array API standard we
> > refer to
> > the `Purpose and Scope section of the array API standard <
> > https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`
> > __
> > and the two blog posts announcing the formation of the Consortium
> > [1]_ and
> > the release of the first draft version of the standard for community
> > review
> > [2]_.
> >
> > The scope of this NEP includes:
> >
> > - Adopting the 2021 version of the array API standard
> > - Adding a separate namespace, tentatively named ``numpy.array_api``
> > - Changes needed/desired outside of the new namespace, for example
> > new
> > dunder
> >   methods on the ``ndarray`` object
> > - Implementation choices, and differences between functions in the
> > new
> >   namespace with those in the main ``numpy`` namespace
> > - A new array object conforming to the array API standard
> > - Maintenance effort and testing strategy
> > - Impact on NumPy's total exposed API surface and on other future and
> >   under-discussion design choices
> > - Relation to existing and proposed NumPy array protocols
> >   (``__array_ufunc__``, ``__array_function__``,
> > ``__array_module__``).
> > - Required improvements to existing NumPy functionality
> >
> > Out of scope for this NEP are:
> >
> > - Changes in the array API standard itself. Those are likely to come
> > up
> >   during review of this NEP, but should be upstreamed as needed and
> > this NEP
> >   subsequently updated.
> >
> >
> > Usage and Impact
> > ----------------
> >
> > *This section will be fleshed out later, for now we refer to the use
> > cases
> > given
> > in* `the array API standard Use Cases section <
> > https://data-apis.github.io/array-api/latest/use_cases.html>`__
> >
> > In addition to those use cases, the new namespace contains
> > functionality
> > that
> > is widely used and supported by many array libraries. As such, it is
> > a good
> > set of functions to teach to newcomers to NumPy and recommend as
> > "best
> > practice". That contrasts with NumPy's main namespace, which contains
> > many
> > functions and objects that have been superseded or we consider
> > mistakes -
> > but
> > that we can't remove because of backwards compatibility reasons.
> >
> > The usage of the ``numpy.array_api`` namespace by downstream
> > libraries is
> > intended to enable them to consume multiple kinds of arrays, *without
> > having
> > to have a hard dependency on all of those array libraries*:
> >
> > .. image:: _static/nep-0047-library-dependencies.png
> >
> > Adoption in downstream libraries
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The prototype implementation of the ``array_api`` namespace will be
> > used
> > with
> > SciPy, scikit-learn and other libraries of interest that depend on
> > NumPy, in
> > order to get more experience with the design and find out if any
> > important
> > parts are missing.
> >
> > The pattern to support multiple array libraries is intended to be
> > something
> > like::
> >
> >     def somefunc(x, y):
> >         # Retrieves standard namespace. Raises if x and y have
> >         # different namespaces. See Appendix for a possible
> >         # get_namespace implementation.
> >         xp = get_namespace(x, y)
> >         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
> >         return out
> >
> > The ``get_namespace`` call is effectively the library author opting
> > in to
> > using the standard API namespace, and thereby explicitly supporting
> > all conforming array libraries.
> >
> >
> > The ``asarray`` / ``asanyarray`` pattern
> > ````````````````````````````````````````
> >
> > Many existing libraries use the same ``asarray`` (or ``asanyarray``)
> > pattern
> > as NumPy itself does; accepting any object that can be coerced into a
> > ``np.ndarray``.
> > We consider this design pattern problematic - keeping in mind the Zen
> > of
> > Python, *"explicit is better than implicit"*, as well as the pattern
> > being
> > historically problematic in the SciPy ecosystem for ``ndarray``
> > subclasses
> > and with over-eager object creation. All other array/tensor libraries
> > are
> > more strict, and that works out fine in practice. We would advise
> > authors of
> > new libraries to avoid the ``asarray`` pattern. Instead they should
> > either
> > accept just NumPy arrays or, if they want to support multiple kinds
> > of
> > arrays, check if the incoming array object supports the array API
> > standard
> > by checking for ``__array_namespace__`` as shown in the example
> > above.
> >
> > Existing libraries can do such a check as well, and only call
> > ``asarray`` if
> > the check fails. This is very similar to the ``__duckarray__`` idea
> > in
> > :ref:`NEP30`.
> >
> >
> > .. _adoption-application-code:
> >
> > Adoption in application code
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The new namespace can be seen by end users as a cleaned up and
> > slimmed down
> > version of NumPy's main namespace. Encouraging end users to use this
> > namespace like::
> >
> >     import numpy.array_api as xp
> >
> >     x = xp.linspace(0, 2*xp.pi, num=100)
> >     y = xp.cos(x)
> >
> > seems perfectly reasonable, and potentially beneficial - users get
> > offered
> > only
> > one function for each purpose (the one we consider best-practice),
> > and they
> > then write code that is more easily portable to other libraries.
> >
> >
> > Backward compatibility
> > ----------------------
> >
> > No deprecations or removals of existing NumPy APIs or other backwards
> > incompatible changes are proposed.
> >
> >
> > High-level design
> > -----------------
> >
> > The array API standard consists of approximately 120 objects, all of
> > which
> > have a direct NumPy equivalent. This figure shows what is included at
> > a
> > high level:
> >
> > .. image:: _static/nep-0047-scope-of-array-API.png
> >
> > The most important changes compared to what NumPy currently offers
> > are:
> >
> > - A new array object which:
> >
> >     - conforms to the casting rules and indexing behaviour specified
> > by the
> >       standard,
> >     - does not have methods other than dunder methods,
> >     - does not support the full range of NumPy indexing behaviour.
> > Advanced
> >       indexing with integers is not supported. Only boolean indexing
> >       with a single (possibly multi-dimensional) boolean array is
> > supported.
> >       An indexing expression that selects a single element returns a
> > 0-D
> > array
> >       rather than a scalar.
> >
> > - Functions in the ``array_api`` namespace:
> >
> >     - do not accept ``array_like`` inputs, only NumPy arrays and
> > Python
> > scalars
> >     - do not support ``__array_ufunc__`` and ``__array_function__``,
> >     - use positional-only and keyword-only parameters in their
> > signatures,
> >     - have inline type annotations,
> >     - may have minor changes to signatures and semantics of
> > individual
> >       functions compared to their equivalents already present in
> > NumPy,
> >     - only support dtype literals, not format strings or other ways
> > of
> >       specifying dtypes
> >
> > - DLPack_ support will be added to NumPy,
> > - New syntax for "device support" will be added, through a
> > ``.device``
> >   attribute on the new array object, and ``device=`` keywords in
> > array
> > creation
> >   functions in the ``array_api`` namespace,
> > - Casting rules that differ from those NumPy currently has. Output
> > dtypes
> > can
> >   be derived from input dtypes (i.e. no value-based casting), and 0-D
> > arrays
> >   are treated like >=1-D arrays.
> > - Not all dtypes NumPy has are part of the standard. Only boolean,
> > signed
> > and
> >   unsigned integers, and floating-point dtypes up to ``float64`` are
> > supported.
> >   Complex dtypes are expected to be added in the next version of the
> > standard.
> >   Extended precision, string, void, object and datetime dtypes, as
> > well as
> >   structured dtypes, are not included.
> >
> > Improvements to existing NumPy functionality that are needed include:
> >
> > - Add support for stacks of matrices to some functions in
> > ``numpy.linalg``
> >   that are currently missing such support.
> > - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> > - Add a "never copy" mode to ``np.asarray``.
> >
> >
> > Functions in the ``array_api`` namespace
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Let's start with an example of a function implementation that shows
> > the most
> > important differences with the equivalent function in the main
> > namespace::
> >
> >     def max(x: array, /, *,
> >             axis: Optional[Union[int, Tuple[int, ...]]] = None,
> >             keepdims: bool = False
> >         ) -> array:
> >         """
> >         Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
> >         """
> >         return np.max._implementation(x, axis=axis,
> >                                       keepdims=keepdims)
> >
> > This function does not accept ``array_like`` inputs, only
> > ``ndarray``. There
> > are multiple reasons for this. Other array libraries all work like
> > this.
> > Letting the user do coercion of lists, generators, or other foreign
> > objects
> > separately results in a cleaner design with less unexpected
> > behaviour.
> > It's higher-performance - less overhead from ``asarray`` calls.
> > Static
> > typing
> > is easier. Subclasses will work as expected. And the slight increase
> > in
> > verbosity
> > because users have to explicitly coerce to ``ndarray`` on rare
> > occasions
> > seems like a small price to pay.
> >
> > This function does not support ``__array_ufunc__`` nor
> > ``__array_function__``.
> > These protocols serve a similar purpose as the array API standard
> > module
> > itself,
> > but through different mechanisms. Because only ``ndarray``
> > instances are
> > accepted,
> > dispatching via one of these protocols isn't useful anymore.
> >
> > This function uses positional-only parameters in its signature. This
> > makes
> > code
> > more portable - writing ``max(x=x, ...)`` is no longer valid, hence
> > if other
> > libraries call the first parameter ``input`` rather than ``x``, that
> > is
> > fine.
> > The rationale for keyword-only parameters (not shown in the above
> > example)
> > is
> > two-fold: clarity of end user code, and it being easier to extend the
> > signature
> > in the future with keywords in the desired order.
> >
> > This function has inline type annotations. Inline annotations are far
> > easier to
> > maintain than separate stub files. And because the types are simple,
> > this
> > will
> > not result in a large amount of clutter with type aliases or unions
> > like in
> > the
> > current stub files NumPy has.
> >
> >
> > DLPack support for zero-copy data interchange
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The ability to convert one kind of array into another kind is
> > valuable, and
> > indeed necessary when downstream libraries want to support multiple
> > kinds of
> > arrays. This requires a well-specified data exchange protocol. NumPy
> > already
> > supports two of these, namely the buffer protocol (i.e., PEP 3118),
> > and
> > the ``__array_interface__`` (Python side) / ``__array_struct__`` (C
> > side)
> > protocol. Both work similarly, letting the "producer" describe how
> > the data
> > is laid out in memory so the "consumer" can construct its own kind of
> > array
> > with a view on that data.
> >
> > DLPack works in a very similar way. The main reasons to prefer DLPack
> > over
> > the options already present in NumPy are:
> >
> > 1. DLPack is the only protocol with device support (e.g., GPUs using
> > CUDA or
> >    ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other
> > array
> >    libraries are not. Having one protocol per device isn't tenable,
> > hence
> >    device support is a must.
> > 2. Widespread support. DLPack has the widest adoption of all
> > protocols, only
> >    NumPy is missing support. And the experiences of other libraries
> > with it
> >    are positive. This contrasts with the protocols NumPy does
> > support, which
> >    are used very little - when other libraries want to interoperate
> > with
> >    NumPy, they typically use the (more limited, and NumPy-specific)
> >    ``__array__`` protocol.
> >
> > Adding support for DLPack to NumPy entails:
> >
> > - Adding a ``ndarray.__dlpack__`` method
> > - Adding a ``from_dlpack`` function, which takes as input an object
> >   supporting ``__dlpack__``, and returns an ``ndarray``.
> >
> > DLPack is currently a ~200 LoC header, and is meant to be included
> > directly, so
> > no external dependency is needed. Implementation should be
> > straightforward.
> >
> >
> > Syntax for device support
> > ~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > NumPy itself is CPU-only, so it clearly doesn't have a need for
> > device
> > support.
> > However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> > support
> > multiple types of devices: CPU, GPU, TPU, and more exotic hardware.
> > To write portable code on systems with multiple devices, it's often
> > necessary
> > to create new arrays on the same device as some other array, or check
> > that
> > two arrays live on the same device. Hence syntax for that is needed.
> >
> > The array object will have a ``.device`` attribute which enables
> > comparing
> > devices of different arrays (they only should compare equal if both
> > arrays
> > are
> > from the same library and it's the same hardware device).
> > Furthermore,
> > ``device=`` keywords in array creation functions are needed. For
> > example::
> >
> >     def empty(shape: Union[int, Tuple[int, ...]], /, *,
> >               dtype: Optional[dtype] = None,
> >               device: Optional[device] = None) -> array:
> >         """
> >         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
> >         """
> >         return np.empty(shape, dtype=dtype, device=device)
> >
> > The implementation for NumPy may be as simple as setting the device
> > attribute to
> > the string ``'cpu'`` and raising an exception if array creation
> > functions
> > encounter any other value.
> >
> >
> > Dtypes and casting rules
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The supported dtypes in this namespace are boolean, 8/16/32/64-bit
> > signed
> > and
> > unsigned integer, and 32/64-bit floating-point dtypes. These will be
> > added
> > to
> > the namespace as dtype literals with the expected names (e.g.,
> > ``bool``,
> > ``uint16``, ``float64``).
> >
> > The most obvious omissions are the complex dtypes. The rationale for
> > the
> > lack
> > of complex support in the first version of the array API standard is
> > that
> > several
> > libraries (PyTorch, MXNet) are still in the process of adding support
> > for
> > complex dtypes. The next version of the standard is expected to
> > include
> > ``complex64``
> > and ``complex128`` (see `this issue <
> > https://github.com/data-apis/array-api/issues/102>`__
> > for more details).
> >
> > Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is
> > expected
> > to only use the dtype literals. Format strings, Python builtin
> > dtypes, or
> > string representations of the dtype literals are not accepted - this
> > will
> > improve readability and portability of code at little cost.
> >
> > Casting rules are only defined between different dtypes of the same
> > kind.
> > The
> > rationale for this is that mixed-kind (e.g., integer to floating-
> > point)
> > casting behavior differs between libraries. NumPy's mixed-kind
> > casting
> > behavior doesn't need to be changed or restricted, it only needs to
> > be
> > documented that if users use mixed-kind casting, their code may not
> > be
> > portable.
> >
> > .. image:: _static/nep-0047-casting-rules-lattice.png
> >
> > *Type promotion diagram. Promotion between any two types is given by
> > their
> > join on this lattice. Only the types of participating arrays matter,
> > not
> > their values. Dashed lines indicate that behaviour for Python scalars
> > is
> > undefined on overflow. Boolean, integer and floating-point dtypes are
> > not
> > connected, indicating mixed-kind promotion is undefined.*
> >
> > The most important difference between the casting rules in NumPy and
> > in the
> > array API standard is how scalars and 0-dimensional arrays are
> > handled. In
> > the standard, array scalars do not exist and 0-dimensional arrays
> > follow the
> > same casting rules as higher-dimensional arrays.
> >
> > See the `Type Promotion Rules section of the array API standard <
> > https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html
> > > `__
> > for more details.
> >
> > .. note::
> >
> >     It is not clear what the best way is to support the different
> > casting
> > rules
> >     for 0-dimensional arrays and no value-based casting. One option
> > may be
> > to
> >     implement this second set of casting rules, keep them private,
> > mark the
> >     array API functions with a private attribute that says they
> > adhere to
> >     these different rules, and let the casting machinery check for
> >     that attribute.
> >
> >     This needs discussion.
> >
> >
> > Indexing
> > ~~~~~~~~
> >
> > An indexing expression that would return a scalar with ``ndarray``,
> > e.g.
> > ``arr_2d[0, 0]``, will return a 0-D array with the new array object.
> > There
> > are
> > several reasons for that: array scalars are largely considered a
> > design
> > mistake
> > which no other array library copied; it works better for non-CPU
> > libraries
> > (typically arrays can live on the device, scalars live on the host);
> > and
> > it's
> > simply a consistent design. To get a Python scalar out of a 0-D
> > array, one
> > can
> > simply use the builtin for the type, e.g. ``float(arr_0d)``.
> >
> > The other `indexing modes in the standard <
> > https://data-apis.github.io/array-api/latest/API_specification/indexing.html
> > > `__
> > do work largely the same as they do for ``numpy.ndarray``. One
> > noteworthy
> > difference is that clipping in slice indexing (e.g., ``a[:n]`` where
> > ``n``
> > is
> > larger than the size of the first axis) is unspecified behaviour,
> > because
> > that kind of check can be expensive on accelerators.
> >
> > The lack of advanced indexing, and boolean indexing being limited to
> > a
> > single
> > n-D boolean array, is due to those indexing modes not being suitable
> > for all
> > types of arrays or JIT compilation. Their absence does not seem to be
> > problematic; if a user or library author wants to use them, they can
> > do so
> > through zero-copy conversion to ``numpy.ndarray``. This will signal
> > correctly
> > to whomever reads the code that it is then NumPy-specific rather than
> > portable
> > to all conforming array types.
> >
> >
> >
> > The array object
> > ~~~~~~~~~~~~~~~~
> >
> > The array object in the standard does not have methods other than
> > dunder
> > methods. The rationale for that is that not all array libraries have
> > methods
> > on their array object (e.g., TensorFlow does not). It also provides
> > only a
> > single way of doing something, rather than have functions and methods
> > that
> > are effectively duplicate.
> >
> > Mixing operations that may produce views (e.g., indexing,
> > ``nonzero``)
> > in combination with mutation (e.g., item or slice assignment) is
> > `explicitly documented in the standard to not be supported <
> > https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html
> > > `__.
> > This cannot easily be prohibited in the array object itself; instead
> > this
> > will
> > be guidance to the user via documentation.
> >
> > The standard currently does not prescribe a name for the array object
> > itself.
> > We propose to simply name it ``ndarray``. This is the most obvious
> > name, and
> > because of the separate namespace should not clash with
> > ``numpy.ndarray``.
> >
> >
> > Implementation
> > --------------
> >
> > .. note::
> >
> >     This section needs a lot more detail, which will gradually be
> > added when
> >     the implementation progresses.
> >
> > A prototype of the ``array_api`` namespace can be found in
> > https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> > The docstring in its ``__init__.py`` has notes on completeness of the
> > implementation. The code for the wrapper functions also contains ``#
> > Note:``
> > comments everywhere there is a difference with the NumPy API.
> > Two important parts that are not implemented yet are the new array
> > object
> > and
> > DLPack support. Functions may need changes to ensure the changed
> > casting
> > rules
> > are respected.
> >
> > The array object
> > ~~~~~~~~~~~~~~~~
> >
> > Regarding the array object implementation, we plan to start with a
> > regular
> > Python class that wraps a ``numpy.ndarray`` instance. Attributes and
> > methods
> > can forward to that wrapped instance, applying input validation and
> > implementing changed behaviour as needed.
> >
> > The casting rules are probably the most challenging part. The in-
> > progress
> > dtype system refactor (NEPs 40-43) should make implementing the
> > correct
> > casting
> > behaviour easier - it is already moving away from value-based casting
> > for
> > example.
> >
> >
> > The dtype objects
> > ~~~~~~~~~~~~~~~~~
> >
> > We must be able to compare dtypes for equality, and expressions like
> > these
> > must
> > be possible::
> >
> >     np.array_api.some_func(..., dtype=x.dtype)
> >
> > The above implies it would be nice to have ``np.array_api.float32 ==
> > np.array_api.ndarray(...).dtype``.
> >
> > Dtypes should not be assumed to have a class hierarchy by users,
> > however we
> > are
> > free to implement it with a class hierarchy if that's convenient. We
> > considered
> > the following options to implement dtype objects:
> >
> > 1. Alias dtypes to those in the main namespace. E.g.,
> > ``np.array_api.float32 =
> >    np.float32``.
> > 2. Make the dtypes instances of ``np.dtype``. E.g.,
> > ``np.array_api.float32 =
> >    np.dtype(np.float32)``.
> > 3. Create new singleton classes with only the required
> > methods/attributes
> >    (currently just ``__eq__``).
> >
> > It seems like (2) would be easiest from the perspective of
> > interacting with
> > functions outside the main namespace. And (3) would adhere best to
> > the
> > standard.
> >
> > TBD: the standard does not yet have a good way to inspect properties
> > of a
> > dtype, to ask questions like "is this an integer dtype?". Perhaps
> > this is
> > easy
> > enough to do for users, like so::
> >
> >     def _get_dtype(dt_or_arr):
> >         return (dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype')
> >                 else dt_or_arr)
> >
> >     def is_floating(dtype_or_array):
> >         dtype = _get_dtype(dtype_or_array)
> >         return dtype in (float32, float64)
> >
> >     def is_integer(dtype_or_array):
> >         dtype = _get_dtype(dtype_or_array)
> >         return dtype in (uint8, uint16, uint32, uint64,
> >                          int8, int16, int32, int64)
> >
> > However it could make sense to add to the standard. Note that NumPy
> > itself
> > currently does not have a great way of asking such questions, see
> > `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
> > _______________________________________________
> > NumPy-Discussion mailing list
> > [hidden email]
> > https://mail.python.org/mailman/listinfo/numpy-discussion

Re: NEP: array API standard adoption (NEP 47)

Sebastian Berg
On Wed, 2021-03-10 at 13:44 -0700, Aaron Meurer wrote:

> On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg
> <[hidden email]> wrote:
> >
> > Top Posting, to discuss post specific questions about NEP 47 and
> > partially the start on implementing it in:
> >
> >     https://github.com/numpy/numpy/pull/18585
> >
> > There are probably many more that will crop up. But for me, each of
> > these is a pretty major difficulty without a clear answer as of
> > now.
> >
> > 1. I still need clarity how a library is supposed to use this
> > namespace
> > when the user passes in a NumPy array (mentioned before).  The user
> > must get back a NumPy array after all.  Maybe that is just a
> > decorator,
> > but it seems important.
> >
> > 2. `np.result_type` special cases array-scalars (the current PR),
> > NEP
> > 47 promises it will not.  The PR could attempt to work around that
> > using `arr.dtype` in `result_type`, I expect there are more
> > details to
> > fight with there, but I am not sure.
>
> The idea is to work around it everywhere, so that it follows the
> rules
> in the spec (no array scalars, no value-based casting). I haven't
> started it yet, though, so I don't know yet how hard it will be. If
> it
> ends up being too hard we could put it in the same camp as device
> support and dlpack support where it needs some basic implementation
> in
> numpy itself first before we can properly do it in the array API
> namespace.
Quite frankly, if you really want to implement a minimal API, it may be
best to just write it yourself and ditch NumPy. (Of course, I currently
doubt that the NEP 47 implementation should be minimal.)

About doing promotion yourself  ("promotion" as in what ufuncs do; I
call `np.result_type` "common DType", because it is used e.g. in
`concatenate`):

Ufuncs have at least one more rule for true-division, plus there may be
mixed float-int loops, etc.  Since the standard is very limited and you
only have numeric dtypes, that might be all, though.

In any case, my point is: if NumPy does strange things (and it
currently does with 0-D arrays), you could cook your own soup there as
well, and implement it in NumPy by using `signature=...` in the ufunc call.
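
As a rough sketch of that idea, the snippet below decides the result dtype
from the operand dtypes alone and then pins the ufunc loop with
`signature=`. The helper name `add_dtype_only` is hypothetical; it is not
part of NumPy or of the PR, and a real implementation would need the
standard's promotion table rather than plain `np.result_type`.

```python
import numpy as np


def add_dtype_only(x, y):
    """Hypothetical helper: add two arrays, picking the loop from dtypes only."""
    # Given dtype objects (not arrays), np.result_type cannot look at values.
    out_dtype = np.result_type(x.dtype, y.dtype)
    # Pin the ufunc loop explicitly so a 0-D operand is handled like any
    # other array; inputs are cast to the loop dtype as needed.
    return np.add(x, y, signature=(out_dtype, out_dtype, out_dtype))


x0d = np.array(1, dtype=np.int64)      # 0-D array
arr = np.array([1, 2], dtype=np.int8)  # 1-D array
result = add_dtype_only(x0d, arr)
print(result.dtype)
```

Here the result dtype is `int64` no matter what value the 0-D operand
holds, because the loop is chosen from `x.dtype` and `y.dtype` only.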


>
> >
> > 3. For all other functions, the same problem applies. You don't
> > actually
> > have anything to fix NumPy promotion rules.  You could bake your
> > own
> > cake here for numeric types, but I am not sure, you might also need
> > NEP
> > 43 in all its promotion power to pull it off.
> >
> > 4. Now that I looked at the above, I do not feel it's reasonable to
> > limit this functionality to numeric dtypes.  If someone uses a
> > NumPy
> > rational-dtype, why should a SciPy function currently implemented
> > in
> > pure NumPy reject that?  In other words, I think this is the point
> > where trying to be "minimal" is counterproductive.
>
> The idea of minimality is to make it so users can be sure they will
> be
> able to use other libraries, once they also have array API compliant
> namespaces. A rational-dtype wouldn't ever be implemented in those
> other libraries, because it isn't part of the standard, so if a user
> is using those, that is a sign they are using things that aren't in
> the array API, so they can't expect to be able to swap out their
> dtypes. If a user wants to use something that's only in NumPy, then
> they should just use NumPy.
>
This is not about the "user"; in your scenario the end-user does use
NumPy.  As I understand it, that is not a prerequisite.  If it is, a
lot of things will be simpler, though, and most of my doubts will go
away (but be replaced with uncertainty about the usefulness).


The problem is that SciPy as the "library author" wants to use NEP
47 without limiting the end-user (or the end-user even noticing!).
The distinction between end-user and library author (someone who writes
a function that should work with numpy, pytorch, etc.) is very
important here and to all of these "protocol" discussions.


I assume that SciPy should be able to have its cake and eat it too:

* Use the limited array API and make sure to rely only on the minimal
  subset.
* Not artificially limit end-users who pass in NumPy arrays.

The second point can also be read as: SciPy would be able to support
practically all current NumPy array use cases without jumping through
any additional hoops (or well, maybe a bit of churn, but churn that is
made easy by as of now undefined API).

> >
> > 4. The PR makes no attempt at handling binary operators in any way
> > aside from greedily coercing the other operand.
> >
> > 5. What happens with a mix of array-likes or even array subclasses
> > like
> > `astropy.quantity`?
> >
> > 6. Is there any provision on how to deal with mixed array-like
> > inputs?
> > CuPy+numpy, etc.?
>
> Neither of these are defined in the spec. The spec only deals with
> staying inside of the compliant namespace. It doesn't require any
> behavior mixing things from other namespaces. That's generally
> considered a much harder problem, and there is the data interchange
> protocol to deal with it
> (
> https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html
> ).
>
OK, maybe you can get away with it, since the current proposal seems to
be that `get_namespace()` raises on mixed input. Still seems like
something that should probably raise an error rather than coerce to
NumPy when calling: `nep47_array_object + dask_array`.
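
For illustration, here is a minimal sketch of a `get_namespace()` that
raises on mixed input. It is not the version from the NEP's appendix;
`ArrayA`, `ArrayB`, `lib_a` and `lib_b` are made-up stand-ins for two
conforming array libraries, and the choice of `TypeError` is an assumption.

```python
import types

# Stand-ins for the namespaces of two different array libraries.
lib_a = types.ModuleType("lib_a")
lib_b = types.ModuleType("lib_b")


class ArrayA:
    def __array_namespace__(self, api_version=None):
        return lib_a


class ArrayB:
    def __array_namespace__(self, api_version=None):
        return lib_b


def get_namespace(*xs):
    """Return the one array API namespace shared by all inputs, else raise."""
    namespaces = set()
    for x in xs:
        if not hasattr(x, "__array_namespace__"):
            raise TypeError(
                f"{type(x).__name__} does not support the array API standard")
        namespaces.add(x.__array_namespace__())
    if len(namespaces) != 1:
        raise TypeError(
            f"expected exactly one array namespace, got {len(namespaces)}")
    (xp,) = namespaces
    return xp


same = get_namespace(ArrayA(), ArrayA())  # both inputs share lib_a
try:
    get_namespace(ArrayA(), ArrayB())     # mixed libraries
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
```

With a check like this, mixed input fails loudly at the function boundary
instead of one operand being silently coerced.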

Cheers,

Sebastian


> Aaron Meurer
>
> >
> >
> > I don't think we have to figure out everything up-front, but I do
> > think
> > there are a few very fundamental questions still open, at least for
> > me
> > personally.
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> >
> > On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> > > Hi all,
> > >
> > > Here is a NEP, written together with Stephan Hoyer and Aaron
> > > Meurer,
> > > for
> > > discussion on adoption of the array API standard (
> > > https://data-apis.github.io/array-api/latest/). This will add a
> > > new
> > > numpy.array_api submodule containing that standardized API. The
> > > main
> > > purpose of this API is to be able to write code that is portable
> > > to
> > > other
> > > array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask,
> > > and
> > > MXNet.
> > >
> > > We expect this NEP to remain in draft state for quite a while,
> > > while
> > > we're
> > > gaining experience with using it in downstream libraries, discuss
> > > adding it
> > > to other array libraries, and finishing some of the loose ends
> > > (e.g.,
> > > specifications for linear algebra functions that aren't merged
> > > yet,
> > > see
> > > https://github.com/data-apis/array-api/pulls) in the API standard
> > > itself.
> > >
> > > See
> > >  
> > > https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> > > for an initial discussion about this topic.
> > >
> > > Please keep high-level discussion here and detailed comments on
> > > https://github.com/numpy/numpy/pull/18456. Also, you can access a
> > > rendered
> > > version of the NEP from that PR (see PR description for how),
> > > which
> > > may be
> > > helpful.
> > > Cheers,
> > > Ralf
> > >
> > >
> > > Abstract
> > > --------
> > >
> > > We propose to adopt the `Python array API standard`_, developed
> > > by
> > > the
> > > `Consortium for Python Data API Standards`_. Implementing this as
> > > a
> > > separate
> > > new namespace in NumPy will allow authors of libraries which
> > > depend
> > > on NumPy
> > > as well as end users to write code that is portable between NumPy
> > > and
> > > all
> > > other array/tensor libraries that adopt this standard.
> > >
> > > .. note::
> > >
> > >     We expect that this NEP will remain in a draft state for
> > > quite a
> > > while.
> > >     Given the large scope we don't expect to propose it for
> > > acceptance any
> > >     time soon; instead, we want to solicit feedback on both the
> > > high-
> > > level
> > >     design and implementation, and learn what needs describing
> > > better
> > > in
> > > this
> > >     NEP or changing in either the implementation or the array API
> > > standard
> > >     itself.
> > >
> > >
> > > Motivation and Scope
> > > --------------------
> > >
> > > Python users have a wealth of choice for libraries and frameworks
> > > for
> > > numerical computing, data science, machine learning, and deep
> > > learning. New
> > > frameworks pushing forward the state of the art in these fields
> > > are
> > > appearing
> > > every year. One unintended consequence of all this activity and
> > > creativity
> > > has been fragmentation in multidimensional array (a.k.a. tensor)
> > > libraries -
> > > which are the fundamental data structure for these fields.
> > > Choices
> > > include
> > > NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
> > >
> > > The APIs of each of these libraries are largely similar, but with
> > > enough
> > > differences that it’s quite difficult to write code that works
> > > with
> > > multiple
> > > (or all) of these libraries. The array API standard aims to
> > > address
> > > that
> > > issue, by specifying an API for the most common ways arrays are
> > > constructed
> > > and used. The proposed API is quite similar to NumPy's API, and
> > > deviates
> > > mainly
> > > in places where (a) NumPy made design choices that are inherently
> > > not
> > > portable
> > > to other implementations, and (b) where other libraries
> > > consistently
> > > deviated
> > > from NumPy on purpose because NumPy's design turned out to have
> > > issues or
> > > unnecessary complexity.
> > >
> > > For a longer discussion on the purpose of the array API standard
> > > we
> > > refer to
> > > the `Purpose and Scope section of the array API standard
> > > <https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`__
> > > and the two blog posts announcing the formation of the Consortium
> > > [1]_ and
> > > the release of the first draft version of the standard for
> > > community
> > > review
> > > [2]_.
> > >
> > > The scope of this NEP includes:
> > >
> > > - Adopting the 2021 version of the array API standard
> > > - Adding a separate namespace, tentatively named
> > > ``numpy.array_api``
> > > - Changes needed/desired outside of the new namespace, for
> > > example
> > > new
> > > dunder
> > >   methods on the ``ndarray`` object
> > > - Implementation choices, and differences between functions in
> > > the
> > > new
> > >   namespace with those in the main ``numpy`` namespace
> > > - A new array object conforming to the array API standard
> > > - Maintenance effort and testing strategy
> > > - Impact on NumPy's total exposed API surface and on other future
> > > and
> > >   under-discussion design choices
> > > - Relation to existing and proposed NumPy array protocols
> > >   (``__array_ufunc__``, ``__array_function__``,
> > > ``__array_module__``).
> > > - Required improvements to existing NumPy functionality
> > >
> > > Out of scope for this NEP are:
> > >
> > > - Changes in the array API standard itself. Those are likely to
> > > come
> > > up
> > >   during review of this NEP, but should be upstreamed as needed
> > > and
> > > this NEP
> > >   subsequently updated.
> > >
> > >
> > > Usage and Impact
> > > ----------------
> > >
> > > *This section will be fleshed out later, for now we refer to the
> > > use
> > > cases
> > > given
> > > in* `the array API standard Use Cases section <
> > > https://data-apis.github.io/array-api/latest/use_cases.html>`__
> > >
> > > In addition to those use cases, the new namespace contains
> > > functionality
> > > that
> > > is widely used and supported by many array libraries. As such, it
> > > is
> > > a good
> > > set of functions to teach to newcomers to NumPy and recommend as
> > > "best
> > > practice". That contrasts with NumPy's main namespace, which
> > > contains
> > > many
> > > functions and objects that have been superseded or we consider
> > > mistakes -
> > > but
> > > that we can't remove because of backwards compatibility reasons.
> > >
> > > The usage of the ``numpy.array_api`` namespace by downstream
> > > libraries is
> > > intended to enable them to consume multiple kinds of arrays,
> > > *without
> > > having
> > > to have a hard dependency on all of those array libraries*:
> > >
> > > .. image:: _static/nep-0047-library-dependencies.png
> > >
> > > Adoption in downstream libraries
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The prototype implementation of the ``array_api`` namespace will
> > > be
> > > used
> > > with
> > > SciPy, scikit-learn and other libraries of interest that depend
> > > on
> > > NumPy, in
> > > order to get more experience with the design and find out if any
> > > important
> > > parts are missing.
> > >
> > > The pattern to support multiple array libraries is intended to be
> > > something
> > > like::
> > >
> > >     def somefunc(x, y):
> > >         # Retrieves standard namespace. Raises if x and y have different
> > >         # namespaces.  See Appendix for possible get_namespace implementation
> > >         xp = get_namespace(x, y)
> > >         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
> > >         return out
> > >
> > > The ``get_namespace`` call is effectively the library author
> > > opting
> > > in to
> > > using the standard API namespace, and thereby explicitly
> > > supporting
> > > all conforming array libraries.
> > >
> > >
> > > The ``asarray`` / ``asanyarray`` pattern
> > > ````````````````````````````````````````
> > >
> > > Many existing libraries use the same ``asarray`` (or
> > > ``asanyarray``)
> > > pattern
> > > as NumPy itself does; accepting any object that can be coerced
> > > into a
> > > ``np.ndarray``.
> > > We consider this design pattern problematic - keeping in mind the
> > > Zen
> > > of
> > > Python, *"explicit is better than implicit"*, as well as the
> > > pattern
> > > being
> > > historically problematic in the SciPy ecosystem for ``ndarray``
> > > subclasses
> > > and with over-eager object creation. All other array/tensor
> > > libraries
> > > are
> > > more strict, and that works out fine in practice. We would advise
> > > authors of
> > > new libraries to avoid the ``asarray`` pattern. Instead they
> > > should
> > > either
> > > accept just NumPy arrays or, if they want to support multiple
> > > kinds
> > > of
> > > arrays, check if the incoming array object supports the array API
> > > standard
> > > by checking for ``__array_namespace__`` as shown in the example
> > > above.
> > >
> > > Existing libraries can do such a check as well, and only call
> > > ``asarray`` if
> > > the check fails. This is very similar to the ``__duckarray__``
> > > idea
> > > in
> > > :ref:`NEP30`.
> > >
> > >
> > > .. _adoption-application-code:
> > >
> > > Adoption in application code
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The new namespace can be seen by end users as a cleaned up and
> > > slimmed down
> > > version of NumPy's main namespace. Encouraging end users to use
> > > this
> > > namespace like::
> > >
> > >     import numpy.array_api as xp
> > >
> > >     x = xp.linspace(0, 2*xp.pi, num=100)
> > >     y = xp.cos(x)
> > >
> > > seems perfectly reasonable, and potentially beneficial - users
> > > get
> > > offered
> > > only
> > > one function for each purpose (the one we consider best-
> > > practice),
> > > and they
> > > then write code that is more easily portable to other libraries.
> > >
> > >
> > > Backward compatibility
> > > ----------------------
> > >
> > > No deprecations or removals of existing NumPy APIs or other
> > > backwards
> > > incompatible changes are proposed.
> > >
> > >
> > > High-level design
> > > -----------------
> > >
> > > The array API standard consists of approximately 120 objects, all
> > > of
> > > which
> > > have a direct NumPy equivalent. This figure shows what is
> > > included at
> > > a
> > > high level:
> > >
> > > .. image:: _static/nep-0047-scope-of-array-API.png
> > >
> > > The most important changes compared to what NumPy currently
> > > offers
> > > are:
> > >
> > > - A new array object which:
> > >
> > >     - conforms to the casting rules and indexing behaviour
> > > specified
> > > by the
> > >       standard,
> > >     - does not have methods other than dunder methods,
> > >     - does not support the full range of NumPy indexing
> > > behaviour.
> > > Advanced
> > >       indexing with integers is not supported. Only boolean
> > > indexing
> > >       with a single (possibly multi-dimensional) boolean array is
> > > supported.
> > >       An indexing expression that selects a single element
> > > returns a
> > > 0-D
> > > array
> > >       rather than a scalar.
> > >
> > > - Functions in the ``array_api`` namespace:
> > >
> > >     - do not accept ``array_like`` inputs, only NumPy arrays and
> > > Python
> > > scalars
> > >     - do not support ``__array_ufunc__`` and
> > > ``__array_function__``,
> > >     - use positional-only and keyword-only parameters in their
> > > signatures,
> > >     - have inline type annotations,
> > >     - may have minor changes to signatures and semantics of
> > > individual
> > >       functions compared to their equivalents already present in
> > > NumPy,
> > >     - only support dtype literals, not format strings or other
> > > ways
> > > of
> > >       specifying dtypes
> > >
> > > - DLPack_ support will be added to NumPy,
> > > - New syntax for "device support" will be added, through a
> > > ``.device``
> > >   attribute on the new array object, and ``device=`` keywords in
> > > array
> > > creation
> > >   functions in the ``array_api`` namespace,
> > > - Casting rules that differ from those NumPy currently has.
> > > Output
> > > dtypes
> > > can
> > >   be derived from input dtypes (i.e. no value-based casting), and
> > > 0-D
> > > arrays
> > >   are treated like >=1-D arrays.
> > > - Not all dtypes NumPy has are part of the standard. Only
> > > boolean,
> > > signed
> > > and
> > >   unsigned integers, and floating-point dtypes up to ``float64``
> > > are
> > > supported.
> > >   Complex dtypes are expected to be added in the next version of
> > > the
> > > standard.
> > >   Extended precision, string, void, object and datetime dtypes,
> > > as
> > > well as
> > >   structured dtypes, are not included.
> > >
> > > Improvements to existing NumPy functionality that are needed
> > > include:
> > >
> > > - Add support for stacks of matrices to some functions in
> > > ``numpy.linalg``
> > >   that are currently missing such support.
> > > - Add the ``keepdims`` keyword to ``np.argmin`` and
> > > ``np.argmax``.
> > > - Add a "never copy" mode to ``np.asarray``.
> > >
> > >
> > > Functions in the ``array_api`` namespace
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > Let's start with an example of a function implementation that
> > > shows
> > > the most
> > > important differences with the equivalent function in the main
> > > namespace::
> > >
> > >     def max(x: array, /, *,
> > >             axis: Optional[Union[int, Tuple[int, ...]]] = None,
> > >             keepdims: bool = False
> > >         ) -> array:
> > >         """
> > >         Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
> > >         """
> > >         return np.max._implementation(x, axis=axis, keepdims=keepdims)
> > >
> > > This function does not accept ``array_like`` inputs, only
> > > ``ndarray``. There
> > > are multiple reasons for this. Other array libraries all work
> > > like
> > > this.
> > > Letting the user do coercion of lists, generators, or other
> > > foreign
> > > objects
> > > separately results in a cleaner design with less unexpected
> > > behaviour.
> > > It's higher-performance - less overhead from ``asarray`` calls.
> > > Static
> > > typing
> > > is easier. Subclasses will work as expected. And the slight
> > > increase
> > > in
> > > verbosity
> > > because users have to explicitly coerce to ``ndarray`` on rare
> > > occasions
> > > seems like a small price to pay.
> > >
> > > This function does not support ``__array_ufunc__`` nor
> > > ``__array_function__``.
> > > These protocols serve a similar purpose as the array API standard
> > > module
> > > itself,
> > > but through a different mechanism. Because only ``ndarray``
> > > instances are
> > > accepted,
> > > dispatching via one of these protocols isn't useful anymore.
> > >
> > > This function uses positional-only parameters in its signature.
> > > This
> > > makes
> > > code
> > > more portable - writing ``max(x=x, ...)`` is no longer valid,
> > > hence
> > > if other
> > > libraries call the first parameter ``input`` rather than ``x``,
> > > that
> > > is
> > > fine.
> > > The rationale for keyword-only parameters (not shown in the above
> > > example)
> > > is
> > > two-fold: clarity of end user code, and it being easier to extend
> > > the
> > > signature
> > > in the future with keywords in the desired order.
> > >
> > > This function has inline type annotations. Inline annotations are
> > > far
> > > easier to
> > > maintain than separate stub files. And because the types are
> > > simple,
> > > this
> > > will
> > > not result in a large amount of clutter with type aliases or
> > > unions
> > > like in
> > > the
> > > current stub files NumPy has.
> > >
> > >
> > > DLPack support for zero-copy data interchange
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The ability to convert one kind of array into another kind is
> > > valuable, and
> > > indeed necessary when downstream libraries want to support
> > > multiple
> > > kinds of
> > > arrays. This requires a well-specified data exchange protocol.
> > > NumPy
> > > already
> > > supports two of these, namely the buffer protocol (i.e., PEP
> > > 3118),
> > > and
> > > the ``__array_interface__`` (Python side) / ``__array_struct__``
> > > (C
> > > side)
> > > protocol. Both work similarly, letting the "producer" describe
> > > how
> > > the data
> > > is laid out in memory so the "consumer" can construct its own
> > > kind of
> > > array
> > > with a view on that data.
> > >
> > > DLPack works in a very similar way. The main reasons to prefer
> > > DLPack
> > > over
> > > the options already present in NumPy are:
> > >
> > > 1. DLPack is the only protocol with device support (e.g., GPUs
> > > using
> > > CUDA or
> > >    ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other
> > > array
> > >    libraries are not. Having one protocol per device isn't
> > > tenable,
> > > hence
> > >    device support is a must.
> > > 2. Widespread support. DLPack has the widest adoption of all
> > > protocols, only
> > >    NumPy is missing support. And the experiences of other
> > > libraries
> > > with it
> > >    are positive. This contrasts with the protocols NumPy does
> > > support, which
> > >    are used very little - when other libraries want to
> > > interoperate
> > > with
> > >    NumPy, they typically use the (more limited, and NumPy-
> > > specific)
> > >    ``__array__`` protocol.
> > >
> > > Adding support for DLPack to NumPy entails:
> > >
> > > - Adding a ``ndarray.__dlpack__`` method
> > > - Adding a ``from_dlpack`` function, which takes as input an
> > > object
> > >   supporting ``__dlpack__``, and returns an ``ndarray``.
> > >
> > > DLPack is currently a ~200 LoC header, and is meant to be
> > > included
> > > directly, so
> > > no external dependency is needed. Implementation should be
> > > straightforward.
> > >
> > >
> > > Syntax for device support
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > NumPy itself is CPU-only, so it clearly doesn't have a need for
> > > device
> > > support.
> > > However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> > > support
> > > multiple types of devices: CPU, GPU, TPU, and more exotic
> > > hardware.
> > > To write portable code on systems with multiple devices, it's
> > > often
> > > necessary
> > > to create new arrays on the same device as some other array, or
> > > check
> > > that
> > > two arrays live on the same device. Hence syntax for that is
> > > needed.
> > >
> > > The array object will have a ``.device`` attribute which enables
> > > comparing
> > > devices of different arrays (they only should compare equal if
> > > both
> > > arrays
> > > are
> > > from the same library and it's the same hardware device).
> > > Furthermore,
> > > ``device=`` keywords in array creation functions are needed. For
> > > example::
> > >
> > >     def empty(shape: Union[int, Tuple[int, ...]], /, *,
> > >               dtype: Optional[dtype] = None,
> > >               device: Optional[device] = None) -> array:
> > >         """
> > >         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
> > >         """
> > >         return np.empty(shape, dtype=dtype, device=device)
> > >
> > > The implementation for NumPy may be as simple as setting the
> > > device
> > > attribute to
> > > the string ``'cpu'`` and raising an exception if array creation
> > > functions
> > > encounter any other value.
> > >
> > >
> > > Dtypes and casting rules
> > > ~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The supported dtypes in this namespace are boolean, 8/16/32/64-
> > > bit
> > > signed
> > > and
> > > unsigned integer, and 32/64-bit floating-point dtypes. These will
> > > be
> > > added
> > > to
> > > the namespace as dtype literals with the expected names (e.g.,
> > > ``bool``,
> > > ``uint16``, ``float64``).
> > >
> > > The most obvious omissions are the complex dtypes. The rationale
> > > for
> > > the
> > > lack
> > > of complex support in the first version of the array API standard
> > > is
> > > that
> > > several
> > > libraries (PyTorch, MXNet) are still in the process of adding
> > > support
> > > for
> > > complex dtypes. The next version of the standard is expected to
> > > include
> > > ``complex64``
> > > and ``complex128`` (see `this issue <
> > > https://github.com/data-apis/array-api/issues/102>`__
> > > for more details).
> > >
> > > Specifying dtypes to functions, e.g. via the ``dtype=`` keyword,
> > > is
> > > expected
> > > to only use the dtype literals. Format strings, Python builtin
> > > dtypes, or
> > > string representations of the dtype literals are not accepted -
> > > this
> > > will
> > > improve readability and portability of code at little cost.
> > >
> > > Casting rules are only defined between different dtypes of the
> > > same
> > > kind.
> > > The
> > > rationale for this is that mixed-kind (e.g., integer to floating-
> > > point)
> > > casting behavior differs between libraries. NumPy's mixed-kind
> > > casting
> > > behavior doesn't need to be changed or restricted, it only needs
> > > to
> > > be
> > > documented that if users use mixed-kind casting, their code may
> > > not
> > > be
> > > portable.
> > >
> > > .. image:: _static/nep-0047-casting-rules-lattice.png
> > >
> > > *Type promotion diagram. Promotion between any two types is given
> > > by
> > > their
> > > join on this lattice. Only the types of participating arrays
> > > matter,
> > > not
> > > their values. Dashed lines indicate that behaviour for Python
> > > scalars
> > > is
> > > undefined on overflow. Boolean, integer and floating-point dtypes
> > > are
> > > not
> > > connected, indicating mixed-kind promotion is undefined.*
> > >
> > > The most important difference between the casting rules in NumPy
> > > and
> > > in the
> > > array API standard is how scalars and 0-dimensional arrays are
> > > handled. In
> > > the standard, array scalars do not exist and 0-dimensional arrays
> > > follow the
> > > same casting rules as higher-dimensional arrays.
> > >
> > > See the `Type Promotion Rules section of the array API standard
> > > <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__
> > > for more details.
> > >
> > > .. note::
> > >
> > >     It is not clear what the best way is to support the different casting
> > >     rules for 0-dimensional arrays and no value-based casting. One option
> > >     may be to implement this second set of casting rules, keep them
> > >     private, mark the array API functions with a private attribute that
> > >     says they adhere to these different rules, and let the casting
> > >     machinery check for that attribute.
> > >
> > >     This needs discussion.
> > >
> > >
> > > Indexing
> > > ~~~~~~~~
> > >
> > > An indexing expression that would return a scalar with
> > > ``ndarray``,
> > > e.g.
> > > ``arr_2d[0, 0]``, will return a 0-D array with the new array
> > > object.
> > > There
> > > are
> > > several reasons for that: array scalars are largely considered a
> > > design
> > > mistake
> > > which no other array library copied; it works better for non-CPU
> > > libraries
> > > (typically arrays can live on the device, scalars live on the
> > > host);
> > > and
> > > it's
> > > simply a consistent design. To get a Python scalar out of a 0-D
> > > array, one
> > > can
> > > simply use the builtin for the type, e.g. ``float(arr_0d)``.
> > >
> > > The other `indexing modes in the standard
> > > <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__
> > > do work largely the same as they do for ``numpy.ndarray``. One
> > > noteworthy
> > > difference is that clipping in slice indexing (e.g., ``a[:n]``
> > > where
> > > ``n``
> > > is
> > > larger than the size of the first axis) is unspecified behaviour,
> > > because
> > > that kind of check can be expensive on accelerators.
> > >
> > > The lack of advanced indexing, and boolean indexing being limited
> > > to
> > > a
> > > single
> > > n-D boolean array, is due to those indexing modes not being
> > > suitable
> > > for all
> > > types of arrays or JIT compilation. Their absence does not seem
> > > to be
> > > problematic; if a user or library author wants to use them, they
> > > can
> > > do so
> > > through zero-copy conversion to ``numpy.ndarray``. This will
> > > signal
> > > correctly
> > > to whomever reads the code that it is then NumPy-specific rather
> > > than
> > > portable
> > > to all conforming array types.
> > >
> > >
> > >
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > >
> > > The array object in the standard does not have methods other than
> > > dunder
> > > methods. The rationale for that is that not all array libraries
> > > have
> > > methods
> > > on their array object (e.g., TensorFlow does not). It also
> > > provides
> > > only a
> > > single way of doing something, rather than have functions and
> > > methods
> > > that
> > > are effectively duplicate.
> > >
> > > Mixing operations that may produce views (e.g., indexing,
> > > ``nonzero``)
> > > in combination with mutation (e.g., item or slice assignment) is
> > > `explicitly documented in the standard to not be supported
> > > <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__.
> > > This cannot easily be prohibited in the array object itself;
> > > instead
> > > this
> > > will
> > > be guidance to the user via documentation.
> > >
> > > The standard currently does not prescribe a name for the array
> > > object
> > > itself.
> > > We propose to simply name it ``ndarray``. This is the most
> > > obvious
> > > name, and
> > > because of the separate namespace should not clash with
> > > ``numpy.ndarray``.
> > >
> > >
> > > Implementation
> > > --------------
> > >
> > > .. note::
> > >
> > >     This section needs a lot more detail, which will gradually be
> > > added when
> > >     the implementation progresses.
> > >
> > > A prototype of the ``array_api`` namespace can be found in
> > > https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> > > The docstring in its ``__init__.py`` has notes on completeness of
> > > the
> > > implementation. The code for the wrapper functions also contains
> > > ``#
> > > Note:``
> > > comments everywhere there is a difference with the NumPy API.
> > > Two important parts that are not implemented yet are the new
> > > array
> > > object
> > > and
> > > DLPack support. Functions may need changes to ensure the changed
> > > casting
> > > rules
> > > are respected.
> > >
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > >
> > > Regarding the array object implementation, we plan to start with
> > > a
> > > regular
> > > Python class that wraps a ``numpy.ndarray`` instance. Attributes
> > > and
> > > methods
> > > can forward to that wrapped instance, applying input validation
> > > and
> > > implementing changed behaviour as needed.
> > >
> > > The casting rules are probably the most challenging part. The in-
> > > progress
> > > dtype system refactor (NEPs 40-43) should make implementing the
> > > correct
> > > casting
> > > behaviour easier - it is already moving away from value-based
> > > casting
> > > for
> > > example.
> > >
> > >
> > > The dtype objects
> > > ~~~~~~~~~~~~~~~~~
> > >
> > > We must be able to compare dtypes for equality, and expressions
> > > like
> > > these
> > > must
> > > be possible::
> > >
> > >     np.array_api.some_func(..., dtype=x.dtype)
> > >
> > > The above implies it would be nice to have
> > > ``np.array_api.float32 == np.array_api.ndarray(...).dtype``.
> > >
> > > Dtypes should not be assumed to have a class hierarchy by users,
> > > however we
> > > are
> > > free to implement it with a class hierarchy if that's convenient.
> > > We
> > > considered
> > > the following options to implement dtype objects:
> > >
> > > 1. Alias dtypes to those in the main namespace. E.g.,
> > >    ``np.array_api.float32 = np.float32``.
> > > 2. Make the dtypes instances of ``np.dtype``. E.g.,
> > >    ``np.array_api.float32 = np.dtype(np.float32)``.
> > > 3. Create new singleton classes with only the required
> > > methods/attributes
> > >    (currently just ``__eq__``).
> > >
> > > It seems like (2) would be easiest from the perspective of
> > > interacting with
> > > functions outside the main namespace. And (3) would adhere best
> > > to
> > > the
> > > standard.
> > >
> > > TBD: the standard does not yet have a good way to inspect
> > > properties
> > > of a
> > > dtype, to ask questions like "is this an integer dtype?". Perhaps
> > > this is
> > > easy
> > > enough to do for users, like so::
> > >
> > >     def _get_dtype(dt_or_arr):
> > >         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr
> > >
> > >     def is_floating(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (float32, float64)
> > >
> > >     def is_integer(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (uint8, uint16, uint32, uint64, int8, int16, int32,
> > >                          int64)
> > >
> > > However it could make sense to add to the standard. Note that
> > > NumPy
> > > itself
> > > currently does not have a great way for asking such questions, see
> > > `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.



Re: NEP: array API standard adoption (NEP 47)

ralfgommers
In reply to this post by Sebastian Berg


On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg <[hidden email]> wrote:
Top Posting, to discuss post specific questions about NEP 47 and
partially the start on implementing it in:

    https://github.com/numpy/numpy/pull/18585

There are probably many more that will crop up. But for me, each of
these is a pretty major difficulty without a clear answer as of now.

All great questions, thanks Sebastian. Let me reply inline below to the questions that Aaron didn't address.


1. I still need clarity how a library is supposed to use this namespace
when the user passes in a NumPy array (mentioned before).  The user
must get back a NumPy array after all.  Maybe that is just a decorator,
but it seems important.

I agree that it will be a common pattern that libraries will accept all standard-compliant array types plus numpy.ndarray. And the output array type should match the input type. In Aaron's implementation the new array object has a numpy.ndarray as private attribute, so that's the instance that should be returned. A decorator seems like a sensible way to handle that. Or a simple utility function, something like `return correct_arraytype(out)`.

Either way, that pattern should be added to NEP 47. I don't see a fundamental problem here, we just need to find the nicest UX for it.
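
A minimal sketch of the decorator variant of that pattern, under stated assumptions: `WrappedArray` is a hypothetical stand-in for the new array object (holding the raw array as a private attribute), `accepts_raw_arrays` is an invented name, and a plain `list` stands in for `numpy.ndarray` so the sketch has no dependencies. It is not taken from the PR.

```python
import functools


class WrappedArray:
    """Hypothetical stand-in for the new array object.

    It keeps the underlying raw array as a private attribute.
    """

    def __init__(self, raw):
        self._array = raw


def accepts_raw_arrays(raw_type):
    """Let a function written against WrappedArray also accept raw_type.

    Raw inputs are wrapped on the way in; if any raw input was seen,
    the result is unwrapped on the way out so the output type matches
    the input type.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            saw_raw = any(isinstance(a, raw_type) for a in args)
            wrapped = [
                WrappedArray(a) if isinstance(a, raw_type) else a for a in args
            ]
            out = func(*wrapped, **kwargs)
            if saw_raw and isinstance(out, WrappedArray):
                return out._array
            return out

        return wrapper

    return decorator


@accepts_raw_arrays(list)  # assumption: list plays the role of numpy.ndarray
def double(x):
    return WrappedArray([2 * v for v in x._array])


raw_result = double([1, 2, 3])                    # raw in, raw out
wrapped_result = double(WrappedArray([1, 2, 3]))  # wrapper in, wrapper out
```

A utility function like `correct_arraytype(out)` would do the same unwrapping explicitly at each return site instead of in a decorator.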

3. For all other functions, the same problem applies. You don't actually
have anything to fix NumPy promotion rules.  You could bake your own
cake here for numeric types, but I am not sure, you might also need NEP
43 in all its promotion power to pull it off.

This is probably the single most difficult question implementation-wise. Note that there are only numerical dtypes (plus boolean), so dealing with string, datetime, object or third-party dtypes is a non-issue.

4. The PR makes no attempt at handling binary operators in any way
aside from greedily coercing the other operand.

Agreed. This is the same point as (3) I think - how to handle dtype promotion is the main open question.
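
To make the scale of the problem concrete, here is a rough pure-Python sketch of a standalone promotion table for the dtypes in the standard. The function name `promote` and the use of dtype names as strings are illustrative assumptions. The mixed signed/unsigned rule reflects my reading of the standard's type promotion tables and should be checked against the spec. Mixed-kind promotion is treated as an error here because the standard leaves it undefined.

```python
_SIGNED = ["int8", "int16", "int32", "int64"]
_UNSIGNED = ["uint8", "uint16", "uint32", "uint64"]
_FLOAT = ["float32", "float64"]


def promote(a, b):
    """Return the promoted dtype name for two dtype names.

    Depends only on the dtypes, never on values (no value-based casting).
    """
    if a == b:
        return a
    # Same kind: the wider of the two wins.
    for kind in (_SIGNED, _UNSIGNED, _FLOAT):
        if a in kind and b in kind:
            return kind[max(kind.index(a), kind.index(b))]
    # Mixed signed/unsigned: need a signed type one step wider than the
    # unsigned operand (assumption based on the standard's tables).
    if a in _UNSIGNED and b in _SIGNED:
        a, b = b, a
    if a in _SIGNED and b in _UNSIGNED:
        needed = _UNSIGNED.index(b) + 1
        if needed >= len(_SIGNED):
            raise TypeError(f"promotion of {a} and {b} is undefined")
        return _SIGNED[max(_SIGNED.index(a), needed)]
    # Everything else (e.g. integer with float, bool with integer).
    raise TypeError(f"mixed-kind promotion of {a} and {b} is undefined")
```

The table itself is small. The hard part is getting NumPy's existing machinery to apply it consistently.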


5. What happens with a mix of array-likes or even array subclasses like
`astropy.quantity`?

Array-likes (e.g. list) should raise an exception, the NEP clearly says "do not accept array_like inputs". This is what every other array/tensor library already does.

Array subclasses should work as expected, assuming they're valid subclasses and not things like np.matrix. Using Mypy will help avoid writing more subclasses that break the Liskov substitution principle. More comments in https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern

Mixing two different types of arrays into a single function call should raise an exception. A design goal is: enable writing functions `somefunc(x1, x2)` that work for any type of array where `x1, x2` come from the same library, so they're either the same type, or two types for which the library itself knows how to mix them. If x1 and x2 are from different libraries, this will raise an exception.

To be clear, it is not intended that `np.array_api.somefunc(x_cupy)` works - this will raise an exception.
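
A sketch of how a `get_namespace` helper could enforce this, assuming arrays expose `__array_namespace__` as described in the NEP. `ArrayA`, `ArrayB`, `lib_a` and `lib_b` are made-up stand-ins for two unrelated array libraries, and the exact error types are an assumption.

```python
import types


def get_namespace(*xs):
    """Return the single array API namespace shared by all array inputs."""
    namespaces = {
        x.__array_namespace__() if hasattr(x, "__array_namespace__") else None
        for x in xs
        # Python scalars are skipped; accepting them is a matter of taste.
        if not isinstance(x, (bool, int, float, complex))
    }
    if not namespaces:
        raise ValueError("No array inputs found")
    if len(namespaces) != 1:
        raise ValueError(f"Multiple namespaces for array inputs: {namespaces}")
    (xp,) = namespaces
    if xp is None:
        # Array-likes such as lists end up here.
        raise ValueError("The input is not a supported array type")
    return xp


# Two hypothetical libraries, each with its own namespace object.
lib_a = types.ModuleType("lib_a")
lib_b = types.ModuleType("lib_b")


class ArrayA:
    def __array_namespace__(self, api_version=None):
        return lib_a


class ArrayB:
    def __array_namespace__(self, api_version=None):
        return lib_b
```

With these definitions, `get_namespace(ArrayA(), ArrayA())` returns `lib_a`, while `get_namespace(ArrayA(), ArrayB())` and `get_namespace([1, 2, 3])` both raise.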

Cheers,
Ralf




I don't think we have to figure out everything up-front, but I do think
there are a few very fundamental questions still open, at least for me
personally.

Cheers,

Sebastian




Re: NEP: array API standard adoption (NEP 47)

ralfgommers
In reply to this post by Sebastian Berg


On Wed, Mar 10, 2021 at 11:41 PM Sebastian Berg <[hidden email]> wrote:
On Wed, 2021-03-10 at 13:44 -0700, Aaron Meurer wrote:
> On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg
> <[hidden email]> wrote:

> >
> > 2. `np.result_type` special cases array-scalars (the current PR),
> > NEP
> > 47 promises it will not.  The PR could attempt to work around that
> > using `arr.dtype` in `result_type`, I expect there are more
> > details to
> > fight with there, but I am not sure.
>
> The idea is to work around it everywhere, so that it follows the
> rules
> in the spec (no array scalars, no value-based casting). I haven't
> started it yet, though, so I don't know yet how hard it will be. If
> it
> ends up being too hard we could put it in the same camp as device
> support and dlpack support where it needs some basic implementation
> in
> numpy itself first before we can properly do it in the array API
> namespace.

Quite frankly. If you really want to implement a minimal API, it may be
best to just write it yourself and ditch NumPy. (Of course I currently
doubt that the NEP 47 implementation should be minimal.)

I'm not really sure what to say other than that I don't think anyone will be served by "ditching NumPy".

The goal for this "minimal" part is to provide an API that you can write code against that will work portably across other array libraries. That seems like a valuable goal, right? And if you want NumPy-specific things that other libraries don't commonly (or at all) implement and are not supported by array_api, then you don't use this API but the existing main numpy namespace.


About doing promotion yourself  ("promotion" as in what ufuncs do; I
call `np.result_type` "common DType", because it is used e.g. in
`concatenate`):

Ufuncs have at least one more rule for true-division, plus there may be
mixed float-int loops, etc.  Since the standard is very limited and you
only have numeric dtypes, that might be all, though.

In any case, my point is: if NumPy does strange things (and it currently
does with 0-D arrays), you could cook your own soup there as well, and
implement it in NumPy by using `signature=...` in the ufunc call.

Interesting idea.
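
As a rough sketch of that idea (not code from the PR): one could compute the result dtype from the operand dtypes alone, so that no value-based casting enters, and then pin the ufunc loop with `signature=`. The helper name `add_with_spec_promotion` is made up for illustration, and it assumes both inputs are already NumPy arrays with numeric dtypes.

```python
import numpy as np


def add_with_spec_promotion(x, y):
    # Hypothetical helper: promote using dtypes only, so the *values* of
    # 0-D operands cannot influence the result dtype.
    dt = np.result_type(x.dtype, y.dtype)
    # Pin the ufunc loop explicitly instead of letting NumPy pick one.
    return np.add(x, y, signature=(dt, dt, dt))


x = np.array([1, 2], dtype=np.int8)
y = np.array(1, dtype=np.int64)  # 0-D array
out = add_with_spec_promotion(x, y)
print(out.dtype)
```

Depending on the NumPy version, plain `x + y` may instead give an int8 result here because of value-based casting for the 0-D operand; with the loop pinned, the result should be int64 either way.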


> > 4. Now that I looked at the above, I do not feel it's reasonable to
> > limit this functionality to numeric dtypes.  If someone uses a NumPy
> > rational-dtype, why should a SciPy function currently implemented in
> > pure NumPy reject that?  In other words, I think this is the point
> > where trying to be "minimal" is counterproductive.

SciPy would still be free to implement *both* a portable code path and a numpy-specific path (if that makes sense, which I doubt in many cases). There's just no way those two code paths can be 100% common, because no other library implements a rational dtype.

>
> The idea of minimality is to make it so users can be sure they will be
> able to use other libraries, once they also have array API compliant
> namespaces. A rational-dtype wouldn't ever be implemented in those
> other libraries, because it isn't part of the standard, so if a user
> is using those, that is a sign they are using things that aren't in
> the array API, so they can't expect to be able to swap out their
> dtypes. If a user wants to use something that's only in NumPy, then
> they should just use NumPy.
>

This is not about the "user"; in your scenario the end-user does use
NumPy.  The way I understand it, this is not a prerequisite.  If it is, a
lot of things will be simpler though, and most of my doubts will go
away (but be replaced with uncertainty about the usefulness).

The problem is that SciPy as the "library author" wants to use NEP
47 without limiting the end-user (or the end-user even noticing!).
The distinction between end-user and library author (someone who writes
a function that should work with numpy, pytorch, etc.) is very
important here and to all of these "protocol" discussions.

The example feels a little forced. >99% of end user code written against libraries like SciPy uses standard numerical dtypes. Things like a rational dtype are very niche. A rational dtype works with most NumPy functions, but is not at all guaranteed to work with SciPy functions - and if it does, that's accidental, untested and may break if SciPy changes its implementation (e.g. moves from pure Python + NumPy to Cython or C++).



I assume that SciPy should be able to have its cake and eat it too:

* Use the limited array API and make sure to rely only on the minimal
  subset.
* Not artificially limit end-users who pass in NumPy arrays.

The second point can also be read as: SciPy would be able to support
practically all current NumPy array use cases without jumping through
any additional hoops (or well, maybe a bit of churn, but churn that is
made easy by as of now undefined API).

I suspect you have things in mind that are not actually supported by SciPy today. The rational dtype is one example, but so are ndarray subclasses. Take masked arrays as an example - these are not supported today, except for scipy.stats.mstats functionality - where support is intentional, special-cased and tested.

For masked arrays as well as other arbitrary fancy subclasses, there's some not-well-defined subset of functionality that may work today, but that is fragile, untested and can break without warning in any release. Only Liskov-substitutable ndarray subclasses are not fragile - those are simply coerced to ndarray via the ubiquitous `np.asarray` pattern, and ndarrays are returned. That must and will remain working.
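
To make the `np.asarray` pattern concrete, here is a small sketch. `Tagged` and `library_func` are invented names; `Tagged` stands in for a well-behaved subclass that adds nothing conflicting with ndarray.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical, well-behaved ndarray subclass (adds no new behavior)."""


def library_func(x):
    # The ubiquitous pattern: coerce to a base-class ndarray up front,
    # then compute and return plain ndarrays.
    x = np.asarray(x)
    return x * 2


t = np.arange(3).view(Tagged)
out = library_func(t)
print(type(out).__name__)               # type of the returned array
print(type(np.asanyarray(t)).__name__)  # asanyarray passes subclasses through
```

The subclass instance goes in, a base ndarray comes out, and nothing in `library_func` depends on subclass-specific behavior.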

This is a complex topic, and it's possible that I'm missing other use cases you have in mind, so I thought I'd make a diagram to explain the difference between the custom dtypes & subclasses that are supported by NumPy itself but not by downstream libraries:



> > 6. Is there any provision on how to deal with mixed array-like
> > inputs? CuPy+numpy, etc.?
>
> Neither of these are defined in the spec. The spec only deals with
> staying inside of the compliant namespace. It doesn't require any
> behavior mixing things from other namespaces. That's generally
> considered a much harder problem, and there is the data interchange
> protocol to deal with it
> (https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html).
>
>

OK, maybe you can get away with it, since the current proposal seems to
be that `get_namespace()` raises on mixed input. Still seems like
something that should probably raise an error rather than coerce to
NumPy when calling: `nep47_array_object + dask_array`.

Agreed, this must raise too.
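
For illustration only, a `get_namespace()` that raises on mixed input could look roughly like the sketch below. `__array_namespace__` is the method name from the array API standard; everything else (`_Namespace`, `ArrayA`, `ArrayB`, `NS_A`, `NS_B`, and this particular `get_namespace` body) is a stand-in I made up so the example runs without any array library installed.

```python
class _Namespace:
    """Stand-in for an array API compliant module."""

    def __init__(self, name):
        self.name = name


NS_A = _Namespace("lib_a")
NS_B = _Namespace("lib_b")


class ArrayA:
    def __array_namespace__(self, api_version=None):
        return NS_A


class ArrayB:
    def __array_namespace__(self, api_version=None):
        return NS_B


def get_namespace(*xs):
    namespaces = set()
    for x in xs:
        if not hasattr(x, "__array_namespace__"):
            # Array-likes such as lists are rejected rather than coerced.
            raise TypeError(f"unsupported input type: {type(x).__name__}")
        namespaces.add(x.__array_namespace__())
    if len(namespaces) != 1:
        # Zero inputs or inputs from different libraries.
        raise ValueError("inputs must all come from one array library")
    (xp,) = namespaces
    return xp
```

With this, `get_namespace(ArrayA(), ArrayA())` returns `NS_A`, while `get_namespace(ArrayA(), ArrayB())` raises instead of silently picking one library.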

Cheers,
Ralf


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: NEP: array API standard adoption (NEP 47)

Sebastian Berg
In reply to this post by ralfgommers
On Thu, 2021-03-11 at 12:37 +0100, Ralf Gommers wrote:

> On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg <[hidden email]> wrote:
>
> > Top Posting, to discuss post specific questions about NEP 47 and
> > partially the start on implementing it in:
> >
> >     https://github.com/numpy/numpy/pull/18585
> >
> > There are probably many more that will crop up. But for me, each of
> > these is a pretty major difficulty without a clear answer as of
> > now.
> >
>
> All great questions, thanks Sebastian. Let me reply to the questions that
> Aaron didn't reply to inline below.
>
To be clear, I do not expect complete answers to these questions right
now.  (Although being unsure about some of them does make me slightly
reluctant to merge the work-in-progress into NumPy proper as opposed to
a separate repo.)

Also, yes, most/all questions are hopefully just trivialities to
check off (or no more than seeds for thought).  Or even just a starting
point for making NEP 47's "Usage and Impact" section more complete by
including them as either "example usage patterns" or "limitations".


My second takeaway from the questions is that I have doubts the
"minimal" version will pan out, it feels like many of the questions
might disappear if you drop that part. So, from my current thinking,
the minimal implementation may not be a good "NEP 47" implementation.

That does _not_ mean that I think you should pause and reconsider or
even worry about pleasing me with good answers!  Just continue under
whatever assumption you prefer and if it turns out that "minimal" won't
work for NEP 47: no harm done!  We need a "minimal implementation" in
any case.

Cheers,

Sebastian



[1] If SciPy needs an additional NumPy code path to keep supporting
`object` arrays or other dtypes – right now even complex –, then the
reader needs to be aware of that to decide whether NEP 47 will
actually help their library.
Will AstroPy have to reimplement `astropy.units.Quantity` to be
"standard-conformant" (is that even possible!?) before it can easily adopt
it in any of its APIs that currently work with `astropy.units.Quantity`?


>
> > 1. I still need clarity on how a library is supposed to use this
> > namespace when the user passes in a NumPy array (mentioned before).
> > The user must get back a NumPy array after all.  Maybe that is just
> > a decorator, but it seems important.
> >
>
> I agree that it will be a common pattern that libraries will accept all
> standard-compliant array types plus numpy.ndarray. And the output array
> type should match the input type. In Aaron's implementation the new array
> object has a numpy.ndarray as private attribute, so that's the instance
> that should be returned. A decorator seems like a sensible way to handle
> that. Or a simple utility function, something like `return
> correct_arraytype(out)`.
>
> Either way, that pattern should be added to NEP 47. I don't see a
> fundamental problem here, we just need to find the nicest UX for it.
>
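
One possible shape for the decorator mentioned in the quoted reply, purely as a sketch: `StandardArray` is a made-up stand-in for the NEP 47 array object (it only mirrors the "ndarray as private attribute" idea), and `accepts_ndarray` is a hypothetical name, not something from the PR or the NEP.

```python
import functools

import numpy as np


class StandardArray:
    """Hypothetical stand-in for the NEP 47 array object."""

    def __init__(self, array):
        self._array = array  # the wrapped numpy.ndarray

    def __add__(self, other):
        return StandardArray(self._array + other._array)


def accepts_ndarray(func):
    """Wrap ndarray inputs on the way in, unwrap the result on the way out."""

    @functools.wraps(func)
    def wrapper(*args):
        got_ndarray = any(isinstance(a, np.ndarray) for a in args)
        wrapped = [StandardArray(a) if isinstance(a, np.ndarray) else a
                   for a in args]
        out = func(*wrapped)
        if got_ndarray and isinstance(out, StandardArray):
            return out._array  # hand a numpy.ndarray back to the caller
        return out

    return wrapper


@accepts_ndarray
def double(x):
    # Library code written only against the wrapper object's API.
    return x + x
```

Calling `double(np.array([1, 2]))` gives back a plain `numpy.ndarray`, while passing a `StandardArray` returns a `StandardArray`, so the output type follows the input type.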
> > 3. For all other functions, the same problem applies. You don't
> > actually have anything to fix NumPy promotion rules.  You could bake
> > your own cake here for numeric types, but I am not sure, you might
> > also need NEP 43 in all its promotion power to pull it off.
> >
>
> This is probably the single most difficult question implementation-wise.
> Note that there are only numerical dtypes (plus boolean), so dealing with
> string, datetime, object or third-party dtypes is a non-issue.
>
> > 4. The PR makes no attempt at handling binary operators in any way
> > aside from greedily coercing the other operand.
> >
>
> Agreed. This is the same point as (3) I think - how to handle dtype
> promotion is the main open question.
>
>
> > 5. What happens with a mix of array-likes or even array subclasses
> > like `astropy.quantity`?
> >
>
> Array-likes (e.g. list) should raise an exception, the NEP clearly says
> "do not accept array_like dtypes". This is what every other array/tensor
> library already does.
>
> Array subclasses should work as expected, assuming they're valid
> subclasses and not things like np.matrix. Using Mypy will help avoid
> writing more subclasses that break the Liskov substitution principle.
> More comments in
> https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern
>
> Mixing two different types of arrays into a single function call should
> raise an exception. A design goal is: enable writing functions
> `somefunc(x1, x2)` that work for any type of array where `x1, x2` come
> from the same library, so they're either the same type, or two types for
> which the library itself knows how to mix them. If x1 and x2 are from
> different libraries, this will raise an exception.
>
> To be clear, it is not intended that `np.array_api.somefunc(x_cupy)`
> works - this will raise an exception.
>
> Cheers,
> Ralf
>
>
>
> >
> > I don't think we have to figure out everything up-front, but I do
> > think
> > there are a few very fundamental questions still open, at least for
> > me
> > personally.
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> >

Re: NEP: array API standard adoption (NEP 47)

ralfgommers


On Thu, Mar 11, 2021 at 6:08 PM Sebastian Berg <[hidden email]> wrote:
On Thu, 2021-03-11 at 12:37 +0100, Ralf Gommers wrote:
> On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg <[hidden email]> wrote:
>
> > Top Posting, to discuss post specific questions about NEP 47 and
> > partially the start on implementing it in:
> >
> >     https://github.com/numpy/numpy/pull/18585
> >
> > There are probably many more that will crop up. But for me, each of
> > these is a pretty major difficulty without a clear answer as of
> > now.
> >
>
> All great questions, thanks Sebastian. Let me reply to the questions that
> Aaron didn't reply to inline below.
>

To be clear, I do not expect complete answers to these questions right
now.  (Although being unsure about some of them does make me slightly
reluctant to merge the work-in-progress into NumPy proper as opposed to
a separate repo.)

Also, yes, most/all questions are hopefully just trivialities to
check off (or no more than seeds for thought).  Or even just a starting
point for making NEP 47's "Usage and Impact" section more complete by
including them as either "example usage patterns" or "limitations".

Yes, those are always good to have more of.



My second takeaway from the questions is that I have doubts the
"minimal" version will pan out, it feels like many of the questions
might disappear if you drop that part.

My impression is that a strictly compliant (or "minimal") version is *more* useful than something that's a mix between portable and non-portable functionality. The reason to add more than the minimum required functionality would be that it's too hard to hide the numpy-specific extras. E.g., if we did `np.array_api.int32 = np.int32` then that dtype would have methods and behavior that are NumPy-specific. But it'd be hard to hide, so we'd accept it.

It's maybe easier to discuss in a call, I've put it on the community meeting agenda.

 
So, from my current thinking,
the minimal implementation may not be a good "NEP 47" implementation.

That does _not_ mean that I think you should pause and reconsider or
even worry about pleasing me with good answers!  Just continue under
whatever assumption you prefer and if it turns out that "minimal" won't
work for NEP 47: no harm done!  We need a "minimal implementation" in
any case.

Yes, I agree.



Cheers,

Sebastian



[1] If SciPy needs an additional NumPy code path to keep supporting
`object` arrays or other dtypes – right now even complex –, then the
reader needs to be aware of that to decide whether NEP 47 will
actually help their library.

Clearly. This is why we'd like to have some WIP PRs for other libraries; actual code to review will be more helpful than only a proposal.

 
Will AstroPy have to reimplement `astropy.units.Quantity` to be
"standard-conformant" (is that even possible!?) before it can easily adopt
it in any of its APIs that currently work with `astropy.units.Quantity`?

I'm not sure if the question is well-defined, so let me answer both cases:

1. If the APIs in question require units, then there are no other array/tensor types that have unit support, so those APIs accept *only* Quantity. Adopting the standard isn't possible.

2. If the units are unnecessary/optional, then Quantity is not special and can be treated exactly the same as a `numpy.ndarray`. We don't intend to make any changes to how ndarray subclasses work, so if ndarray works with that API after adoption of the standard then Quantity works too.

Cheers,
Ralf




