After much discussion (and the addition of three new co-authors!), I’m pleased to present a significant revision of NumPy Enhancement Proposal 18: A dispatch mechanism for NumPy's high level array functions:

The full text is also included below.

Best,

Stephan

===========================================================
A dispatch mechanism for NumPy's high level array functions
===========================================================

:Status: Draft
:Type: Standards Track
:Created: 2018-05-29

Abstract
--------

We propose the ``__array_function__`` protocol, to allow arguments of NumPy
functions to define how that function operates on them. This will allow
using NumPy as a high level API for efficient multi-dimensional array
operations, even with array implementations that differ greatly from
``numpy.ndarray``.

Detailed description
--------------------

NumPy's high level ndarray API has been implemented several times
outside of NumPy itself for different architectures, such as for GPU
arrays (CuPy), sparse arrays (scipy.sparse, pydata/sparse) and parallel
arrays (Dask array), as well as various NumPy-like implementations in the
deep learning frameworks, like TensorFlow and PyTorch.

Similarly, there are many projects that build on top of the NumPy API
for labeled and indexed arrays (XArray), automatic differentiation
(Autograd, Tangent), masked arrays (numpy.ma), physical units (astropy.units,
pint, unyt), etc. that add additional functionality on top of the NumPy API.
Most of these projects also implement a close variation of NumPy's high level
API.

We would like to be able to use these libraries together. For example, we
would like to be able to place a CuPy array within XArray, or perform
automatic differentiation on Dask array code. This would be easier to
accomplish if code written for NumPy ndarrays could also be used by
other NumPy-like projects.

For example, we would like for the following code example to work
equally well with any NumPy-like array object:

.. code:: python

    def f(x):
        y = np.tensordot(x, x.T)
        return np.mean(np.exp(y))

Some of this is possible today with various protocol mechanisms within
NumPy.

- The ``np.exp`` function checks the ``__array_ufunc__`` protocol
- The ``.T`` method works using Python's method dispatch
- The ``np.mean`` function explicitly checks for a ``.mean`` method on
  the argument

However, other functions, like ``np.tensordot``, do not dispatch, and
instead are likely to coerce to a NumPy array (using the ``__array__``
protocol), or raise an error outright. To achieve enough coverage of the
NumPy API to support downstream projects like XArray and autograd we want to
support *almost all* functions within NumPy, which calls for a more
far-reaching protocol than just ``__array_ufunc__``. We would like a
protocol that allows arguments of a NumPy function to take control and
divert execution to another function (for example a GPU or parallel
implementation) in a way that is safe and consistent across projects.

Implementation
--------------

We propose adding support for a new protocol in NumPy,
``__array_function__``.

This protocol is intended to be a catch-all for NumPy functionality that
is not covered by the ``__array_ufunc__`` protocol for universal functions
(like ``np.exp``). The semantics are very similar to ``__array_ufunc__``, except
the operation is specified by an arbitrary callable object rather than a ufunc
instance and method.

A prototype implementation can be found in

The interface
~~~~~~~~~~~~~

We propose the following signature for implementations of
``__array_function__``:

.. code-block:: python

    def __array_function__(self, func, types, args, kwargs)

- ``func`` is an arbitrary callable exposed by NumPy's public API,
  which was called in the form ``func(*args, **kwargs)``.
- ``types`` is a ``frozenset`` of unique argument types from the original NumPy
  function call that implement ``__array_function__``.
- The tuple ``args`` and dict ``kwargs`` are directly passed on from the
  original call.

Unlike ``__array_ufunc__``, there are no high-level guarantees about the
type of ``func``, or about which of ``args`` and ``kwargs`` may contain objects
implementing the array API.

As a convenience for ``__array_function__`` implementors, ``types`` provides all
argument types with an ``'__array_function__'`` attribute. This
allows downstream implementations to quickly determine if they are likely able
to support the operation. A ``frozenset`` is used to ensure that
``__array_function__`` implementations cannot rely on the iteration order of
``types``, which would facilitate violating the well-defined "Type casting
hierarchy" described in
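As a rough illustration of how ``types`` could be assembled, the sketch below collects the unique argument types that define ``__array_function__``. The helper ``collect_types`` and the classes ``DuckArray`` and ``OtherDuckArray`` are hypothetical names used only for this example; they are not part of NumPy or of the prototype.

```python
def collect_types(relevant_args):
    """Return the unique types among relevant_args that define __array_function__.

    Hypothetical helper for illustration, not a NumPy API.
    """
    return frozenset(
        type(arg) for arg in relevant_args
        if hasattr(type(arg), '__array_function__')
    )


class DuckArray:
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


class OtherDuckArray:
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


# Plain Python objects (lists, ints) are ignored, and the two DuckArray
# instances contribute a single entry.
types = collect_types([DuckArray(), [1, 2], DuckArray(), OtherDuckArray(), 3])
```

Because the result is a ``frozenset``, an implementation can test membership or use ``issubclass`` on each element, but cannot infer which argument came first.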

Example for a project implementing the NumPy API

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most implementations of ``__array_function__`` will start with two
checks:

1. Is the given function something that we know how to overload?
2. Are all arguments of a type that we know how to handle?

If these conditions hold, ``__array_function__`` should return
the result from calling its implementation for ``func(*args, **kwargs)``.
Otherwise, it should return the sentinel value ``NotImplemented``, indicating
that the function is not implemented by these types. This is preferable to
raising ``TypeError`` directly, because it gives *other* arguments the
opportunity to define the operations.

There are no general requirements on the return value from
``__array_function__``, although most sensible implementations should probably
return array(s) with the same type as one of the function's arguments.

If/when Python gains

and NumPy adds static type annotations, the ``@overload`` implementation
for ``SupportsArrayFunction`` will indicate a return type of ``Any``.

It may also be convenient to define a custom decorator (``implements`` below)
for registering ``__array_function__`` implementations.

.. code:: python

    import numpy as np

    HANDLED_FUNCTIONS = {}

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            # Note: this allows subclasses that don't override
            # __array_function__ to handle MyArray objects
            if not all(issubclass(t, MyArray) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    def implements(numpy_function):
        """Register an __array_function__ implementation for MyArray objects."""
        def decorator(func):
            HANDLED_FUNCTIONS[numpy_function] = func
            return func
        return decorator

    @implements(np.concatenate)
    def concatenate(arrays, axis=0, out=None):
        ...  # implementation of concatenate for MyArray objects

    @implements(np.broadcast_to)
    def broadcast_to(array, shape):
        ...  # implementation of broadcast_to for MyArray objects

Note that it is not required for ``__array_function__`` implementations to
include *all* of the corresponding NumPy function's optional arguments
(e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument).
Optional arguments are only passed in to ``__array_function__`` if they
were explicitly used in the NumPy function call.
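To make the last point concrete, here is a small sketch of what an implementation receives. It does not use NumPy: ``forward_to_first_argument`` is a hypothetical toy stand-in for the dispatch machinery, and ``RecordingArray`` is a hypothetical duck array that only records the ``args`` and ``kwargs`` it is handed.

```python
received = []  # (args, kwargs) pairs seen by the duck array


class RecordingArray:
    """Hypothetical duck array that records how it was called."""

    def __array_function__(self, func, types, args, kwargs):
        received.append((args, kwargs))
        return self


def forward_to_first_argument(func):
    """Toy stand-in for NumPy's dispatch: pass the call on unchanged."""
    def wrapper(*args, **kwargs):
        first = args[0]
        return first.__array_function__(
            wrapper, frozenset([type(first)]), args, kwargs)
    return wrapper


@forward_to_first_argument
def broadcast_to(array, shape, subok=False):
    raise NotImplementedError('never reached in this sketch')


x = RecordingArray()
broadcast_to(x, (2, 3))               # subok not mentioned by the caller
broadcast_to(x, (2, 3), subok=True)   # subok passed explicitly
```

In the first call ``kwargs`` is empty, because the default for ``subok`` is never filled in; in the second it contains ``subok``. An implementation that omits an optional argument therefore only fails when a caller actually passes it.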

Necessary changes within the NumPy codebase itself

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This will require two changes within the NumPy codebase:

1. A function to inspect available inputs, look for the
   ``__array_function__`` attribute on those inputs, and call those
   methods appropriately until one succeeds. This needs to be fast in the
   common all-NumPy case, and have acceptable performance (no worse than
   linear time) even if the number of overloaded inputs is large (e.g.,
   as might be the case for ``np.concatenate``).

   This is one additional function of moderate complexity.
2. Calling this function within all relevant NumPy functions.

   This affects many parts of the NumPy codebase, although with very low
   complexity.

Finding and calling the right ``__array_function__``

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Given a NumPy function, ``*args`` and ``**kwargs`` inputs, we need to
search through ``*args`` and ``**kwargs`` for all appropriate inputs
that might have the ``__array_function__`` attribute. Then we need to
select among those possible methods and execute the right one.

Negotiating between several possible implementations can be complex.

Finding arguments
'''''''''''''''''

Valid arguments may be directly in the ``*args`` and ``**kwargs``, such
as in the case for ``np.tensordot(left, right, out=out)``, or they may
be nested within lists or dictionaries, such as in the case of
``np.concatenate([x, y, z])``. This can be problematic for two reasons:

1. Some functions are given long lists of values, and traversing them
   might be prohibitively expensive.
2. Some functions may have arguments that we don't want to inspect, even
   if they have the ``__array_function__`` method.

To resolve these issues, NumPy functions should explicitly indicate which
of their arguments may be overloaded, and how these arguments should be
checked. As a rule, this should include all arguments documented as either
``array_like`` or ``ndarray``.

We propose to do so by writing "dispatcher" functions for each overloaded
NumPy function:

- These functions will be called with the exact same arguments that were passed
  into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``), and should
  return an iterable of arguments to check for overrides.
- Dispatcher functions are required to share the exact same positional,
  optional and keyword-only arguments as their corresponding NumPy functions.
  Otherwise, valid invocations of a NumPy function could result in an error when
  calling its dispatcher.
- Because default *values* for keyword arguments do not have
  ``__array_function__`` attributes, by convention we set all default argument
  values to ``None``. This reduces the likelihood of signatures falling out
  of sync, and minimizes extraneous information in the dispatcher.
  The only exception should be cases where the argument value in some way
  affects dispatching, which should be rare.

An example of the dispatcher for ``np.concatenate`` may be instructive:

.. code:: python

    def _concatenate_dispatcher(arrays, axis=None, out=None):
        for array in arrays:
            yield array
        if out is not None:
            yield out

The concatenate dispatcher is written as a generator function, which allows it
to potentially include the value of the optional ``out`` argument without
needing to create a new sequence with the (potentially long) list of objects
to be concatenated.
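For comparison, a dispatcher for a function whose array arguments are all top-level can simply return a tuple. The sketch below pairs a hypothetical ``_tensordot_dispatcher`` (written against the ``np.tensordot(a, b, axes=2)`` signature, with the default replaced by ``None`` as per the convention above) with the concatenate dispatcher, and uses plain ``object()`` placeholders in place of arrays.

```python
def _tensordot_dispatcher(a, b, axes=None):
    # Both operands may carry an override; ``axes`` is never inspected.
    return (a, b)


def _concatenate_dispatcher(arrays, axis=None, out=None):
    for array in arrays:
        yield array
    if out is not None:
        yield out


# Placeholders standing in for array objects.
x, y, z, out = object(), object(), object(), object()

tensordot_args = list(_tensordot_dispatcher(x, y, axes=1))
concat_args = list(_concatenate_dispatcher([x, y, z], axis=0, out=out))
concat_args_no_out = list(_concatenate_dispatcher([x, y, z]))
```

Only the arguments that should be checked for overrides are returned; ``axes`` and ``axis`` are accepted solely so that the dispatcher can be called exactly like the function it accompanies.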

Trying ``__array_function__`` methods until the right one works
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Many arguments may implement the ``__array_function__`` protocol. Some
of these may decide that, given the available inputs, they are unable to
determine the correct result. How do we call the right one? If several
are valid then which has precedence?

For the most part, the rules for dispatch with ``__array_function__``
match those for ``__array_ufunc__`` (see

In particular:

- NumPy will gather implementations of ``__array_function__`` from all
  specified inputs and call them in order: subclasses before
  superclasses, and otherwise left to right. Note that in some edge cases
  involving subclasses, this differs slightly from the
- Implementations of ``__array_function__`` indicate that they can
  handle the operation by returning any value other than
  ``NotImplemented``.
- If all ``__array_function__`` methods return ``NotImplemented``,
  NumPy will raise ``TypeError``.

One deviation from the current behavior of ``__array_ufunc__`` is that NumPy
will only call ``__array_function__`` on the *first* argument of each unique
type. This matches Python's

and this ensures that checking overloads has acceptable performance even when
there are a large number of overloaded arguments. To avoid long-term divergence
between these two dispatch protocols, we should

``__array_ufunc__`` to match this behavior.
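The rules above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: ``get_overloaded_args`` is a hypothetical helper name, the ``(success, value)`` return convention of ``try_array_function_override`` is an assumption of this sketch, the classes ``Base``, ``Sub`` and ``Unrelated`` are placeholders, and the special handling of ``numpy.ndarray`` is left out.

```python
def get_overloaded_args(relevant_args):
    """Return (types, args): the overriding types and one argument per type.

    Arguments keep their left-to-right order, except that an instance of a
    subclass is moved ahead of an instance of any of its superclasses.
    """
    types = []
    overloaded_args = []
    for arg in relevant_args:
        arg_type = type(arg)
        if arg_type in types or not hasattr(arg_type, '__array_function__'):
            continue  # keep only the first argument of each unique type
        types.append(arg_type)
        index = len(overloaded_args)
        for i, earlier in enumerate(overloaded_args):
            if issubclass(arg_type, type(earlier)):
                index = i
                break
        overloaded_args.insert(index, arg)
    return frozenset(types), overloaded_args


def try_array_function_override(func, relevant_args, args, kwargs):
    """Return (success, value); raise TypeError if every override declines."""
    types, overloaded_args = get_overloaded_args(relevant_args)
    if not overloaded_args:
        return False, None
    for arg in overloaded_args:
        result = arg.__array_function__(func, types, args, kwargs)
        if result is not NotImplemented:
            return True, result
    raise TypeError(
        'no implementation found for {} on types that implement '
        '__array_function__: {}'.format(
            func, sorted(t.__name__ for t in types)))


calls = []  # records the order in which implementations are tried


class Base:
    def __array_function__(self, func, types, args, kwargs):
        calls.append('Base')
        return NotImplemented


class Sub(Base):
    def __array_function__(self, func, types, args, kwargs):
        calls.append('Sub')
        return 'handled by Sub'


class Unrelated:
    def __array_function__(self, func, types, args, kwargs):
        calls.append('Unrelated')
        return NotImplemented


def some_function(*arrays):
    """Placeholder for a NumPy function."""


operands = (Unrelated(), Base(), Base(), Sub())
success, value = try_array_function_override(
    some_function, operands, operands, {})
```

Here ``Unrelated`` is tried first because it is leftmost, ``Sub`` is tried before ``Base`` because it is a subclass, and the second ``Base`` instance is never consulted. If only ``Unrelated`` and ``Base`` instances were passed, every implementation would return ``NotImplemented`` and the sketch would raise ``TypeError``.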

Special handling of ``numpy.ndarray``
'''''''''''''''''''''''''''''''''''''

The use cases for subclasses with ``__array_function__`` are the same as those
with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a
``__array_function__`` method mirroring ``ndarray.__array_ufunc__``:

.. code:: python

    def __array_function__(self, func, types, args, kwargs):
        # Cannot handle items that have __array_function__ other than our own.
        for t in types:
            if (hasattr(t, '__array_function__') and
                    t.__array_function__ is not ndarray.__array_function__):
                return NotImplemented

        # Arguments contain no overrides, so we can safely call the
        # overloaded function again.
        return func(*args, **kwargs)

To avoid infinite recursion, the dispatch rules for ``__array_function__`` also
need the same special case they have for ``__array_ufunc__``: any arguments with
an ``__array_function__`` method that is identical to
``numpy.ndarray.__array_function__`` are not called as
``__array_function__`` implementations.
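One way to express this special case is an identity check on the method looked up from the argument's type. The sketch below assumes a stand-in class ``BaseArray`` playing the role of ``numpy.ndarray``; ``has_real_override`` and the two subclasses are hypothetical names used only for illustration.

```python
class BaseArray:
    """Stand-in for numpy.ndarray in this sketch."""

    def __array_function__(self, func, types, args, kwargs):
        for t in types:
            if (hasattr(t, '__array_function__') and
                    t.__array_function__ is not BaseArray.__array_function__):
                return NotImplemented
        return func(*args, **kwargs)


class PlainSubclass(BaseArray):
    """Inherits the base method unchanged, so it is not an override."""


class OverridingSubclass(BaseArray):
    def __array_function__(self, func, types, args, kwargs):
        return 'handled by OverridingSubclass'


def has_real_override(arg):
    """True if arg should be consulted as an __array_function__ implementation."""
    method = getattr(type(arg), '__array_function__', None)
    return method is not None and method is not BaseArray.__array_function__


candidates = [BaseArray(), PlainSubclass(), OverridingSubclass(), 1.5]
to_call = [arg for arg in candidates if has_real_override(arg)]
```

Only the ``OverridingSubclass`` instance survives the filter. Without the check, the base method would call ``func`` again, which would dispatch back to the base method, and so on.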

Changes within NumPy functions

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Given a function defining the above behavior, for now call it
``try_array_function_override``, we now need to call that function from
within every relevant NumPy function. This is a pervasive change, but of
fairly simple and innocuous code that should complete quickly and
without effect if no arguments implement the ``__array_function__``
protocol.

In most cases, these functions should be written using the
``array_function_dispatch`` decorator, which also associates dispatcher
functions:

.. code:: python

    import functools

    def array_function_dispatch(dispatcher):
        """Wrap a function for dispatch with the __array_function__ protocol."""
        def decorator(func):
            @functools.wraps(func)
            def new_func(*args, **kwargs):
                relevant_arguments = dispatcher(*args, **kwargs)
                success, value = try_array_function_override(
                    new_func, relevant_arguments, args, kwargs)
                if success:
                    return value
                return func(*args, **kwargs)
            return new_func
        return decorator

    # example usage
    def _broadcast_to_dispatcher(array, shape, subok=None, **ignored_kwargs):
        return (array,)

    @array_function_dispatch(_broadcast_to_dispatcher)
    def broadcast_to(array, shape, subok=False):
        ...  # existing definition of np.broadcast_to

Using a decorator is great! We don't need to change the definitions of
existing NumPy functions, and only need to write a few additional lines
for the dispatcher function. We could even reuse a single dispatcher for
families of functions with the same signature (e.g., ``sum`` and ``prod``).
For such functions, the largest change could be adding a few lines to the
docstring to note which arguments are checked for overloads.
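As an illustration of sharing one dispatcher across a family of functions, the following self-contained sketch decorates two toy reductions with the same dispatcher. Everything here is hypothetical: ``my_sum`` and ``my_prod`` are stand-ins rather than ``np.sum`` and ``np.prod``, ``Logged`` is a placeholder duck array, and ``try_array_function_override`` is a deliberately simplified version that ignores subclass ordering, per-type deduplication and the ``ndarray`` special case.

```python
import functools


def try_array_function_override(func, relevant_args, args, kwargs):
    """Simplified stand-in: try overriding arguments left to right."""
    overloaded = [a for a in relevant_args
                  if hasattr(type(a), '__array_function__')]
    if not overloaded:
        return False, None
    types = frozenset(type(a) for a in overloaded)
    for arg in overloaded:
        result = arg.__array_function__(func, types, args, kwargs)
        if result is not NotImplemented:
            return True, result
    raise TypeError('no implementation found for {!r}'.format(func))


def array_function_dispatch(dispatcher):
    """Wrap a function for dispatch with the __array_function__ protocol."""
    def decorator(func):
        @functools.wraps(func)
        def new_func(*args, **kwargs):
            relevant_arguments = dispatcher(*args, **kwargs)
            success, value = try_array_function_override(
                new_func, relevant_arguments, args, kwargs)
            if success:
                return value
            return func(*args, **kwargs)
        return new_func
    return decorator


def _reduction_dispatcher(a, axis=None, out=None):
    # Shared by every reduction with this (toy) signature.
    return (a, out)


@array_function_dispatch(_reduction_dispatcher)
def my_sum(a, axis=None, out=None):
    return sum(a)


@array_function_dispatch(_reduction_dispatcher)
def my_prod(a, axis=None, out=None):
    result = 1
    for item in a:
        result *= item
    return result


class Logged:
    """Hypothetical duck array that reports which function it handled."""

    def __array_function__(self, func, types, args, kwargs):
        return 'Logged handled ' + func.__name__


plain_sum = my_sum([1, 2, 3])        # no override: falls through to sum()
plain_prod = my_prod([1, 2, 3, 4])   # no override: falls through to the loop
duck_sum = my_sum(Logged())
duck_prod = my_prod(Logged(), axis=0)
```

The two decorated functions differ only in their bodies; the dispatcher is written once. Note that the duck array receives the decorated function as ``func``, which is why ``func.__name__`` reports ``my_sum`` and ``my_prod``.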

It's particularly worth calling out the decorator's use of
``functools.wraps``:

- This ensures that the wrapper function has the same name and docstring as
  the wrapped NumPy function.
- On Python 3, it also ensures that the decorator function copies the original
  function signature, which is important for introspection based tools such as
  auto-complete. If we care about preserving function signatures on Python 2,
  for as long as NumPy supports Python 2.7, we could do so by adding a vendored
  dependency on the (single-file, BSD licensed)

- Finally, it ensures that the wrapped function
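The first two effects of ``functools.wraps`` are easy to check on Python 3 with the standard library alone. In this sketch ``passthrough`` is a hypothetical minimal decorator, and the decorated ``broadcast_to`` is a toy function, not NumPy's.

```python
import functools
import inspect


def passthrough(func):
    """Minimal decorator that only forwards the call."""
    @functools.wraps(func)
    def new_func(*args, **kwargs):
        return func(*args, **kwargs)
    return new_func


@passthrough
def broadcast_to(array, shape, subok=False):
    """Broadcast an array to a new shape."""
    return (array, shape, subok)


name = broadcast_to.__name__
doc = broadcast_to.__doc__
# inspect.signature follows the __wrapped__ attribute set by functools.wraps,
# so it reports the original parameters rather than (*args, **kwargs).
signature = str(inspect.signature(broadcast_to))
```

Tools that rely on ``inspect.signature`` therefore see ``(array, shape, subok=False)`` for the decorated function.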

In a few cases, it would not make sense to use the ``array_function_dispatch``
decorator directly, but implementing overrides in terms of
``try_array_function_override`` should still be straightforward.

- Functions written entirely in C (e.g., ``np.concatenate``) can't use
  decorators, but they could still use a C equivalent of
  ``try_array_function_override``. If performance is not a concern, they could
  also be easily wrapped with a small Python wrapper.

- The ``__call__`` method of ``np.vectorize`` can't be decorated with

_______________________________________________

NumPy-Discussion mailing list

https://mail.python.org/mailman/listinfo/numpy-discussion