

Hello all,
It was recently brought to my attention that my mails to NumPy-Discussion were probably going into the spam folder for many people, so here I am trying from another email address. Probably Google trying
to force people onto their products as usual. 😉
Ralf Gommers, Peter Bell (both cc’d) and I have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined
in NEP 31, currently in the form of a PR [1],
following the high-level discussion in NEP 22 [2].
It would be nice to get some feedback.
Full text of the NEP:
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
:Author: Hameer Abbasi [hidden email]
:Author: Ralf Gommers [hidden email]
:Author: Peter Bell [hidden email]
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22
Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism, using a library called ``uarray`` `[1]`_.
``uarray`` provides global and context-local overrides, as well as a dispatch
mechanism similar to NEP 18 `[2]`_. First experiences with ``__array_function__``
show that it is necessary to be able to override NumPy functions that
*do not take an array-like argument*, and hence aren't overridable via
``__array_function__``. The most pressing need is array creation and coercion
functions; see e.g. NEP 30 `[9]`_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP 22 `[3]`_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.
Motivation and Scope
--------------------
The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including (most
prominently) the ``__array_ufunc__`` protocol in NEP 13 `[4]`_ and the
``__array_function__`` protocol in NEP 18 `[2]`_, but this has shown the need
for further protocols to be developed, including a protocol for coercion (see
`[5]`_). The reasons these overrides are needed have been extensively discussed
in the references, and this NEP will not attempt to go into the details of why
they are needed. Another pain point requiring yet another protocol is the
duck-array protocol (see `[9]`_).

This NEP takes a more holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid designing a new
protocol each time this is required.

This NEP proposes the following: That ``unumpy`` `[8]`_ becomes the recommended
override mechanism for the parts of the NumPy API not yet covered by
``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is vendored
into a new namespace within NumPy to give users and downstream dependencies
access to these overrides. This vendoring mechanism is similar to what SciPy
decided to do for making ``scipy.fft`` overridable (see `[10]`_).
Detailed description
--------------------
**Note:** *This section will not attempt to explain the specifics or the
mechanism of ``uarray``; that is explained in the ``uarray`` documentation.*
`[1]`_ *However, the NumPy community will have input into the design of
``uarray``, and any backward-incompatible changes will be discussed on the
mailing list.*
The way we propose the overrides will be used by end users is::

    import numpy.overridable as np

    with np.set_backend(backend):
        x = np.asarray(my_array, dtype=dtype)
And a library that implements a NumPy-like API will use it in the following
manner (as an example)::

    import numpy.overridable as np

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(np.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # Or they shouldn't be marked as dispatchable.

    # Provides a default implementation for ones and zeros.
    @implements(np.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
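To make the flow above concrete, here is a toy, dependency-free model of the dispatch mechanism. This is *not* the real ``uarray`` API: ``set_backend``, ``dispatch`` and ``ListBackend`` below are simplified stand-ins written for illustration only.

```python
# Toy model of uarray-style dispatch (illustrative only, NOT the real
# uarray API): a context manager installs a backend, and multimethod
# calls are routed to the backend's __ua_function__.
from contextlib import contextmanager

_current_backend = None

@contextmanager
def set_backend(backend):
    """Install `backend` for the duration of the with-block."""
    global _current_backend
    prev, _current_backend = _current_backend, backend
    try:
        yield
    finally:
        _current_backend = prev

def dispatch(func, *args, **kwargs):
    """Consult the active backend's __ua_function__, mimicking how
    uarray routes a multimethod call to a backend."""
    if _current_backend is not None:
        result = _current_backend.__ua_function__(func, args, kwargs)
        if result is not NotImplemented:
            return result
    raise NotImplementedError(f"no backend implements {func.__name__}")

class ListBackend:
    """A toy 'array library' whose arrays are plain Python lists."""
    __ua_domain__ = "numpy"

    @staticmethod
    def __ua_function__(func, args, kwargs):
        if func.__name__ == "asarray":
            return list(args[0])
        return NotImplemented  # defer for anything else

def asarray(a, dtype=None):
    """Stand-in for the overridable np.asarray multimethod."""
    return dispatch(asarray, a, dtype=dtype)

with set_backend(ListBackend):
    print(asarray((1, 2, 3)))  # [1, 2, 3]
```

The point of the sketch is the control flow: the user only switches the backend, while the library author only implements ``__ua_function__``; neither touches the other's code.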
The only change this NEP proposes at its acceptance is to make ``unumpy`` the
officially recommended way to override NumPy. ``unumpy`` will remain a separate
repository/package, which we propose to vendor rather than depend on for the
time being (to avoid a hard dependency; the separate ``unumpy`` package would
be used only if it is installed). It will be developed primarily with the
input of duck-array authors and secondarily custom dtype authors, via the
usual GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process,
  rather than breakages happening when it is least expected.
  In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains
  unaffected.
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: Whenever there is something requiring
an override, ``unumpy`` will be able to offer a unified API with very minor
changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
  other methods.
* ``dtype`` objects can be overridden via the dispatch/backend mechanism, going
  as far as to allow ``np.float32`` et al. to be overridden by overriding
  ``__get__``.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.
This also holds for the future: Making something overridable would require only
minor changes to ``unumpy``.

Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others. This
allows one to override a large part of the NumPy API by defining only a small
part of it. The goal is to ease the creation of new duck arrays, by providing
default implementations of the many functions that can be easily expressed in
terms of others, as well as a repository of utility functions that most duck
arrays would require.
The last benefit is a clear way to coerce to a given backend, and a protocol
for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects,
with similar ones from other libraries. This is motivated by the existence of
actual third-party dtype packages, and their desire to blend into the NumPy
ecosystem (see `[6]`_). This is a separate issue from the C-level dtype
redesign proposed in `[7]`_: it's about allowing third-party dtype
implementations to work with NumPy, much like third-party array
implementations do.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would only want to import one of ``unumpy`` or ``numpy``, and
import it as ``np`` for familiarity. However, there may be situations where one
wishes to mix NumPy and the overrides, and there are a few ways to do this,
depending on the user's style::

    import numpy.overridable as unumpy
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable
Related Work
------------
Previous override mechanisms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* NEP 18, the ``__array_function__`` protocol. `[2]`_
* NEP 13, the ``__array_ufunc__`` protocol. `[4]`_
Existing NumPy-like array implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/
Existing and potential consumers of alternative arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* Xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/
Existing alternate dtype implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/
Implementation
--------------
The implementation of this NEP will require the following steps:
* Implementation of ``uarray`` multimethods corresponding to the
  NumPy API, including classes for overriding ``dtype``, ``ufunc``
  and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is NEP 30 plus adding more protocols
(not yet specified) in addition to it. Even then, some parts of the NumPy
API will remain non-overridable, so it's only a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also achieve
the proposed goals; however, we prefer to keep it a separate package for now,
for the reasons already stated above.
Discussion
----------
* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP 18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP 22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
References and Footnotes
------------------------
.. _[1]:

[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io

.. _[2]:

[2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html

.. _[3]:

[3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

.. _[4]:

[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html

.. _[5]:

[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html

.. _[6]:

[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html

.. _[7]:

[7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899

.. _[8]:

[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io

.. _[9]:

[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html

.. _[10]:

[10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.
Best regards,
Hameer Abbasi
[1] https://github.com/numpy/numpy/pull/14389
[2] https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion


On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi < [hidden email]> wrote:
> Me, Ralf Gommers and Peter Bell (both cc’d) have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP 31, currently in the form of a PR, [1]
Thanks for putting this together! It'd be great to have more
engagement between uarray and numpy.
> ============================================================
>
> NEP 31 — Context-local and global overrides of the NumPy API
>
> ============================================================
Now that I've read this over, my main feedback is that right now it
seems too vague and high-level to give it a fair evaluation? The idea
of a NEP is to lay out a problem and proposed solution in enough
detail that it can be evaluated and critiqued, but this felt to me
more like it was pointing at some other documents for all the details
and then promising that uarray has solutions for all our problems.
> This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be
> overridable, and that these will grow over time. It provides a general framework and a mechanism to
> avoid a design of a new protocol each time this is required.
The idea of a holistic approach makes me nervous, because I'm not sure
we have holistic problems. Sometimes a holistic approach is the right
thing; other times it means sweeping the actual problems under the
rug, so things *look* simple and clean but in fact nothing has been
solved, and they just end up biting us later. And from the NEP as
currently written, I can't tell whether this is the good kind of
holistic or the bad kind of holistic.
Now I'm writing vague hand-wavey things, so let me follow my own advice
and make it more concrete with an example :).
When Stephan and I were writing NEP 22, the single thing we spent the
most time discussing was the problem of duck-array coercion, and in
particular what to do about existing code that does
np.asarray(duck_array_obj).
The reason this is challenging is that there's a lot of code written
in Cython/C/C++ that calls np.asarray, and then blindly casts the
return value to a PyArray struct and starts accessing the raw memory
fields. If np.asarray starts returning anything besides a real, actual
np.ndarray object, then this code will start corrupting random memory,
leading to a segfault at best.
Stephan felt strongly that this meant that existing np.asarray calls
*must not* ever return anything besides an np.ndarray object, and
therefore we needed to add a new function np.asduckarray(), or maybe
an explicit opt-in flag like np.asarray(..., allow_duck_array=True).
I agreed that this was a problem, but thought we might be able to get
away with an "opt-out" system, where we add an allow_duck_array= flag,
but make it *default* to True, and document that the Cython/C/C++
users who want to work with a raw np.ndarray object should modify
their code to explicitly call np.asarray(obj, allow_duck_array=False).
This would mean that for a while people who tried to pass duck arrays
into legacy libraries would get segfaults, but there would be a clear
path for fixing these issues as they were discovered.
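The opt-out design sketched above can be modelled in a few lines. Everything here is hypothetical: the toy classes and the `allow_duck_array` flag come from this discussion, not from NumPy's actual API.

```python
# Toy model of the "opt-out" idea (hypothetical flag, toy types; NOT
# NumPy's actual API): duck arrays pass through by default, and legacy
# callers that need a real ndarray opt out explicitly.
class NDArray:
    """Stand-in for np.ndarray."""
    def __init__(self, data):
        self.data = list(data)

class DuckArray:
    """Stand-in for an arbitrary duck array."""
    def __init__(self, data):
        self.data = list(data)

def asarray(obj, allow_duck_array=True):
    if isinstance(obj, NDArray):
        return obj
    if isinstance(obj, DuckArray):
        if allow_duck_array:
            return obj               # default: duck array passes through
        return NDArray(obj.data)     # opt-out: coerce to a real ndarray
    return NDArray(obj)

duck = DuckArray([1, 2])
print(type(asarray(duck)).__name__)                            # DuckArray
print(type(asarray(duck, allow_duck_array=False)).__name__)    # NDArray
```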
Either way, there are also some other details to figure out: how does
this affect the C version of asarray? What about np.asfortranarray –
probably that should default to allow_duck_array=False, even if we did
make np.asarray default to allow_duck_array=True, right?
Now if I understand right, your proposal would be to make it so any
code in any package could arbitrarily change the behavior of
np.asarray for all inputs, e.g. I could just decide that
np.asarray([1, 2, 3]) should return some arbitrary nonnp.ndarray
object. It seems like this has a much greater potential for breaking
existing Cython/C/C++ code, and the NEP doesn't currently describe why
this extra power is useful, and it doesn't currently describe how it
plans to mitigate the downsides. (For example, if a caller needs a
real np.ndarray, then is there some way to explicitly request one? The
NEP doesn't say.) Maybe this is all fine and there are solutions to
these issues, but any proposal to address duck array coercion needs to
at least talk about these issues!
And that's just one example... array coercion is a particularly
central and tricky problem, but the numpy API is big, and there are
probably other problems like this. For another example, I don't
understand what the NEP is proposing to do about dtypes at all.
That's why I think the NEP needs to be fleshed out a lot more before
it will be possible to evaluate fairly.
n

Nathaniel J. Smith -- https://vorpus.org


On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith <[hidden email]> wrote:

On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi <[hidden email]> wrote:
> Me, Ralf Gommers and Peter Bell (both cc’d) have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP 31, currently in the form of a PR, [1]
Thanks for putting this together! It'd be great to have more
engagement between uarray and numpy.
> ============================================================
>
> NEP 31 — Context-local and global overrides of the NumPy API
>
> ============================================================
Now that I've read this over, my main feedback is that right now it
seems too vague and high-level to give it a fair evaluation? The idea
of a NEP is to lay out a problem and proposed solution in enough
detail that it can be evaluated and critiqued, but this felt to me
more like it was pointing at some other documents for all the details
and then promising that uarray has solutions for all our problems.
This is fair enough I think. We'll need to put some more thought in where to refer to other NEPs, and where to be more concrete.
> This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be
> overridable, and that these will grow over time. It provides a general framework and a mechanism to
> avoid a design of a new protocol each time this is required.
The idea of a holistic approach makes me nervous, because I'm not sure
we have holistic problems. Sometimes a holistic approach is the right
thing; other times it means sweeping the actual problems under the
rug, so things *look* simple and clean but in fact nothing has been
solved, and they just end up biting us later. And from the NEP as
currently written, I can't tell whether this is the good kind of
holistic or the bad kind of holistic.
Now I'm writing vague hand-wavey things, so let me follow my own advice
and make it more concrete with an example :).
When Stephan and I were writing NEP 22, the single thing we spent the
most time discussing was the problem of duck-array coercion, and in
particular what to do about existing code that does
np.asarray(duck_array_obj).
The reason this is challenging is that there's a lot of code written
in Cython/C/C++ that calls np.asarray,
Cython code only perhaps? It would surprise me if there's a lot of C/C++ code that explicitly calls into our Python rather than C API.
and then blindly casts the
return value to a PyArray struct and starts accessing the raw memory
fields. If np.asarray starts returning anything besides a real, actual
np.ndarray object, then this code will start corrupting random memory,
leading to a segfault at best.
Stephan felt strongly that this meant that existing np.asarray calls
*must not* ever return anything besides an np.ndarray object, and
therefore we needed to add a new function np.asduckarray(), or maybe
an explicit opt-in flag like np.asarray(..., allow_duck_array=True).
I agreed that this was a problem, but thought we might be able to get
away with an "opt-out" system, where we add an allow_duck_array= flag,
but make it *default* to True, and document that the Cython/C/C++
users who want to work with a raw np.ndarray object should modify
their code to explicitly call np.asarray(obj, allow_duck_array=False).
This would mean that for a while people who tried to pass duck arrays
into legacy libraries would get segfaults, but there would be a clear
path for fixing these issues as they were discovered.
Either way, there are also some other details to figure out: how does
this affect the C version of asarray? What about np.asfortranarray –
probably that should default to allow_duck_array=False, even if we did
make np.asarray default to allow_duck_array=True, right?
Now if I understand right, your proposal would be to make it so any
code in any package could arbitrarily change the behavior of
np.asarray for all inputs, e.g. I could just decide that
np.asarray([1, 2, 3]) should return some arbitrary nonnp.ndarray
object.
No, definitely not! It's all opt-in, by explicitly importing from `numpy.overridable` or `unumpy`. No behavior of anything in the existing numpy namespaces should be affected in any way.
I agree with the concerns below, hence it should stay optin.
Cheers,
Ralf
It seems like this has a much greater potential for breaking
existing Cython/C/C++ code, and the NEP doesn't currently describe why
this extra power is useful, and it doesn't currently describe how it
plans to mitigate the downsides. (For example, if a caller needs a
real np.ndarray, then is there some way to explicitly request one? The
NEP doesn't say.) Maybe this is all fine and there are solutions to
these issues, but any proposal to address duck array coercion needs to
at least talk about these issues!
And that's just one example... array coercion is a particularly
central and tricky problem, but the numpy API is big, and there are
probably other problems like this. For another example, I don't
understand what the NEP is proposing to do about dtypes at all.
That's why I think the NEP needs to be fleshed out a lot more before
it will be possible to evaluate fairly.
n

Nathaniel J. Smith -- https://vorpus.org


Hi Nathaniel,
On 02.09.19 23:09, Nathaniel Smith wrote:
> On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi < [hidden email]> wrote:
>> Me, Ralf Gommers and Peter Bell (both cc’d) have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP 31, currently in the form of a PR, [1]
> Thanks for putting this together! It'd be great to have more
> engagement between uarray and numpy.
>
>> ============================================================
>>
>> NEP 31 — Context-local and global overrides of the NumPy API
>>
>> ============================================================
> Now that I've read this over, my main feedback is that right now it
> seems too vague and high-level to give it a fair evaluation? The idea
> of a NEP is to lay out a problem and proposed solution in enough
> detail that it can be evaluated and critiqued, but this felt to me
> more like it was pointing at some other documents for all the details
> and then promising that uarray has solutions for all our problems.
>
>> This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be
>> overridable, and that these will grow over time. It provides a general framework and a mechanism to
>> avoid a design of a new protocol each time this is required.
> The idea of a holistic approach makes me nervous, because I'm not sure
> we have holistic problems.
The fact that we're having to design more and more protocols for a lot
of very similar things is, to me, an indicator that we do have holistic
problems that ought to be solved by a single protocol.
> Sometimes a holistic approach is the right
> thing; other times it means sweeping the actual problems under the
> rug, so things *look* simple and clean but in fact nothing has been
> solved, and they just end up biting us later. And from the NEP as
> currently written, I can't tell whether this is the good kind of
> holistic or the bad kind of holistic.
>
> Now I'm writing vague hand-wavey things, so let me follow my own advice
> and make it more concrete with an example :).
>
> When Stephan and I were writing NEP 22, the single thing we spent the
> most time discussing was the problem of duck-array coercion, and in
> particular what to do about existing code that does
> np.asarray(duck_array_obj).
>
> The reason this is challenging is that there's a lot of code written
> in Cython/C/C++ that calls np.asarray, and then blindly casts the
> return value to a PyArray struct and starts accessing the raw memory
> fields. If np.asarray starts returning anything besides a real, actual
> np.ndarray object, then this code will start corrupting random memory,
> leading to a segfault at best.
>
> Stephan felt strongly that this meant that existing np.asarray calls
> *must not* ever return anything besides an np.ndarray object, and
> therefore we needed to add a new function np.asduckarray(), or maybe
> an explicit opt-in flag like np.asarray(..., allow_duck_array=True).
>
> I agreed that this was a problem, but thought we might be able to get
> away with an "opt-out" system, where we add an allow_duck_array= flag,
> but make it *default* to True, and document that the Cython/C/C++
> users who want to work with a raw np.ndarray object should modify
> their code to explicitly call np.asarray(obj, allow_duck_array=False).
> This would mean that for a while people who tried to pass duck arrays
> into legacy libraries would get segfaults, but there would be a clear
> path for fixing these issues as they were discovered.
>
> Either way, there are also some other details to figure out: how does
> this affect the C version of asarray? What about np.asfortranarray –
> probably that should default to allow_duck_array=False, even if we did
> make np.asarray default to allow_duck_array=True, right?
>
> Now if I understand right, your proposal would be to make it so any
> code in any package could arbitrarily change the behavior of
> np.asarray for all inputs, e.g. I could just decide that
> np.asarray([1, 2, 3]) should return some arbitrary nonnp.ndarray
> object. It seems like this has a much greater potential for breaking
> existing Cython/C/C++ code, and the NEP doesn't currently describe why
> this extra power is useful, and it doesn't currently describe how it
> plans to mitigate the downsides. (For example, if a caller needs a
> real np.ndarray, then is there some way to explicitly request one? The
> NEP doesn't say.) Maybe this is all fine and there are solutions to
> these issues, but any proposal to address duck array coercion needs to
> at least talk about these issues!
I believe I addressed this in a previous email, but the NEP doesn't
suggest overriding numpy.asarray or numpy.array. It suggests overriding
numpy.overridable.asarray and numpy.overridable.array, so existing code
will continue to work as-is and overrides are opt-in rather than forced
on you.
The argument about this kind of code could be applied to return values
from other functions as well. That said, there is a way to request a
NumPy array object explicitly:

    with ua.set_backend(np):
        x = np.asarray(...)
>
> And that's just one example... array coercion is a particularly
> central and tricky problem, but the numpy API is big, and there are
> probably other problems like this. For another example, I don't
> understand what the NEP is proposing to do about dtypes at all.
Just as there are other kinds of arrays, there may be other kinds of
dtypes that are not NumPy dtypes. They cannot be attached to a NumPy
array object (as Sebastian pointed out to me in last week's Community
meeting), but they can still provide other powerful features.
> That's why I think the NEP needs to be fleshed out a lot more before
> it will be possible to evaluate fairly.
>
> n
>
I just pushed a new version of the NEP to my PR, the full text of which
is below.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
:Author: Hameer Abbasi < [hidden email]>
:Author: Ralf Gommers < [hidden email]>
:Author: Peter Bell < [hidden email]>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22
Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism, using a library called ``uarray`` `[1]`_.
``uarray`` provides global and context-local overrides, as well as a dispatch
mechanism similar to NEP 18 `[2]`_. First experiences with
``__array_function__`` show that it is necessary to be able to override NumPy
functions that *do not take an array-like argument*, and hence aren't
overridable via ``__array_function__``. The most pressing need is array
creation and coercion functions; see e.g. NEP 30 `[9]`_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP 22 `[3]`_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.
Motivation and Scope
--------------------
The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including (most
prominently) the ``__array_ufunc__`` protocol in NEP 13 `[4]`_ and the
``__array_function__`` protocol in NEP 18 `[2]`_, but this has shown the need
for further protocols to be developed, including a protocol for coercion (see
`[5]`_). The reasons these overrides are needed have been extensively discussed
in the references, and this NEP will not attempt to go into the details of why
they are needed. Another pain point requiring yet another protocol is the
duck-array protocol (see `[9]`_).

This NEP takes a more holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid designing a new
protocol each time this is required.

This NEP proposes the following: That ``unumpy`` `[8]`_ becomes the
recommended override mechanism for the parts of the NumPy API not yet covered
by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is
vendored into a new namespace within NumPy to give users and downstream
dependencies access to these overrides. This vendoring mechanism is similar
to what SciPy decided to do for making ``scipy.fft`` overridable (see
`[10]`_).
Detailed description
--------------------
**Note:** *This section will not attempt to go into too much detail about
``uarray``; that is the purpose of the ``uarray`` documentation.* `[1]`_
*However, the NumPy community will have input into the design of
``uarray``, via the issue tracker.*
``uarray`` Primer
^^^^^^^^^^^^^^^^^
Defining backends
~~~~~~~~~~~~~~~~~
``uarray`` consists of two main protocols: ``__ua_convert__`` and
``__ua_function__``, called in that order, along with ``__ua_domain__``,
which is a string defining the domain of the backend. If any of the protocols
returns ``NotImplemented``, we fall back to the next backend.
``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or
not to force the conversion. ``ua.Dispatchable`` is a simple class consisting
of three simple values: ``type``, ``value``, and ``coercible``.
``__ua_convert__`` returns an iterable of the converted values, or
``NotImplemented`` in the case of failure. Returning ``NotImplemented``
here will cause ``uarray`` to move to the next available backend.
``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It recieves the function and its
arguments. Returning ``NotImplemented`` will cause a move to the default
implementation of the function if one exists, and failing that, the next
backend.
If all backends are exhausted, a ``ua.BackendNotImplementedError`` is
raised.
Backends can be registered for permanent use if required.
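To make the two protocols concrete, here is a self-contained sketch of a minimal backend. Note that this is illustrative only: ``Dispatchable`` and ``ListBackend`` are invented stand-ins for this example, not ``uarray``'s real classes, and a real backend would use ``ua.Dispatchable`` and register itself with ``uarray``.

```python
# Illustrative sketch only: a minimal backend implementing the two
# protocols described above. `Dispatchable` is a simplified stand-in
# for ua.Dispatchable.

class Dispatchable:
    def __init__(self, value, type, coercible=True):
        self.value = value
        self.type = type
        self.coercible = coercible

class ListBackend:
    """Hypothetical backend whose "arrays" are plain Python lists."""
    __ua_domain__ = "numpy"

    @staticmethod
    def __ua_convert__(dispatchables, coerce):
        converted = []
        for d in dispatchables:
            if isinstance(d.value, list):
                converted.append(d.value)        # already our native type
            elif coerce and d.coercible:
                converted.append(list(d.value))  # force the conversion
            else:
                return NotImplemented            # fall back to next backend
        return converted

    @staticmethod
    def __ua_function__(func, args, kwargs):
        # A real backend would look up its implementation of `func` here;
        # declining with NotImplemented moves dispatch to the next backend.
        return NotImplemented
```

With ``coerce=True``, a tuple is coerced into a list; with ``coerce=False``, the backend declines and dispatch moves on.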
Defining overridable multimethods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To define an overridable function (a multimethod), one needs a few things:
1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied
ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
other multimethods.
As an example, consider the following::

    import numpy as np
    import uarray as ua

    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )
        return full(*args, **kwargs)

    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)
A large set of examples can be found in the ``unumpy`` repository, `[8]`_.
This simple act of overriding callables allows us to override:
* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
Using overrides
~~~~~~~~~~~~~~~
The way we propose the overrides will be used by end users is::

    import numpy.overridable as np

    with np.set_backend(backend):
        x = np.asarray(my_array, dtype=dtype)
And a library that implements a NumPy-like API will use it in the following
manner (as an example)::

    import numpy.overridable as np

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func
        return inner

    @implements(np.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.

    # Provides a default implementation for ones and zeros.
    @implements(np.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
The only change this NEP proposes at its acceptance is to make ``unumpy`` the
officially recommended way to override NumPy. ``unumpy`` will remain a separate
repository/package (which we propose to vendor to avoid a hard dependency, and
use the separate ``unumpy`` package only if it is installed, rather than depend
on it, for the time being), and will be developed primarily with the input of
duck array authors and secondarily, custom dtype authors, via the usual
GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process,
  rather than breakages happening when it is least expected.

In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains unaffected.
Duck array coercion
~~~~~~~~~~~~~~~~~~~
There are inherent problems with returning objects that are not NumPy arrays
from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++
or Cython code that may get an object with a different memory layout than the
one it expects. However, we believe this problem may apply not only to these
two functions but to all functions that return NumPy arrays. For this reason,
overrides are opt-in for the user, by using the submodule ``numpy.overridable``
rather than ``numpy``. NumPy will continue to work unaffected by anything in
``numpy.overridable``.

If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
   enabled::

    import numpy.overridable as np

    with ua.set_backend(np):
        x = np.asarray(...)
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: whenever there is something requiring
an override, ``unumpy`` will be able to offer a unified API with very minor
changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
  other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.

This also holds for the future: making something overridable would require only
minor changes to ``unumpy``.

Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others. This
allows one to override a large part of the NumPy API by defining only a small
part of it. This is to ease the creation of new duck arrays, by providing
default implementations of many functions that can be easily expressed in
terms of others, as well as a repository of utility functions that help in the
implementation of duck arrays that most duck arrays would require.

The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol for coercing not
only arrays, but also ``dtype`` objects and ``ufunc`` objects with similar
ones from other libraries. This is due to the existence of actual, third-party
dtype packages, and their desire to blend into the NumPy ecosystem (see
`[6]`_). This is a separate issue compared to the C-level dtype redesign
proposed in `[7]`_: it's about allowing third-party dtype implementations to
work with NumPy, much like third-party array implementations. These can
provide features such as units or jagged arrays that are outside the scope of
NumPy.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would want to import only one of ``unumpy`` or ``numpy``, and
would import it as ``np`` for familiarity. However, there may be situations
where one wishes to mix NumPy and the overrides, and there are a few ways to
do this, depending on the user's style::

    import numpy.overridable as unumpy
    import numpy as np

or::

    import numpy as np
    # Use unumpy via np.overridable
Related Work
------------

Previous override mechanisms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* NEP 18, the ``__array_function__`` protocol. `[2]`_
* NEP 13, the ``__array_ufunc__`` protocol. `[3]`_

Existing NumPy-like array implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------
The implementation of this NEP will require the following steps:
* Implementation of ``uarray`` multimethods corresponding to the
NumPy API, including classes for overriding ``dtype``, ``ufunc``
and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is NEP 30 plus adding more protocols
(not yet specified) in addition to it. Even then, some parts of the NumPy
API will remain non-overridable, so it's a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also
achieve the proposed goals; however, we prefer to keep it a separate package
for now, for reasons already stated above.
Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP 18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP 22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4

References and Footnotes
------------------------
.. _[1]:

[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io

.. _[2]:

[2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html

.. _[3]:

[3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

.. _[4]:

[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html

.. _[5]:

[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html

.. _[6]:

[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html

.. _[7]:

[7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899

.. _[8]:

[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io

.. _[9]:

[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html

.. _[10]:

[10] http://scipy.github.io/devdocs/fft.html#backend-control

Copyright
---------
This document has been placed in the public domain.
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpydiscussion


Hello everyone,

Thanks to all the feedback from the community, in particular
Sebastian Berg, we have a new draft of NEP 31.
Please find the full text quoted below for discussion and
reference. Any feedback and discussion is welcome.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
:Author: Hameer Abbasi [hidden email]
:Author: Ralf Gommers [hidden email]
:Author: Peter Bell [hidden email]
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22
Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism.
Acceptance of this NEP means NumPy would provide global and context-local
overrides, as well as a dispatch mechanism similar to NEP 18 [2]_. First
experiences with ``__array_function__`` show that it is necessary to be able
to override NumPy functions that *do not take an array-like argument*, and
hence aren't overridable via ``__array_function__``. The most pressing need is
array creation and coercion functions, such as ``numpy.zeros`` or
``numpy.asarray``; see e.g. NEP 30 [9]_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP 22 [3]_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.
Motivation and Scope
--------------------
The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including (most
prominently) the ``__array_ufunc__`` protocol in NEP 13 [4]_, and the
``__array_function__`` protocol in NEP 18 [2]_, but this has shown the need
for further protocols to be developed, including a protocol for coercion (see
[5]_, [9]_). The reasons these overrides are needed have been extensively
discussed in the references, and this NEP will not attempt to go into the
details of why these are needed; in short: it is necessary for library
authors to be able to coerce arbitrary objects into arrays of their own types,
such as CuPy needing to coerce to a CuPy array, for example, instead of
a NumPy array.
These kinds of overrides are useful for both end users as well as library
authors. End users may have written or wish to write code that they then later
speed up or move to a different implementation, say PyData/Sparse. They can do
this simply by setting a backend. Library authors may also wish to write code
that is portable across array implementations, for example ``sklearn`` may wish
to write code for a machine learning algorithm that is portable across array
implementations while also using array creation functions.
This NEP takes a holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid designing a new
protocol each time this is required. This was the goal of ``uarray``: to
allow for overrides in an API without needing the design of a new protocol.
This NEP proposes the following: That ``unumpy`` [8]_ becomes the
recommended override mechanism for the parts of the NumPy API not yet covered
by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is
vendored into a new namespace within NumPy to give users and downstream
dependencies access to these overrides. This vendoring mechanism is similar
to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).
Detailed description
--------------------
Using overrides
~~~~~~~~~~~~~~~
The way we propose the overrides will be used by end users is::

    # On the library side
    import numpy.overridable as unp

    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da)

    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.
Here, ``backend`` can be any compatible object defined either by NumPy or an
external library, such as Dask or CuPy. Ideally, it should be the module
``dask.array`` or ``cupy`` itself.
Composing backends
~~~~~~~~~~~~~~~~~~
There are some backends which may depend on other backends, for example xarray
depending on `numpy.fft`, and transforming a time axis into a frequency axis,
or Dask/xarray holding an array other than a NumPy array inside it. This would
be handled in the following manner inside code::

    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here
Proposals
~~~~~~~~~
The only change this NEP proposes at its acceptance is to make ``unumpy`` the
officially recommended way to override NumPy. ``unumpy`` will remain a separate
repository/package (which we propose to vendor to avoid a hard dependency, and
use the separate ``unumpy`` package only if it is installed, rather than depend
on it, for the time being). In concrete terms, ``numpy.overridable`` becomes an
alias for ``unumpy``, if available, with a fallback to the vendored version if
not. ``uarray`` and ``unumpy`` will be developed primarily with the input
of duck array authors and secondarily, custom dtype authors, via the usual
GitHub workflow. There are a few reasons for this:
* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process,
  rather than breakages happening when it is least expected.

In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains unaffected.
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: whenever there is something requiring
an override, ``unumpy`` will be able to offer a unified API with very minor
changes. For example:
* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with a
backend set.
* The same holds for array creation functions such as ``np.zeros``,
``np.empty`` and so on.
This also holds for the future: Making something overridable would require only
minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others. This
allows one to override a large part of the NumPy API by defining only a small
part of it. This is to ease the creation of new duck arrays, by providing
default implementations of many functions that can be easily expressed in
terms of others, as well as a repository of utility functions that help in the
implementation of duck arrays that most duck arrays would require.
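To make the idea of default implementations concrete, here is a self-contained sketch (the registry and names here are invented for illustration; the real mechanism is ``uarray``'s multimethod machinery): a backend that provides only ``full`` gets ``ones`` and ``zeros`` for free.

```python
# Illustrative sketch, not uarray's actual API: default implementations
# of multimethods expressed in terms of other multimethods.

_impls = {}  # backend-provided implementations, keyed by function name

def call(name, *args, **kwargs):
    """Dispatch: prefer the backend's implementation, else the default."""
    if name in _impls:
        return _impls[name](*args, **kwargs)
    if name in _defaults:
        return _defaults[name](*args, **kwargs)
    raise NotImplementedError(name)

# `ones` and `zeros` need no backend support of their own; they are
# defined purely in terms of the `full` multimethod.
_defaults = {
    "ones": lambda shape, **kw: call("full", shape, 1, **kw),
    "zeros": lambda shape, **kw: call("full", shape, 0, **kw),
}

# A toy backend that only knows how to build flat lists via `full`:
_impls["full"] = lambda shape, fill_value, **kw: [fill_value] * shape

print(call("ones", 3))   # -> [1, 1, 1]
print(call("zeros", 2))  # -> [0, 0]
```

The backend author implements one function; the repository of defaults fills in the rest, which is exactly the easing of duck-array creation described above.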
It also allows one to override functions in a manner which
``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the
version from the ``opt_einsum`` package, or Intel MKL overriding FFT, BLAS
or ``ufunc`` objects. They would define a backend with the appropriate
multimethods, and the user would select them via a ``with`` statement, or by
registering them as a backend.
The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol
for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects
with similar ones from other libraries. This is due to the existence of actual,
third-party dtype packages, and their desire to blend into the NumPy ecosystem
(see [6]_). This is a separate issue compared to the C-level dtype redesign
proposed in [7]_: it's about allowing third-party dtype implementations to
work with NumPy, much like third-party array implementations. These can provide
features such as units or jagged arrays that are outside the scope of NumPy.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would want to import only one of ``unumpy`` or ``numpy``, and
would import it as ``np`` for familiarity. However, there may be situations
where one wishes to mix NumPy and the overrides, and there are a few ways to
do this, depending on the user's style::
    from numpy import overridable as unp
    import numpy as np
or::
    import numpy as np
    # Use unumpy via np.overridable
Duck array coercion
~~~~~~~~~~~~~~~~~~~
There are inherent problems with returning objects that are not NumPy arrays
from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++
or Cython code that may get an object with a different memory layout than the
one it expects. However, we believe this problem may apply not only to these
two functions but to all functions that return NumPy arrays. For this reason,
overrides are opt-in for the user, by using the submodule ``numpy.overridable``
rather than ``numpy``. NumPy will continue to work unaffected by anything in
``numpy.overridable``.

If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
   enabled.
Related Work
------------
Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~
* NEP 18, the ``__array_function__`` protocol. [2]_
* NEP 13, the ``__array_ufunc__`` protocol. [3]_
* NEP 30, the ``__duck_array__`` protocol. [9]_
Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/
Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/
Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/
Implementation
--------------
The implementation of this NEP will require the following steps:
* Implementation of ``uarray`` multimethods corresponding to the
NumPy API, including classes for overriding ``dtype``, ``ufunc``
and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
``uarray`` Primer
~~~~~~~~~~~~~~~~~
**Note:** *This section will not attempt to go into too much detail about
uarray; that is the purpose of the uarray documentation.* [1]_
*However, the NumPy community will have input into the design of
uarray via the issue tracker.*
``unumpy`` is the interface that defines a set of overridable functions
(multimethods) compatible with the numpy API. To do this, it uses the
``uarray`` library. ``uarray`` is a general purpose tool for creating
multimethods that dispatch to one of multiple different possible backend
implementations. In this sense, it is similar to the ``__array_function__``
protocol, but with the key difference that the backend is explicitly installed
by the end user and not coupled into the array type.
Decoupling the backend from the array type gives much more flexibility to
end users and backend authors. For example, it is possible to:
* override functions not taking arrays as arguments
* create backends out of source from the array type
* install multiple backends for the same array type
This decoupling also means that ``uarray`` is not constrained to dispatching
over array-like types. The backend is free to inspect the entire set of
function arguments to determine if it can implement the function, e.g.
``dtype`` parameter dispatching.
Defining backends
^^^^^^^^^^^^^^^^^
``uarray`` consists of two main protocols: ``__ua_convert__`` and
``__ua_function__``, called in that order, along with ``__ua_domain__``.
``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or
not to force the conversion. ``ua.Dispatchable`` is a simple class consisting
of three values: ``type``, ``value``, and ``coercible``.
``__ua_convert__`` returns an iterable of the converted values, or
``NotImplemented`` in the case of failure.
``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It receives the function and its
arguments. Returning ``NotImplemented`` will cause a move to the default
implementation of the function if one exists, and failing that, the next
backend.
Here is what will happen assuming a ``uarray`` multimethod is called:

1. We canonicalise the arguments so any arguments without a default
   are placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.

   a. If it is empty, we try the default implementation.

3. We check if the backend's ``__ua_convert__`` method exists. If it exists:

   a. We pass it the output of the dispatcher,
      which is an iterable of ``ua.Dispatchable`` objects.
   b. We feed this output, along with the arguments,
      to the argument replacer. ``NotImplemented`` means we move to 3
      with the next backend.
   c. We store the replaced arguments as the new arguments.

4. We feed the arguments into ``__ua_function__``, and return the output, and
   exit if it isn't ``NotImplemented``.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more
   backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.
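The steps above can be sketched roughly as follows. This is a simplification with invented names (``call_multimethod``, the toy backends): argument canonicalisation (step 1) and ``__ua_convert__`` handling (step 3) are elided, and the real loop lives inside ``uarray`` itself.

```python
# Rough sketch of steps 2-7 above; illustrative only.

class BackendNotImplementedError(Exception):
    pass

def call_multimethod(backends, func, args, kwargs, default=None):
    if not backends and default is not None:      # step 2a: no backends
        return default(*args, **kwargs)
    for backend in backends:                      # walk the backend list
        result = backend.__ua_function__(func, args, kwargs)
        if result is not NotImplemented:          # step 4: success, exit
            return result
        if default is not None:                   # step 5: try the default
            result = default(*args, **kwargs)
            if result is not NotImplemented:
                return result
        # step 6: move on to the next backend
    raise BackendNotImplementedError(func)        # step 7: all exhausted

# Two toy backends: one that always declines, one that implements.
class Declines:
    @staticmethod
    def __ua_function__(func, args, kwargs):
        return NotImplemented

class Doubles:
    @staticmethod
    def __ua_function__(func, args, kwargs):
        return [x * 2 for x in args[0]]

print(call_multimethod([Declines, Doubles], "double", ([1, 2],), {}))  # -> [2, 4]
```

The first backend declines with ``NotImplemented``, so dispatch falls through to the second, exactly as in steps 4 and 6.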
Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To define an overridable function (a multimethod), one needs a few things:
1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied
ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
other multimethods.
As an example, consider the following::

    import numpy as np
    import uarray as ua

    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )
        return full(*args, **kwargs)

    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)
A large set of examples can be found in the ``unumpy`` repository, [8]_.
This simple act of overriding callables allows us to override:
* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
Examples for NumPy
^^^^^^^^^^^^^^^^^^
A library that implements a NumPy-like API will use it in the following
manner (as an example)::

    import numpy.overridable as unp

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func
        return inner

    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.

    # Provides a default implementation for ones and zeros.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is a combination of NEP 18 [2]_,
NEP 13 [4]_ and NEP 30 [9]_ plus adding more protocols (not yet specified)
in addition to it. Even then, some parts of the NumPy API will remain
non-overridable, so it's a partial alternative.
The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also achieve
the proposed goals, however we prefer to keep it a separate package for now,
for reasons already stated above.
The third alternative is to move ``unumpy`` into the NumPy organisation and
develop it as a NumPy project. This will also achieve the said goals, and is
also a possibility that can be considered by this NEP. However, the act of
doing an extra ``pip install`` or ``conda install`` may discourage some users
from adopting this method.
Discussion
----------
* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP 18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP 22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389
References and Footnotes
------------------------
.. [1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. [2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. [3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. [4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html
.. [6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html
.. [7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. [8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. [10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.


On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers < [hidden email]> wrote:
> On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith < [hidden email]> wrote:
>> The reason this is challenging is that there's a lot of code written
>> in Cython/C/C++ that calls np.asarray,
>
> Cython code only perhaps? It would surprise me if there's a lot of C/C++ code that explicitly calls into our Python rather than C API.
I think there's also code written as Python-wrappers-around-C-code
where the Python layer handles the error-checking/coercion, and the C
code trusts it to have done so.
>> Now if I understand right, your proposal would be to make it so any
>> code in any package could arbitrarily change the behavior of
>> np.asarray for all inputs, e.g. I could just decide that
>> np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
>> object.
>
> No, definitely not! It's all optin, by explicitly importing from `numpy.overridable` or `unumpy`. No behavior of anything in the existing numpy namespaces should be affected in any way.
Ah, whoops, I definitely missed that :). That does change things!
So one of the major decision points for any duckarray API work, is
whether to modify the numpy semantics "in place", so user code
automatically gets access to the new semantics, or else to make a new
namespace, that users have to switch over to manually.
The major disadvantage of doing changes "in place" is, of course, that
we have to do all this careful work to move incrementally and make
sure that we don't break things. The major (potential) advantage is
that we have a much better chance of moving the ecosystem with us.
The major advantage of making a new namespace is that it's *much*
easier to experiment, because there's no chance of breaking any
projects that didn't opt in. The major disadvantage is that numpy is
super strongly entrenched, and convincing every project to switch to
something else is incredibly difficult and costly. (I just searched
github for "import numpy" and got 17.7 million hits. That's a lot of
imports to update!) Also, empirically, we've seen multiple projects
try to do this (e.g. DyND), and so far they all failed.
It sounds like unumpy is an interesting approach that hasn't been
tried before – in particular, the promise that you can "just switch
your imports" is a much easier transition than e.g. DyND offered. Of
course, that promise is somewhat undermined by the reality that all
these potential backend libraries *aren't* 100% compatible with numpy,
and can't be... it might turn out that this ends up like asanyarray,
where you can't really use it reliably because the thing that comes
out will generally support *most* of the normal ndarray semantics, but
you don't know which part. Is scipy planning to switch to using this
everywhere, including in C code? If not, then how do you expect
projects like matplotlib to switch, given that matplotlib likes to
pass array objects into scipy functions? Are you planning to take the
opportunity to clean up some of the obscure corners of the numpy API?
But those are general questions about unumpy, and I'm guessing no one
knows all the answers yet... and these questions actually aren't super
relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
thing the NEP proposes is simply to make "numpy.overridable" an
alias for "unumpy".
It's not clear to me what problem this alias is solving. If all
downstream users have to update their imports anyway, then they can
write "import unumpy as np" just as easily as they can write "import
numpy.overridable as np". I guess the main reason this is a NEP is
because the unumpy project is hoping to get an "official stamp of
approval" from numpy? But even that could be accomplished by just
putting something in the docs. And adding the alias has substantial
risks: it makes unumpy tied to the numpy release cycle and
compatibility rules, and it means that we're committing to maintaining
unumpy ~forever even if Hameer or Quansight move on to other things.
That seems like a lot to take on for such vague benefits?
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi < [hidden email]> wrote:
> The fact that we're having to design more and more protocols for a lot
> of very similar things is, to me, an indicator that we do have holistic
> problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that
they're each different :). If it was just a matter of copying
__array_ufunc__ we'd have been done in a few minutes...
-n

Nathaniel J. Smith  https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion


That's a lot of very good questions!
Let me see if I can answer them one-by-one.

On 06.09.19 09:49, Nathaniel Smith wrote:
Ah, whoops, I definitely missed that :). That does change things!
So one of the major decision points for any duckarray API work, is
whether to modify the numpy semantics "in place", so user code
automatically gets access to the new semantics, or else to make a new
namespace, that users have to switch over to manually.
The major disadvantage of doing changes "in place" is, of course, that
we have to do all this careful work to move incrementally and make
sure that we don't break things. The major (potential) advantage is
that we have a much better chance of moving the ecosystem with us.
The major advantage of making a new namespace is that it's *much*
easier to experiment, because there's no chance of breaking any
projects that didn't opt in. The major disadvantage is that numpy is
super strongly entrenched, and convincing every project to switch to
something else is incredibly difficult and costly. (I just searched
github for "import numpy" and got 17.7 million hits. That's a lot of
imports to update!) Also, empirically, we've seen multiple projects
try to do this (e.g. DyND), and so far they all failed.
It sounds like unumpy is an interesting approach that hasn't been
tried before – in particular, the promise that you can "just switch
your imports" is a much easier transition than e.g. DyND offered. Of
course, that promise is somewhat undermined by the reality that all
these potential backend libraries *aren't* 100% compatible with numpy,
and can't be...
This is true, however, with minor adjustments it should be
possible to make your code work across backends, if you don't use
a few obscure parts of NumPy.
it might turn out that this ends up like asanyarray,
where you can't really use it reliably because the thing that comes
out will generally support *most* of the normal ndarray semantics, but
you don't know which part. Is scipy planning to switch to using this
everywhere, including in C code?
Not at present, I think; however, it should be possible to
"rewrite" parts of scipy on top of unumpy in order to make that
work, and where speed is required and an efficient implementation
isn't available in terms of NumPy functions, to make dispatchable
multimethods and allow library authors to provide said
implementations. We'll call this project uscipy, but that's an
end-game at this point. Right now, we're focusing on unumpy.
If not, then how do you expect
projects like matplotlib to switch, given that matplotlib likes to
pass array objects into scipy functions? Are you planning to take the
opportunity to clean up some of the obscure corners of the numpy API?
That's a completely different thing, and to answer that
question requires a distinction between uarray and unumpy...
uarray is a backend mechanism, independent of array computing.
We hope that matplotlib will adopt it to switch around its GUI
backends, for example.
But those are general questions about unumpy, and I'm guessing no one
knows all the answers yet... and these questions actually aren't super
relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
thing the NEP proposes is simply to make "numpy.overridable" an
alias for "unumpy".
It's not clear to me what problem this alias is solving. If all
downstream users have to update their imports anyway, then they can
write "import unumpy as np" just as easily as they can write "import
numpy.overridable as np". I guess the main reason this is a NEP is
because the unumpy project is hoping to get an "official stamp of
approval" from numpy?
That's part of it. The concrete problems it's solving are
threefold:

- Array creation functions can be overridden.
- Array coercion is now covered.
- "Default implementations" will allow you to rewrite your
  NumPy array more easily, when such efficient implementations
  exist in terms of other NumPy functions. That will also help
  achieve similar semantics, but as I said, they're just
  "default"...
The import numpy.overridable part is meant to help
garner adoption, and to prefer the unumpy module if it
is available (which will continue to be developed separately).
That way it isn't so tightly coupled to the release cycle. One
alternative Sebastian Berg mentioned (and I am on board with) is
just moving unumpy into the NumPy organisation. What
we fear about keeping it separate is that the simple act of
having to pip install unumpy will keep people from using it or
trying it out.
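The "prefer unumpy if it is available" behaviour described above could be sketched as a small import helper. This is a toy illustration only: the helper and the vendored module name are hypothetical, not NumPy's actual code.

```python
import importlib

def prefer_first_available(*names):
    """Return the first module from `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (names,))

# numpy.overridable would try the separately developed package first,
# then fall back to a vendored copy (the second name is made up):
# overridable = prefer_first_available("unumpy", "numpy._unumpy_vendored")
```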
But even that could be accomplished by just
putting something in the docs. And adding the alias has substantial
risks: it makes unumpy tied to the numpy release cycle and
compatibility rules, and it means that we're committing to maintaining
unumpy ~forever even if Hameer or Quansight move onto other things.
That seems like a lot to take on for such vague benefits?
I can assure you Travis has had the goal of "re-platforming
SciPy" from as far back as when I met him; he's spawned quite a few
efforts in that direction along with others from Quansight (and
they've led to nice projects). Quansight, as I see it, is
unlikely to abandon something like this if it becomes successful
(and acceptance of this NEP will be a huge success story).
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi [hidden email] wrote:
The fact that we're having to design more and more protocols for a lot
of very similar things is, to me, an indicator that we do have holistic
problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that
they're each different :). If it was just a matter of copying
__array_ufunc__ we'd have been done in a few minutes...
uarray borrows heavily from __array_function__.
It allows substituting (for example) __array_ufunc__ by
overriding ufunc.__call__, ufunc.reduce and so
on. It takes, as I mentioned, a holistic approach: there are
callables that need to be overridden, possibly with nothing to
dispatch on. And then it builds on top of that, adding
coercion/conversion.
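The holistic approach described here can be illustrated with a toy version of the mechanism (purely a sketch; real uarray's protocol is richer and all names below are made up). The point is that even a creation function like `ones`, which has no array argument to dispatch on, can be overridden by whichever backend is active, with the NumPy-like behaviour as the default implementation:

```python
import contextlib

_backends = []  # stack of currently active backends

@contextlib.contextmanager
def set_backend(backend):
    """Context-local override: push a backend for the duration of the block."""
    _backends.append(backend)
    try:
        yield
    finally:
        _backends.pop()

def multimethod(default):
    """Wrap a function so active backends may override it by name."""
    def dispatcher(*args, **kwargs):
        for backend in reversed(_backends):
            impl = getattr(backend, default.__name__, None)
            if impl is not None:
                return impl(*args, **kwargs)
        return default(*args, **kwargs)  # fall back to the default impl
    return dispatcher

@multimethod
def ones(n):
    return [1.0] * n  # "NumPy-like" default; note: no array argument

class ListsOfInts:
    @staticmethod
    def ones(n):
        return [1] * n

print(ones(3))          # default implementation: [1.0, 1.0, 1.0]
with set_backend(ListsOfInts):
    print(ones(3))      # overridden by the active backend: [1, 1, 1]
```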
-n

Nathaniel J. Smith  https://vorpus.org


That's a lot of very good questions!
Let me see if I can answer them one-by-one.

On 06.09.19 09:49, Nathaniel Smith wrote:
But even that could be accomplished by just
putting something in the docs. And adding the alias has substantial
risks: it makes unumpy tied to the numpy release cycle and
compatibility rules, and it means that we're committing to maintaining
unumpy ~forever even if Hameer or Quansight move onto other things.
That seems like a lot to take on for such vague benefits?
I can assure you Travis has had the goal of "replatforming
SciPy" from as far back as I met him, he's spawned quite a few
efforts in that direction along with others from Quansight (and
they've led to nice projects). Quansight, as I see it, is
unlikely to abandon something like this if it becomes successful
(and acceptance of this NEP will be a huge success story).
Let me address this separately, since it's not really a technical concern.
First, this is not what we say for other contributions. E.g. we didn't say no to Pocketfft because Martin Reinecke may move on, or __array_function__ because Stephan may get other interests at some point, or a whole new numpy.random, etc.
Second, this is not about Quansight. At Quansight Labs we've been able to create time for Hameer to build this, and me and others to contribute - which is very nice, but the two are not tied inextricably together. In the end it's still individuals submitting this NEP. I have been a NumPy dev for ~10 years before joining Quansight, and my future NumPy contributions are not dependent on staying at Quansight (not that I plan to go anywhere!). I'm guessing the same is true for others.
Third, unumpy is a fairly thin layer over uarray, which already has another user in SciPy.
Cheers,
Ralf


On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith <[hidden email]> wrote:
On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers <[hidden email]> wrote:
> On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith <[hidden email]> wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <[hidden email]> wrote:
> The fact that we're having to design more and more protocols for a lot
> of very similar things is, to me, an indicator that we do have holistic
> problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that
they're each different :). If it was just a matter of copying
__array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Cheers,
Ralf


That's a lot of very good questions!
Let me see if I can answer them one-by-one.

On 06.09.19 09:49, Nathaniel Smith wrote:
But those are general questions about unumpy, and I'm guessing no one
knows all the answers yet... and these questions actually aren't super
relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
thing the NEP proposes is simply to make "numpy.overridable" an
alias for "unumpy".
It's not clear to me what problem this alias is solving. If all
downstream users have to update their imports anyway, then they can
write "import unumpy as np" just as easily as they can write "import
numpy.overridable as np". I guess the main reason this is a NEP is
because the unumpy project is hoping to get an "official stamp of
approval" from numpy? Also because we have NEP 30 for yet another protocol, and there's likely another NEP to follow after that for array creation. Those use cases are covered by unumpy, so it makes sense to have a NEP for that as well, so they can be considered sidebyside.
That's part of it. The concrete problems it's solving are
threefold:

- Array creation functions can be overridden.
- Array coercion is now covered.
- "Default implementations" will allow you to rewrite your
  NumPy array more easily, when such efficient implementations
  exist in terms of other NumPy functions. That will also help
  achieve similar semantics, but as I said, they're just
  "default"...
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkey-patch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
The point is: sometimes the array protocols are preferred (e.g. Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works better. It's also not necessarily an either-or; they can be complementary.
Actually, after writing this I just realized something. With 1.17.x we have:
```
In [1]: import numpy as np, dask.array as da

In [2]: d = da.from_array(np.linspace(0, 1))

In [3]: np.fft.fft(d)
Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
```
In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't work. We have no bug report yet because 1.17.x hasn't landed in conda defaults yet (perhaps this is a/the reason why?), but it will be a problem.
The import numpy.overridable part is meant to help
garner adoption, and to prefer the unumpy module if it
is available (which will continue to be developed separately).
That way it isn't so tightly coupled to the release cycle. One
alternative Sebastian Berg mentioned (and I am on board with) is
just moving unumpy into the NumPy organisation. What
we fear about keeping it separate is that the simple act of
having to pip install unumpy will keep people from using it or
trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Cheers,
Ralf


On Fri, Sep 6, 2019 at 2:45 PM Ralf Gommers < [hidden email]> wrote:
> There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
unumpy doesn't help with this either though, does it? unumpy is
double-opt-in: the code using np.fft has to switch to using unumpy.fft
instead, and then someone has to enable the backend. But MKL/pyfftw
started out as opt-in – you could `import mkl_fft` or `import pyfftw`
– and the whole reason they switched to monkey-patching is that they
decided that opt-in wasn't good enough for them.
>> The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear keeping it separate is that the simple act of a pip install unumpy will keep people from using it or trying it out.
>
> Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring optin vs. making it default vs. adding a dependency is of secondary interest right now.
Wait, but I thought the only reason we would have a dependency is if
we're exporting it as part of the numpy namespace. If we keep the
import as `import unumpy`, then it works just as well, without any
dependency *or* vendoring in numpy, right?
n

Nathaniel J. Smith  https://vorpus.org


On Fri, Sep 6, 2019 at 11:44 AM Ralf Gommers < [hidden email]> wrote:
>
>
>
> On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi < [hidden email]> wrote:
>>
>> That's a lot of very good questions! Let me see if I can answer them one-by-one.
>>
>> On 06.09.19 09:49, Nathaniel Smith wrote:
>>
>> But even that could be accomplished by just
>> putting something in the docs. And adding the alias has substantial
>> risks: it makes unumpy tied to the numpy release cycle and
>> compatibility rules, and it means that we're committing to maintaining
>> unumpy ~forever even if Hameer or Quansight move onto other things.
>> That seems like a lot to take on for such vague benefits?
>>
>> I can assure you Travis has had the goal of "replatforming SciPy" from as far back as I met him, he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story).
>
>
> Let me address this separately, since it's not really a technical concern.
>
> First, this is not what we say for other contributions. E.g. we didn't say no to Pocketfft because Martin Reineck may move on, or __array_function__ because Stephan may get other interests at some point, or a whole new numpy.random, etc.
>
> Second, this is not about Quansight. At Quansight Labs we've been able to create time for Hameer to build this, and me and others to contribute  which is very nice, but the two are not tied inextricably together. In the end it's still individuals submitting this NEP. I have been a NumPy dev for ~10 years before joining Quansight, and my future NumPy contributions are not dependent on staying at Quansight (not that I plan to go anywhere!). I'm guessing the same is true for others.
>
> Third, unumpy is a fairly thin layer over uarray, which already has another user in SciPy.
I'm sorry if that came across as some kind of snipe at Quansight
specifically. I didn't mean it that way. It's a much more general
concern: software projects are inherently risky, and often fail;
companies and research labs change focus and funding shifts around.
This is just a general risk that we need to take into account
when making decisions. And when there are proposals to add new
submodules to numpy, we always put them under intense scrutiny,
exactly because of the support commitments.
The new fft and random code are replacing/extending our existing
public APIs that we already committed to, so that's a very different
situation. And __array_function__ was something that couldn't work at
all without being built into numpy, and even then it was controversial
and merged on an experimental basis. It's always about tradeoffs. My
concern here is that the NEP is proposing that the numpy maintainers
take on this large commitment, *and* AFAICT there's no compensating
benefit to justify that: everything that can be done with
numpy.overridable can be done just as well with a standalone unumpy
package... right?
n

Nathaniel J. Smith  https://vorpus.org


On Fri, Sep 6, 2019 at 5:16 PM Nathaniel Smith <[hidden email]> wrote:
On Fri, Sep 6, 2019 at 11:44 AM Ralf Gommers <[hidden email]> wrote:
>
>
>
> On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi <[hidden email]> wrote:
>>
>> That's a lot of very good questions! Let me see if I can answer them one-by-one.
>>
>> On 06.09.19 09:49, Nathaniel Smith wrote:
>>
>> But even that could be accomplished by just
>> putting something in the docs. And adding the alias has substantial
>> risks: it makes unumpy tied to the numpy release cycle and
>> compatibility rules, and it means that we're committing to maintaining
>> unumpy ~forever even if Hameer or Quansight move onto other things.
>> That seems like a lot to take on for such vague benefits?
>>
>> I can assure you Travis has had the goal of "replatforming SciPy" from as far back as I met him, he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story).
>
>
> Let me address this separately, since it's not really a technical concern.
>
> First, this is not what we say for other contributions. E.g. we didn't say no to Pocketfft because Martin Reineck may move on, or __array_function__ because Stephan may get other interests at some point, or a whole new numpy.random, etc.
>
> Second, this is not about Quansight. At Quansight Labs we've been able to create time for Hameer to build this, and me and others to contribute  which is very nice, but the two are not tied inextricably together. In the end it's still individuals submitting this NEP. I have been a NumPy dev for ~10 years before joining Quansight, and my future NumPy contributions are not dependent on staying at Quansight (not that I plan to go anywhere!). I'm guessing the same is true for others.
>
> Third, unumpy is a fairly thin layer over uarray, which already has another user in SciPy.
I'm sorry if that came across as some kind of snipe at Quansight
specifically. I didn't mean it that way. It's a much more general
concern: software projects are inherently risky, and often fail;
companies and research labs change focus and funding shifts around.
This is just a general risk that we need to take into account
when making decisions. And when there are proposals to add new
submodules to numpy, we always put them under intense scrutiny,
exactly because of the support commitments.
Yes, that's fair, and we should be critical here. All code we accept is indeed a maintenance burden.
The new fft and random code are replacing/extending our existing
public APIs that we already committed to, so that's a very different
situation. And __array_function__ was something that couldn't work at
all without being built into numpy, and even then it was controversial
and merged on an experimental basis. It's always about tradeoffs. My
concern here is that the NEP is proposing that the numpy maintainers
take on this large commitment,
Again, not just the NumPy maintainers. There really isn't that much in `unumpy` that's all that complicated. And again, `uarray` has multiple maintainers (note that Peter is also a SciPy core dev) and has another user in SciPy.
*and* AFAICT there's no compensating
benefit to justify that: everything that can be done with
numpy.overridable can be done just as well with a standalone unumpy
package... right?
True, mostly. But at that point, if we say that it's the way to do array coercion, and creation (and perhaps some other things as well), we're saying at the same time that every other package that needs this (e.g. Dask, CuPy) should take unumpy as a hard dependency. Which is a much bigger ask than when it comes with NumPy. We can discuss it of course.
The major exception is if we want to make it default for some functionality, like for example numpy.fft (I'll answer your other email for that).
Cheers,
Ralf


On Fri, Sep 6, 2019 at 4:51 PM Nathaniel Smith <[hidden email]> wrote:
On Fri, Sep 6, 2019 at 2:45 PM Ralf Gommers <[hidden email]> wrote:
> There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
unumpy doesn't help with this either though, does it? unumpy is
double-opt-in: the code using np.fft has to switch to using unumpy.fft
instead, and then someone has to enable the backend.
Very good point. It would make a lot of sense to at least make unumpy default on fft/linalg/random, even if we want to keep it opt-in for the functions in the main namespace.
But MKL/pyfftw
started out as opt-in – you could `import mkl_fft` or `import pyfftw`
– and the whole reason they switched to monkey-patching is that they
decided that opt-in wasn't good enough for them.
No, that's not correct. The MKL team has asked for a proper backend system, so they can plug into numpy rather than monkey-patch it. Oleksey, Chuck and I discussed that two years ago already at the NumFOCUS Summit 2017.
This has been explicitly on the NumPy roadmap for quite a while:
And if Anaconda would like to default to it, that's possible - because one registered backend needs to be chosen as the default, that could be mkl_fft. That is still a major improvement over the situation today.
>> The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear keeping it separate is that the simple act of a pip install unumpy will keep people from using it or trying it out.
>
> Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring optin vs. making it default vs. adding a dependency is of secondary interest right now.
Wait, but I thought the only reason we would have a dependency is if
we're exporting it as part of the numpy namespace. If we keep the
import as `import unumpy`, then it works just as well, without any
dependency *or* vendoring in numpy, right?
Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack.
Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg.
Cheers,
Ralf


>> There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
> unumpy doesn't help with this either though, does it? unumpy is doubleoptin: the code using np.fft has to switch to using unumpy.fft instead, and then someone has to enable the backend. But MKL/pyfftw started out as optin – you could `import mkl_fft` or `import pyfftw` – and the whole reason they switched to monkeypatching is that they decided that optin wasn't good enough for them.
Because numpy functions are used to write many library functions, the end user isn't always able to opt in by changing imports. So, for library functions, monkey patching is not simply convenient but actually necessary. Take for example scipy.signal.fftconvolve: SciPy can't change to pyfftw for licensing reasons, so with SciPy < 1.4 your only option is to monkey patch scipy.fftpack and numpy.fft. However, in SciPy >= 1.4, thanks to the uarray-based backend support in scipy.fft, I can write
import numpy as np
import pyfftw.interfaces.scipy_fft as pyfftw_fft
from scipy import fft, signal

x = np.random.randn(1024, 1024)
with fft.set_backend(pyfftw_fft):
    y = signal.fftconvolve(x, x)  # calls pyfftw's rfft and irfft
Yes, we had to opt in within the library function (signal moved from scipy.fftpack to scipy.fft). But because there can be distance between the set_backend call and the FFT calls, the library is now much more configurable. Generally speaking, any library written to use unumpy would be configurable: (i) by the user, (ii) at runtime, (iii) without changing library code and (iv) without monkey patching.
In scipy.fft I actually did it slightly differently than unumpy: the scipy.fft interface itself has the uarray dispatch, and I set SciPy's version of pocketfft as the default global backend. This means that normal users don't need to set a backend, and thus don't need to opt in in any way. For NumPy to follow this pattern as well would require more change to NumPy's code base than the current NEP's suggestion, mainly in separating the interface from the implementation that would become the default backend.
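The pattern described here, a dispatching interface with a pre-registered default backend so that ordinary users never have to opt in, might be sketched like this (a toy stand-in with made-up names; scipy.fft actually uses uarray, with its bundled pocketfft registered as the default):

```python
import cmath

_backend = None  # the registered global backend

def set_global_backend(backend):
    global _backend
    _backend = backend

def rfft(x):
    """Public interface: always dispatches to the registered backend."""
    return _backend.rfft(x)

class NaiveDFTBackend:
    """Stand-in for the bundled default implementation."""
    @staticmethod
    def rfft(x):
        # Direct O(n^2) real-input DFT, returning n//2 + 1 terms.
        n = len(x)
        return [sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, v in enumerate(x))
                for k in range(n // 2 + 1)]

# Registered at import time, so plain `rfft(...)` works with no opt-in:
set_global_backend(NaiveDFTBackend)

print(rfft([1.0, 0.0, 0.0, 0.0]))  # DFT of a unit impulse: all ones
```

Users who want a different implementation would call `set_global_backend` (or a context-local equivalent) themselves; everyone else silently gets the default.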
- Peter


On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
>
>
<snip>
> > That's part of it. The concrete problems it's solving are
> > threefold:
> > - Array creation functions can be overridden.
> > - Array coercion is now covered.
> > - "Default implementations" will allow you to rewrite your NumPy
> > array more easily, when such efficient implementations exist in
> > terms of other NumPy functions. That will also help achieve similar
> > semantics, but as I said, they're just "default"...
> >
>
> There may be another very concrete one (that's not yet in the NEP):
> allowing other libraries that consume ndarrays to use overrides. An
> example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> NumPy, something we don't like all that much (in particular for
> mkl_fft, because it's the default in Anaconda). `__array_function__`
> isn't able to help here, because it will always choose NumPy's own
> implementation for ndarray input. With unumpy you can support
> multiple libraries that consume ndarrays.
>
> Another example is einsum: if you want to use opt_einsum for all
> inputs (including ndarrays), then you cannot use np.einsum. And yet
> another is using bottleneck (
> https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan
> functions and partition. There's likely more of these.
>
> The point is: sometimes the array protocols are preferred (e.g.
> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> better. It's also not necessarily an either/or, they can be
> complementary.
>
Let me try to move the discussion from the github issue here (this may
not be the best place). (https://github.com/numpy/numpy/issues/14441,
which asked for easier creation functions together with `__array_function__`.)
I think an important note mentioned here is how users interact with
unumpy vs. __array_function__. The former is an explicit opt-in, while
the latter is an implicit choice based on an `array-like` abstract base
class and type-based dispatching.
To quote NEP 18 on this: "The downsides are that this would require an
explicit opt-in from all existing code, e.g., import numpy.api as np,
and in the long term would result in the maintenance of two separate
NumPy APIs. Also, many functions from numpy itself are already
overloaded (but inadequately), so confusion about high vs. low level
APIs in NumPy would still persist."
(I do think this is a point we should not just ignore; `uarray` is a
thin layer, but it has a big surface area.)
Now there are things where explicit opt-in is obvious. And the FFT
example is one of those: there is no way to implicitly choose another
backend (except by just replacing it, i.e. monkeypatching) [1]. And
right now I think these are _very_ different.
Now, for end users, choosing one array-like over another seems nicer
as an implicit mechanism (why should I not mix sparse, dask and numpy
arrays!?). This is the promise `__array_function__` tries to make.
Unless convinced otherwise, my guess is that most library authors would
strive for implicit support (i.e. sklearn, skimage, scipy).
Circling back to creation and coercion: in a purely object-oriented
type system, these would be classmethods, I guess, but in NumPy and the
libraries above, we are lost.
Solution 1: Create explicit opt-in, e.g. through uarray. (NEP 31)
* Requires end-user opt-in.
* Seems cleaner in many ways.
* Requires a full copy of the API.
Solution 2: Add some coercion "protocol" (NEP 30) and expose a way to
create new arrays more conveniently. This would practically mean adding
an `array_type=np.ndarray` argument.
* _Not_ used by end users! End users should use dask.linspace!
* Adds a "strange" API somewhere in numpy, and possibly a new
"protocol" (in addition to coercion). [2]
I still feel these solve different issues. The second one is intended
to make array-likes work implicitly in libraries (without end users
having to do anything), while the first seems to force the end user to
opt in, sometimes unnecessarily:

def my_library_func(array_like):
    exp = np.exp(array_like)
    idx = np.arange(len(exp))
    return idx, exp
This would have all the information for implicit opt-in/array-like
support, but cannot do it right now. This is what I have been wondering:
whether uarray/unumpy can in some way help me make this work (even
_without_ the end user opting in). The reason is simply that, right now,
I am very clear on the need for this use case, but not sure about the
need for end user opt-in, since end users can just use dask.arange().
Cheers,
Sebastian
[1] To be honest, I do think a lot of the "issues" around
monkeypatching exist just as much with backend choosing; the main
differences seem to me to be that:
1. monkeypatching was not done explicitly
(import mkl_fft; mkl_fft.monkeypatch_numpy())?
2. A backend system allows libraries to prefer one locally?
(which I think is a big advantage)
[2] There are the options of adding `linspace_like` functions somewhere
in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
or simply inventing a new "protocol" (which is not really a protocol?),
and making it `ndarray.__numpy_like_creation_functions__.arange()`.
> Actually, after writing this I just realized something. With 1.17.x
> we have:
>
> ```
> In [1]: import dask.array as da
>
>
> In [2]: d = da.from_array(np.linspace(0, 1))
>
>
> In [3]: np.fft.fft(d)
>
> Out[3]: dask.array<fft, shape=(50,), dtype=complex128,
> chunksize=(50,)>
> ```
>
> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> work. We have no bug report yet because 1.17.x hasn't landed in conda
> defaults yet (perhaps this is a/the reason why?), but it will be a
> problem.
>
> > The import numpy.overridable part is meant to help garner adoption,
> > and to prefer the unumpy module if it is available (which will
> > continue to be developed separately). That way it isn't so tightly
> > coupled to the release cycle. One alternative Sebastian Berg
> > mentioned (and I am on board with) is just moving unumpy into the
> > NumPy organisation. What we fear keeping it separate is that the
> > simple act of a pip install unumpy will keep people from using it
> > or trying it out.
> >
> Note that this is not the most critical aspect. I pushed for
> vendoring as numpy.overridable because I want to not derail the
> comparison with NEP 30 et al. with a "should we add a dependency"
> discussion. The interesting part to decide on first is: do we need
> the unumpy override mechanism? Vendoring opt-in vs. making it default
> vs. adding a dependency is of secondary interest right now.
>
> Cheers,
> Ralf
>
>
>


On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
<snip>
<snip>
Solution 1: Create explicit opt-in, e.g. through uarray. (NEP 31)
* Requires end-user opt-in.
* Seems cleaner in many ways.
* Requires a full copy of the API.
Bullets 1 and 3 are not required. If we decide to make it the default, then there's no separate namespace.
<snip>
I still feel these solve different issues. The second one is intended
to make array-likes work implicitly in libraries (without end users
having to do anything), while the first seems to force the end user to
opt in, sometimes unnecessarily:
def my_library_func(array_like):
    exp = np.exp(array_like)
    idx = np.arange(len(exp))
    return idx, exp

Would have all the information for implicit opt-in/array-like support,
but cannot do it right now.
Can you explain this a bit more? `len(exp)` is a number, so `np.arange(number)` doesn't really have any information here.
This is what I have been wondering: whether uarray/unumpy can in some
way help me make this work (even _without_ the end user opting in).
Good question. If that needs to work in the absence of the user doing
anything, it should be something like

with unumpy.determine_backend(exp):
    unumpy.arange(len(exp))  # or np.arange if we make unumpy the default

to get the equivalent of `np.arange_like(len(exp), array_type=exp)`.
Note that `determine_backend` doesn't exist today.
The reason is simply that, right now, I am very clear on the need for
this use case, but not sure about the need for end user opt-in, since
end users can just use dask.arange().
I don't get the last part. The arange is inside a library function, so a user can't just go in and change things there.
Cheers,
Ralf


On 2019-09-07 15:33, Ralf Gommers wrote:
> On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg
> < [hidden email]> wrote:
>
>> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
>>>
>>>
>> <snip>
>>
>> <snip>
>> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP 31)
>> * Requires end-user opt-in.
>> * Seems cleaner in many ways.
>> * Requires a full copy of the API.
>
> Bullets 1 and 3 are not required. If we decide to make it the default,
> then there's no separate namespace.
It does require explicit opt-in to have any benefits to the user.
>
>> <snip>
>>
>> def my_library_func(array_like):
>>     exp = np.exp(array_like)
>>     idx = np.arange(len(exp))
>>     return idx, exp
>>
>> Would have all the information for implicit opt-in/array-like
>> support, but cannot do it right now.
>
> Can you explain this a bit more? `len(exp)` is a number, so
> `np.arange(number)` doesn't really have any information here.
>
Right, but as a library author, I want a way to make it use the
same type as `array_like` in this particular function; that is the
point! The end user already signaled they prefer, say, dask, due to the
array that was actually passed in. (But this is just repeating what is
below, I think.)
>> This is what I have been wondering, if
>> uarray/unumpy, can in some way help me make this work (even
>> _without_
>> the end user opting in).
>
> Good question. If that needs to work in the absence of the user doing
> anything, it should be something like
>
> with unumpy.determine_backend(exp):
>     unumpy.arange(len(exp))  # or np.arange if we make unumpy default
>
> to get the equivalent of `np.arange_like(len(exp), array_type=exp)`.
>
> Note that `determine_backend` doesn't exist today.
>
Exactly, that is what I have been wondering about; there may be more
issues around that. If it existed, we may be able to solve the implicit
library usage by making libraries use unumpy (or similar). Although, at
that point, we half replace `__array_function__`, maybe.
However, the main point is that without such functionality, NEP 30 and
NEP 31 seem to solve slightly different issues with respect to how they
interact with the end user (opt-in)?
We may decide that we do not want to solve the library authors' issue
of wanting to support implicit opt-in for array-like inputs because it
is a rabbit hole. But we may need to discuss/argue a bit more that it
really is a deep enough rabbit hole that it is not worth the trouble.
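For concreteness, here is a toy version of what such a `determine_backend` could look like: inspect the type of an existing array and activate the matching backend for the duration of a block. This is purely illustrative of the idea Ralf sketched; no such function exists in unumpy today, and the backend classes here are invented stand-ins:

```python
import contextlib

class NumpyBackend:
    handles = (list,)          # toy stand-in for np.ndarray
    @staticmethod
    def arange(n):
        return list(range(n))

class DaskLikeBackend:
    handles = (tuple,)         # toy stand-in for dask.array.Array
    @staticmethod
    def arange(n):
        return tuple(range(n))

_backends = [NumpyBackend, DaskLikeBackend]
_active = [NumpyBackend]       # stack of active backends; top wins

@contextlib.contextmanager
def determine_backend(array):
    # Pick the backend whose array type matches the given array,
    # and activate it only within the `with` block.
    backend = next(b for b in _backends if isinstance(array, b.handles))
    _active.append(backend)
    try:
        yield
    finally:
        _active.pop()

def arange(n):
    """Dispatching creation function: uses whichever backend is active."""
    return _active[-1].arange(n)

exp = (0.1, 0.2, 0.3)          # pretend this came back as a dask array
with determine_backend(exp):
    idx = arange(len(exp))     # creates a "dask" range, no user opt-in
print(idx)
```

The library never names a specific array type; the array the end user passed in is the only signal, which is exactly the implicit support discussed above.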
>> The reason is that simply, right now I am very
>> clear on the need for this use case, but not sure about the need for
>> end user opt in, since end users can just use dask.arange().
>
> I don't get the last part. The arange is inside a library function, so
> a user can't just go in and change things there.
A "user" here means "end user". An end user writes a script, and they
can easily change `arr = np.linspace(10)` to `arr = dask.linspace(10)`,
or more likely just use one within one script and the other within
another script, while both use the same sklearn functions.
(Although using backend switching may be nicer in some contexts.)
A library provider (a library user of unumpy/numpy) of course cannot
just use dask conveniently, unless they write their own
`guess_numpy_like_module()` function first.
> Cheers,
>
> Ralf
>
>> Cheers,
>>
>> Sebastian
>>
>> <snip>


On 2019-09-07 15:33, Ralf Gommers wrote:
> On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg
> <[hidden email]> wrote:
>
>> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
>>>
>>>
>> <snip>
>> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP 31)
>> * Requires end-user opt-in.
>> * Seems cleaner in many ways.
>> * Requires a full copy of the API.
>
> Bullets 1 and 3 are not required. If we decide to make it the default,
> then there's no separate namespace.
>
> It does require explicit opt-in to have any benefits to the user.
>
>> <snip>
>
> Can you explain this a bit more? `len(exp)` is a number, so
> `np.arange(number)` doesn't really have any information here.
>
> Right, but as a library author, I want a way to make it use the
> same type as `array_like` in this particular function; that is the
> point! The end user already signaled they prefer, say, dask, due to the
> array that was actually passed in. (But this is just repeating what is
> below, I think.)

Okay, you meant conceptually :)
>> This is what I have been wondering, if
>> uarray/unumpy, can in some way help me make this work (even
>> _without_
>> the end user opting in).
>
> good question. if that needs to work in the absence of the user doing
> anything, it should be something like
>
> with unumpy.determine_backend(exp):
>     unumpy.arange(len(exp))  # or np.arange if we make unumpy default
>
> to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
>
> Note, that `determine_backend` thing doesn't exist today.
>
> Exactly, that is what I have been wondering about; there may be more
> issues around that. If it existed, we may be able to solve the implicit
> library usage by making libraries use unumpy (or similar). Although, at
> that point, we half replace `__array_function__`, maybe.
I don't really think so. Libraries can/will still use __array_function__ for most functionality, and just add a `with determine_backend` for the places where __array_function__ doesn't work.
> However, the main point is that without such functionality, NEP 30 and
> NEP 31 seem to solve slightly different issues with respect to how they
> interact with the end user (opt-in)?
Yes, I agree with that.
Cheers,
Ralf
> <snip>
> Cheers,
>
> Ralf
>
>> Cheers,
>>
>> Sebastian
>>
>> [1] To be honest, I do think a lot of the "issues" around
>> monkeypatching exists just as much with backend choosing, the main
>> difference seems to me that a lot of that:
>> 1. monkeypatching was not done explicit
>> (import mkl_fft; mkl_fft.monkeypatch_numpy())?
>> 2. A backend system allows libaries to prefer one locally?
>> (which I think is a big advantage)
>>
>> [2] There are the options of adding `linspace_like` functions
>> somewhere
>> in a numpy submodule, or adding `linspace(...,
>> array_type=np.ndarray)`,
>> or simply inventing a new "protocl" (which is not really a
>> protocol?),
>> and make it `ndarray.__numpy_like_creation_functions__.arange()`.
>>
>>> Actually, after writing this I just realized something. With
>>> 1.17.x we have:
>>>
>>> ```
>>> In [1]: import dask.array as da
>>>
>>> In [2]: d = da.from_array(np.linspace(0, 1))
>>>
>>> In [3]: np.fft.fft(d)
>>> Out[3]: dask.array<fft, shape=(50,), dtype=complex128,
>>> chunksize=(50,)>
>>> ```
>>>
>>> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this
>>> won't work. We have no bug report yet because 1.17.x hasn't landed
>>> in conda defaults yet (perhaps this is a/the reason why?), but it
>>> will be a problem.
>>>
>>>> The `import numpy.overridable` part is meant to help garner
>>>> adoption, and to prefer the unumpy module if it is available
>>>> (which will continue to be developed separately). That way it
>>>> isn't so tightly coupled to the release cycle. One alternative
>>>> Sebastian Berg mentioned (and I am on board with) is just moving
>>>> unumpy into the NumPy organisation. What we fear keeping it
>>>> separate is that the simple act of a pip install unumpy will keep
>>>> people from using it or trying it out.
>>>>
>>> Note that this is not the most critical aspect. I pushed for
>>> vendoring as numpy.overridable because I want to not derail the
>>> comparison with NEP 30 et al. with a "should we add a dependency"
>>> discussion. The interesting part to decide on first is: do we need
>>> the unumpy override mechanism? Vendoring opt-in vs. making it
>>> default vs. adding a dependency is of secondary interest right now.
>>>
>>> Cheers,
>>> Ralf
>>>
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> [hidden email]
>>> https://mail.python.org/mailman/listinfo/numpy-discussion


On Fri, Sep 6, 2019 at 11:04 PM Ralf Gommers <[hidden email]> wrote:
> Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack.
If we vendor it then it also ends up as a dependency for the whole
SciPy/PyData stack...
> Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg.
There's no way we can have an effective discussion of duck arrays,
fft backends, random backends, and linalg backends all at once in a
single thread.

Can you write separate NEPs for each of these? Some questions I'd
like to see addressed:
For fft:
- fft is an entirely self-contained operation, with no interactions
with the rest of the system; the only difference between
implementations is speed. What problems are caused by monkeypatching,
and how is uarray materially different from monkeypatching?
For random:
- I thought the new random implementation with pluggable generators
etc. was supposed to solve this problem already. Why doesn't it?
- The biggest issue with MKL monkeypatching random is that it breaks
stream stability. How does the uarray approach address this?
For linalg:
- linalg already supports __array_ufunc__ for overrides. Why do we
need a second override system? Isn't that redundant?
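As a toy illustration of the fft question above, the contrast between a global monkeypatch and a backend mechanism that can decline an input could look roughly like this (all names are illustrative; this is not the mkl_fft or uarray API):

```python
def numpy_fft(x):
    # Reference implementation: accepts anything.
    return ("numpy-fft", x)

def fast_fft(x):
    # Accelerated drop-in that only understands plain lists here, standing
    # in for an implementation that cannot handle duck arrays.
    if not isinstance(x, list):
        raise NotImplementedError("fast_fft only handles plain lists")
    return ("fast-fft", x)

# Monkeypatching: one global rebinding. Every caller now gets fast_fft,
# and a duck-array input simply errors out.
fft = fast_fft

# Backend mechanism: try the preferred backend first, and fall back when
# it signals that it cannot handle the input.
def dispatched_fft(x, backends=(fast_fft, numpy_fft)):
    for impl in backends:
        try:
            return impl(x)
        except NotImplementedError:
            continue
    raise TypeError("no backend accepted the input")
```

The material difference this sketch captures is the fallback: the patched `fft` fails on anything the accelerated backend cannot handle, while `dispatched_fft` degrades gracefully to the reference implementation.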
-n

--
Nathaniel J. Smith -- https://vorpus.org

