Adding to the non-dispatched implementation of NumPy methods

Adding to the non-dispatched implementation of NumPy methods

Stephan Hoyer-2
Hi everyone,

We have proposed a revision to NEP-18 (__array_function__). The proposal is to add an alias to the non-dispatched version of NumPy array functions in the __numpy_implementation__ function attribute: https://github.com/numpy/numpy/pull/13305

I believe this attribute improves the protocol in three ways:
1. It provides a hook that __array_function__ methods can use to call the implementation intended for NumPy arrays. This allows for "partial implementations" of NumPy's API, which turns out to be useful even for some array libraries that reimplement nearly everything (namely, CuPy and JAX).
2. It allows for fast access to the non-dispatching version of NumPy functions, e.g., np.concatenate.__numpy_implementation__(list_of_all_numpy_arrays).
3. Internally, the implementation of numpy.ndarray.__array_function__ now looks identical to how we encourage outside developers to write their own __array_function__ methods. The dispatching logic no longer includes a special case for NumPy arrays.
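
As an illustration of point 1, here is a minimal sketch in plain Python. It is a toy model, not NumPy's code: `expose_implementation`, `total`, `double`, and `DuckArray` are hypothetical names, and the dispatch step itself is left out. Only the `__numpy_implementation__` attribute name (as proposed above) and the NEP-18 method signature `__array_function__(self, func, types, args, kwargs)` come from the proposal.

```python
def expose_implementation(implementation):
    """Hypothetical helper: wrap a function and expose the original on the
    wrapper as ``__numpy_implementation__`` (dispatch itself is omitted)."""
    def public_api(*args, **kwargs):
        return implementation(*args, **kwargs)
    public_api.__numpy_implementation__ = implementation
    return public_api


@expose_implementation
def total(values):
    # Stand-in for a NumPy function: coerce the input, then reduce it.
    return sum(list(values))


@expose_implementation
def double(values):
    # Stand-in for a NumPy function that returns a "plain" result.
    return [2 * v for v in list(values)]


class DuckArray:
    """Hypothetical array library that overrides only some functions."""

    def __init__(self, data):
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)

    def __array_function__(self, func, types, args, kwargs):
        if func is double:
            # Overridden: keep the result in our own array type.
            return DuckArray(2 * v for v in self.data)
        # Not overridden: fall back to the implementation meant for
        # the default array type.
        return func.__numpy_implementation__(*args, **kwargs)


duck = DuckArray([1, 2, 3])
# Call the method by hand, the way a dispatcher would.
doubled = duck.__array_function__(double, (DuckArray,), (duck,), {})
summed = duck.__array_function__(total, (DuckArray,), (duck,), {})
```

In this sketch `doubled` stays a `DuckArray`, while `summed` comes from the fallback path, which is the "partial implementation" pattern described in point 1.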

Feedback would be greatly welcomed!

Best,
Stephan

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Adding to the non-dispatched implementation of NumPy methods

Stefan van der Walt
On Mon, 15 Apr 2019 08:30:06 -0700, Stephan Hoyer wrote:
> We have proposed a revision to NEP-18 (__array_function__). The proposal
> is to add an alias to the non-dispatched version of NumPy array
> functions in the __numpy_implementation__ function attribute:
> https://github.com/numpy/numpy/pull/13305

To help others parsing through the comments in GitHub, this mailing list
post is already a summary of all the comments up to

https://github.com/numpy/numpy/pull/13305#issuecomment-483301211

I'm generally in favor: it makes sense that you should easily be able to
access the original function when overriding it.

Stéfan

Re: Adding to the non-dispatched implementation of NumPy methods

Nathaniel Smith
In reply to this post by Stephan Hoyer-2
What's the difference between

np.concatenate.__numpy_implementation__(...)

and

np.ndarray.__array_function__(np.concatenate, ...)

?

More generally, I guess I'm not quite clear on how to think about what the "no dispatch" version does, because obviously it doesn't make sense to have *no* dispatch. Instead it's something like "the legacy hard-coded dispatch"?

Re: Adding to the non-dispatched implementation of NumPy methods

Stephan Hoyer-2
On Mon, Apr 15, 2019 at 1:21 PM Nathaniel Smith <[hidden email]> wrote:
> What's the difference between
>
> np.concatenate.__numpy_implementation__(...)
>
> and
>
> np.ndarray.__array_function__(np.concatenate, ...)
>
> ?

I can answer this technically, though this doesn't seem to be quite what you're looking for:
- The former always succeeds at dispatch, because it coerces all arguments to NumPy arrays.
- The latter will either return NotImplemented (if a non-NumPy array implements __array_function__), or give the same result as the former.
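
The two cases can be sketched with a toy model in plain Python. Nothing here is NumPy's code: `ToyNDArray`, `OtherArray`, and `_concatenate_impl` are hypothetical stand-ins, the public `concatenate` omits dispatch entirely, and the subclass check is an assumption modeled on the description above.

```python
class ToyNDArray:
    """Hypothetical stand-in for the default array type."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        # Defer unless every participating type is (a subclass of)
        # the default array type.
        if not all(issubclass(t, ToyNDArray) for t in types):
            return NotImplemented
        return func.__numpy_implementation__(*args, **kwargs)


class OtherArray:
    """Hypothetical foreign array type that implements the protocol."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


def _concatenate_impl(arrays):
    # "Coerces" every argument: whatever comes in, a ToyNDArray comes out.
    merged = []
    for a in arrays:
        merged.extend(a.data)
    return ToyNDArray(merged)


def concatenate(arrays):
    raise NotImplementedError("dispatch is omitted in this sketch")


concatenate.__numpy_implementation__ = _concatenate_impl

a, b = ToyNDArray([1, 2]), OtherArray([3])

# Calling the implementation directly always succeeds.
direct = concatenate.__numpy_implementation__([a, b])

# Going through the default type's __array_function__ defers when a
# foreign type takes part ...
via_protocol = ToyNDArray.__array_function__(
    a, concatenate, (ToyNDArray, OtherArray), ([a, b],), {})

# ... and otherwise gives the same result as the direct call would.
same_types = ToyNDArray.__array_function__(
    a, concatenate, (ToyNDArray,), ([a, a],), {})
```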
 
> More generally, I guess I'm not quite clear on how to think about what the "no dispatch" version does, because obviously it doesn't make sense to have *no* dispatch. Instead it's something like "the legacy hard-coded dispatch"?

__numpy_implementation__ means you skip __array_function__ dispatch and call the original NumPy function. In practice, this means you get legacy hard-coded dispatch behavior in most cases, e.g., the result will always be in the form of NumPy array(s).

It doesn't mean that the implementation always coerces all arguments to NumPy arrays. For example, np.result_type() will pull .dtype attributes off of its arguments, without necessarily coercing those arguments to NumPy arrays. This strange version of "the implementation for NumPy arrays" turns out to be something that several libraries that want to implement __array_function__ want to be able to continue to use on their own array objects (namely, JAX and CuPy).
 


Re: Adding to the non-dispatched implementation of NumPy methods

Nathaniel Smith
On Mon, Apr 15, 2019 at 4:39 PM Stephan Hoyer <[hidden email]> wrote:

>
> On Mon, Apr 15, 2019 at 1:21 PM Nathaniel Smith <[hidden email]> wrote:
>>
>> What's the difference between
>>
>> np.concatenate.__numpy_implementation__(...)
>>
>> and
>>
>> np.ndarray.__array_function__(np.concatenate, ...)
>>
>> ?
>
>
> I can answer this technically, though this doesn't seem to be quite what you're looking for:
> - The former always succeeds at dispatch, because it coerces all arguments to NumPy arrays.
> - The latter will either return NotImplemented (if a non-NumPy array implements __array_function__), or give the same result as the former.
>
>>
>> More generally, I guess I'm not quite clear on how to think about what the "no dispatch" version does, because obviously it doesn't make sense to have *no* dispatch. Instead it's something like "the legacy hard-coded dispatch"?
>
> __numpy_implementation__ means you skip __array_function__ dispatch and call the original NumPy function. In practice, this means you get legacy hard-coded dispatch behavior in most cases, e.g., the result will always be in the form of NumPy array(s).
>
> It doesn't mean that the implementation always coerces all arguments to NumPy arrays. For example, np.result_type() will pull .dtype attributes off of its arguments, without necessarily coercing those arguments to NumPy arrays. This strange version of "the implementation for NumPy arrays" turns out to be something that several libraries that want to implement __array_function__ want to be able to continue to use on their own array objects (namely, JAX and CuPy).

Microsoft's "open standard" [1] document format, OOXML, famously
contains tags like "autoSpaceLikeWord95" and
"useWord97LineBreakRules". If you want to correctly interpret a Word
document, you have to know what these mean. (Unfortunately, the
standard doesn't say.)

Mostly I would like the definition for numpy 1.17's semantics to be
internally coherent and self-contained. If the documentation for
__numpy_implementation__ is just "it does whatever numpy 1.14 did",
then that seems not so great. Is there any way to define
__numpy_implementation__'s semantics without incorporating previous
versions of numpy by reference?

-n

[1] https://en.wikipedia.org/wiki/Standardization_of_Office_Open_XML

--
Nathaniel J. Smith -- https://vorpus.org

Re: Adding to the non-dispatched implementation of NumPy methods

Stefan van der Walt
I thought this was simply a slot to store the NumPy version of the
dispatched method, so that you could easily call through to it and
extend it.

Stephan, was there a deeper intent here that I missed?

Best regards,
Stéfan

Re: Adding to the non-dispatched implementation of NumPy methods

Marten van Kerkwijk

I somewhat share Nathaniel's worry that by providing `__numpy_implementation__` we essentially get stuck with the implementations we have currently, rather than having the hoped-for freedom to remove all the `np.asarray` coercion. In that respect, an advantage of using `_wrapped` is that it is clearly a private method, so anybody is automatically forewarned that this can change.

In principle, ndarray.__array_function__ would be more logical, but as noted in the PR, the problem is that it is non-trivial for a regular __array_function__ implementation to coerce all the arguments to ndarray itself.

Which suggests that perhaps what is missing is a general routine that does that, i.e., that re-uses the dispatcher.

-- Marten

Re: Adding to the non-dispatched implementation of NumPy methods

Stephan Hoyer-2
__numpy_implementation__ is indeed simply a slot for third parties to access NumPy's implementation. It should be considered "NumPy's current implementation", not "NumPy's implementation as of 1.14". Of course, in practice these will remain very similar, because we are already very conservative about how we change NumPy.

I would love to have clean well-defined coercion semantics for every NumPy function, which would be implicitly adopted by `__numpy_implementation__` (e.g., we could say that every function always coerces its arguments with `np.asarray()`). But I think that's an orthogonal issue. We have been supporting some ad-hoc duck typing in NumPy for a long time (e.g., the `.sum()` method which is called by `np.sum()`). Removing that would require a deprecation cycle, which may indeed be warranted once we're sure we're happy with __array_function__. But I don't think the deprecation cycle will be any worse if the implementation is also exposed via `__numpy_implementation__`.
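
The kind of ad hoc duck typing mentioned above can be sketched as follows. This is a toy model in plain Python and not NumPy's code: `toy_sum` and `Tagged` are hypothetical names, and the rule "defer to the object's own `.sum()` unless it is a plain sequence" is a simplifying assumption.

```python
def toy_sum(a):
    """Hypothetical stand-in for a reduction with ad hoc duck typing."""
    if not isinstance(a, (list, tuple)):
        method = getattr(a, "sum", None)
        if callable(method):
            # Duck-typing path: defer to the object's own .sum() method.
            return method()
    # Default path: coerce to a list and add the items up.
    return sum(list(a))


class Tagged:
    """Hypothetical array-like that supplies its own .sum()."""

    def __init__(self, data):
        self.data = list(data)

    def sum(self):
        return ("Tagged", sum(self.data))


plain = toy_sum([1, 2, 3])           # default path
ducked = toy_sum(Tagged([1, 2, 3]))  # duck-typing path
```

If an attribute such as `__numpy_implementation__` pointed at a function like `toy_sum`, the same duck typing would be reachable through both entry points, so deprecating it would be one change rather than two.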

We should definitely still think about a cleaner "core" implementation of NumPy functions in terms of a minimal core. One recent example of this can be found in JAX (see https://github.com/google/jax/blob/04b45e4086249bad691a33438e8bb6fcf639d001/jax/numpy/lax_numpy.py). This would be something appropriate to put into a more generic function attribute on NumPy functions, perhaps `__array_implementation__`. But I don't think formalizing `__numpy_implementation__` as a way to get access to NumPy's default implementation will limit our future options here.

Cheers,
Stephan


Re: Adding to the non-dispatched implementation of NumPy methods

Stephan Hoyer-2
Are there still concerns here? If not, I would love to move ahead with these changes so we can get this into NumPy 1.17.

Re: Adding to the non-dispatched implementation of NumPy methods

Nathaniel Smith
Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :-). And also, that's exactly the definition of np.func, isn't it?

You're talking about ~doubling the size of numpy's API, and don't seem able to even articulate what the new API's commitments are. This still makes me nervous. Maybe it should have a NEP? What's your testing strategy for all the new functions?

Re: Adding to the non-dispatched implementation of NumPy methods

ralfgommers


On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <[hidden email]> wrote:
> Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :-). And also, that's exactly the definition of np.func, isn't it?
>
> You're talking about ~doubling the size of numpy's API,

I think we can already get both the NEP 18 wrapped functions and their underlying implementations today, based on the value of NUMPY_EXPERIMENTAL_ARRAY_FUNCTION.
It looks to me like all this proposed change does is bypass a do-very-little wrapper.

> and don't seem able to even articulate what the new API's commitments are. This still makes me nervous. Maybe it should have a NEP? What's your testing strategy for all the new functions?

The current decorator mechanism already checks that the signatures match, so it shouldn't be possible to get a mismatch. So probably not much is needed beyond some assert_equal(np.func(...), np.func.__numpy_implementation__(...)) checks.
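
A consistency check of that shape could look roughly like the sketch below. It is written against toy functions in plain Python so that it runs on its own: `with_implementation`, `flip`, `clip`, `CASES`, and `check_consistency` are all hypothetical names, and a real test would iterate over actual NumPy functions instead.

```python
def with_implementation(implementation):
    """Hypothetical helper that attaches ``__numpy_implementation__``."""
    def public_api(*args, **kwargs):
        # Real dispatch logic would go here; plain inputs fall through.
        return implementation(*args, **kwargs)
    public_api.__numpy_implementation__ = implementation
    public_api.__name__ = implementation.__name__
    return public_api


@with_implementation
def flip(values):
    return list(reversed(values))


@with_implementation
def clip(values, lo, hi):
    return [min(max(v, lo), hi) for v in values]


# One (function, args, kwargs) triple per function under test.
CASES = [
    (flip, ([1, 2, 3],), {}),
    (clip, ([-5, 0, 5],), {"lo": -1, "hi": 1}),
]


def check_consistency(cases):
    """Assert that the public function and its exposed implementation
    agree on inputs that need no dispatch; return the number checked."""
    for func, args, kwargs in cases:
        expected = func.__numpy_implementation__(*args, **kwargs)
        actual = func(*args, **kwargs)
        assert actual == expected, func.__name__
    return len(cases)


checked = check_consistency(CASES)
```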

@Stephan the PR for the NEP change is very hard to parse. Maybe easier to just open a PR with an implementation for one or a few functions + associated tests?

Cheers,
Ralf



Re: Adding to the non-dispatched implementation of NumPy methods

Stephan Hoyer-2
On Mon, Apr 22, 2019 at 2:20 PM Ralf Gommers <[hidden email]> wrote:
> On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <[hidden email]> wrote:
>> Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :-). And also, that's exactly the definition of np.func, isn't it?

My understanding of the protocol we came up with in NEP-18 is that every NumPy function (that takes array-like arguments) now has two parts to its implementation:
1. The NEP-18 part involving calling the dispatcher function, and checking for/calling __array_function__ attributes on array-like arguments. This part is documented in NEP-18.
2. The original function definition, which is called if either (a) no __array_function__ attributes exist, or (b) the only __array_function__ attribute is numpy.ndarray.__array_function__. This part is documented in the docstring of the NumPy function.

"__numpy_implementation__" provides a short-cut to (2) without (1). That's it.

OK, thinking about this a little bit more, there is one other (rare) difference: in cases where a function has deprecated arguments, we are currently only issuing the deprecation warnings in the dispatcher function, rather than in both the dispatcher and the implementation. This is all the more reason to discourage users from calling __numpy_implementation__ directly (I'll update the NEP), but it's still fine to call __numpy_implementation__ from within __array_function__ methods themselves.
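
The two-part structure, and the deprecation-warning difference just described, can be sketched in plain Python. This is a toy model rather than NumPy's dispatch code: `array_function_dispatch` here is a simplified re-creation, and `scale`, `_scale_dispatcher`, and the deprecated `multiplier` argument are hypothetical.

```python
import warnings


def array_function_dispatch(dispatcher):
    """Toy decorator showing the two parts of a dispatched function."""
    def decorator(implementation):
        def public_api(*args, **kwargs):
            # Part 1: call the dispatcher, then look for
            # __array_function__ on the arguments it returns.
            relevant = dispatcher(*args, **kwargs)
            overloaded = [a for a in relevant
                          if hasattr(type(a), "__array_function__")]
            types = tuple(type(a) for a in overloaded)
            for arg in overloaded:
                result = arg.__array_function__(
                    public_api, types, args, kwargs)
                if result is not NotImplemented:
                    return result
            if overloaded:
                raise TypeError("no implementation found for these types")
            # Part 2: the original function definition.
            return implementation(*args, **kwargs)
        # The shortcut to part 2 without part 1.
        public_api.__numpy_implementation__ = implementation
        return public_api
    return decorator


def _scale_dispatcher(values, factor=None, multiplier=None):
    # In this sketch, deprecation warnings live only in the dispatcher.
    if multiplier is not None:
        warnings.warn("'multiplier' is deprecated, use 'factor'",
                      DeprecationWarning, stacklevel=3)
    return (values,)


@array_function_dispatch(_scale_dispatcher)
def scale(values, factor=None, multiplier=None):
    if factor is None:
        factor = multiplier
    return [factor * v for v in values]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result_public = scale([1, 2], multiplier=3)
    n_public = len(caught)   # warnings raised by the public call
    result_direct = scale.__numpy_implementation__([1, 2], multiplier=3)
    n_direct = len(caught) - n_public   # warnings raised by the direct call
```

Both calls return the same values, but only the public call passes through the dispatcher, so only it emits the warning.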

I guess the other option would be to make it programmatically impossible to access implementations outside of __array_function__, by making numpy_implementation an argument used to call __array_function__() rather than making it an attribute on NumPy functions. I don't like this as much, for two reasons:
1. It would break every existing implementation of __array_function__ before it launches. We did reserve the right to do this, but it's still a little unfriendly to our early adopters.
2. There are still cases where users will prefer to call np.concatenate.__numpy_implementation__ for extra performance, even knowing that they will miss any hypothetical deprecation warnings and removed/renamed function arguments.

>> You're talking about ~doubling the size of numpy's API,
>
> I think we can already get both the NEP 18 wrapped functions and their underlying implementations today, based on the value of NUMPY_EXPERIMENTAL_ARRAY_FUNCTION.
> It looks to me like all this proposed change does is bypass a do-very-little wrapper.

This is how I think of it.

>> and don't seem able to even articulate what the new API's commitments are. This still makes me nervous. Maybe it should have a NEP? What's your testing strategy for all the new functions?
>
> The current decorator mechanism already checks that the signatures match, so it shouldn't be possible to get a mismatch. So probably not much is needed beyond some assert_equal(np.func(...), np.func.__numpy_implementation__(...)) checks.
>
> @Stephan the PR for the NEP change is very hard to parse. Maybe easier to just open a PR with an implementation for one or a few functions + associated tests?

Sure, here's a full implementation (with tests): https://github.com/numpy/numpy/pull/13389

I have not included tests on every numpy function, but we didn't write those for each NumPy function with __array_function__ overrides, either -- the judgment was that the changes are mechanistic enough that writing a unit test for each function would not be worthwhile.

Also you'll note that my PR includes only a single change to np.ndarray.__array_function__ (swapping out __wrapped__ -> __numpy_implementation__). This is because we had actually already changed the implementation of ndarray.__array_function__ without updating the NEP, per prior discussion on the mailing list [1]. The existing use of the __wrapped__ attribute is an undocumented optimization / implementation detail.

 
Cheers,
Ralf



On Mon, Apr 22, 2019, 09:22 Stephan Hoyer <[hidden email]> wrote:
Are there still concerns here? If not, I would love to move ahead with these changes so we can get this into NumPy 1.17.

On Tue, Apr 16, 2019 at 10:23 AM Stephan Hoyer <[hidden email]> wrote:
__numpy_implementation__ is indeed simply a slot for third-parties to access NumPy's implementation. It should be considered "NumPy's current implementation", not "NumPy's implementation as of 1.14". Of course, in practice these will remain very similar, because we are already very conservative about how we change NumPy.

I would love to have clean well-defined coercion semantics for every NumPy function, which would be implicitly adopted by `__numpy_implementation__` (e.g., we could say that every function always coerces its arguments with `np.asarray()`). But I think that's an orthogonal issue. We have been supporting some ad-hoc duck typing in NumPy for a long time (e.g., the `.sum()` method which is called by `np.sum()`). Removing that would require a deprecation cycle, which may indeed be warranted once we're sure we're happy with __array_function__. But I don't think the deprecation cycle will be any worse if the implementation is also exposed via `__numpy_implementation__`.

We should definitely still think about a cleaner "core" implementation of NumPy functions in terms of a minimal core. One recent example of this can be found in JAX (see https://github.com/google/jax/blob/04b45e4086249bad691a33438e8bb6fcf639d001/jax/numpy/lax_numpy.py). This would be something appropriate to put into a more generic function attribute on NumPy functions, perhaps `__array_implementation__`. But I don't think formalizing `__numpy_implementation__` as a way to get access to NumPy's default implementation will limit our future options here.

Cheers,
Stephan


On Tue, Apr 16, 2019 at 6:44 AM Marten van Kerkwijk <[hidden email]> wrote:

I somewhat share Nathaniel's worry that by providing `__numpy_implementation__` we essentially get stuck with the implementations we have currently, rather than having the hoped-for freedom to remove all the `np.asarray` coercion. In that respect, an advantage of using `__wrapped__` is that it is clearly a private method, so anybody is automatically forewarned that this can change.

In principle, ndarray.__array_function__ would be more logical, but as noted in the PR, the problem is that it is non-trivial for a regular __array_function__ implementation to coerce all the arguments to ndarray itself.

Which suggests that perhaps what is missing is a general routine that does that, i.e., that re-uses the dispatcher.

-- Marten

Re: Adding to the non-dispatched implementation of NumPy methods

Nathaniel Smith
On Mon, Apr 22, 2019 at 11:13 PM Stephan Hoyer <[hidden email]> wrote:

>
>> On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <[hidden email]> wrote:
>>>
>>> Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :-). And also, that's exactly the definition of np.func, isn't it?
>
> My understanding of the protocol we came up with in NEP-18 is that every NumPy function (that takes array-like arguments) now has two parts to its implementation:
> 1. The NEP-18 part involving calling the dispatcher function, and checking for/calling __array_function__ attributes on array-like arguments. This part is documented in NEP-18.
> 2. The original function definition, which is called if either (a) no __array_function__ attributes exist, or (b) the only __array_function__ attribute is numpy.ndarray.__array_function__. This part is documented in the docstring of the NumPy function.
>
> "__numpy_implementation__" provides a short-cut to (2) without (1). That's it.

OK, so the semantics are: the same as the normal function, except we
pretend that none of the arguments have an __array_function__
attribute?

That's much clearer to me than how you were phrasing it before :-).
Though now the name "__numpy_implementation__" doesn't seem very
evocative of what it does... numpy's dispatch sequence has changed a
lot in the past (mostly adding new coercion rules), and will probably
change in the future, and "__numpy_implementation__" doesn't give much
guidance about which parts of the dispatch sequence should be skipped
as "dispatch" and which should be included as "implementation". Maybe
something like __skipping_array_function__?

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: Adding to the non-dispatched implementation of NumPy methods

Marten van Kerkwijk
Hi All,

Reading the discussion again, I've gotten somewhat unsure that it is helpful to formalize a way to call an implementation that we can and hopefully will change. Why not just leave it at __wrapped__? I think the name is no worse and it is more obvious that one relies on something private.

I ask in part since I could see a good case for having a special method that is available only for functions that do no explicit casting to array, i.e., that are ready to accept array mimics (and for which we're willing to guarantee that would not change). For instance, functions like np.sinc that really have no business using more than ufuncs under the hood, i.e., which any class that has __array_ufunc__ can call safely. Or (eventually) all those functions that just end up calling `concatenate` - those again could easily be made safe for a class that just overrides `np.concatenate` using __array_function__. In essence, this would be any function that does not do `np.as(any)array` but just relies on array attributes.

But the above obviously presumes a vision of where this is headed, which I'm not sure is shared...

All the best,

Marten


Re: Adding to the non-dispatched implementation of NumPy methods

ralfgommers


On Tue, Apr 23, 2019 at 4:31 PM Marten van Kerkwijk <[hidden email]> wrote:
Hi All,

Reading the discussion again, I've gotten somewhat unsure that it is helpful to formalize a way to call an implementation that we can and hopefully will change. Why not just leave it at __wrapped__? I think the name is no worse and it is more obvious that one relies on something private.

I'm not convinced about the name either. NEP 18 also suggests adopting the protocol in other libraries, so for SciPy would we have to name it __scipy_implementation__? I'm not sure whether that's better or worse than a generic __wrapped__.

I don't see why the NumPy implementation must be considered private, though. It's public today, and there's little wrong with keeping it public. The "it can change" argument doesn't really apply: it has the same backwards-compatibility guarantees going forward that we have now.


I ask in part since I could see a good case for having a special method that is available only for functions that do no explicit casting to array, i.e., that are ready to accept array mimics (and for which we're willing to guarantee that would not change). For instance, functions like np.sinc that really have no business using more than ufuncs under the hood, i.e., which any class that has __array_ufunc__ can call safely. Or (eventually) all those functions that just end up calling `concatenate` - those again could easily be made safe for a class that just overrides `np.concatenate` using __array_function__. In essence, this would be any function that does not do `np.as(any)array` but just relies on array attributes.

But the above obviously presumes a vision of where this is headed, which I'm not sure is shared...

This is an orthogonal topic I think - you want multiple implementations, "safe" and "unsafe" (or fast vs checking for invalids vs robust for subclasses, etc. - lots of options here).

Cheers,
Ralf




Re: Adding to the non-dispatched implementation of NumPy methods

Stephan Hoyer-2
In reply to this post by Nathaniel Smith
On Tue, Apr 23, 2019 at 12:27 AM Nathaniel Smith <[hidden email]> wrote:
On Mon, Apr 22, 2019 at 11:13 PM Stephan Hoyer <[hidden email]> wrote:
>
>> On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <[hidden email]> wrote:
>>>
>>> Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :-). And also, that's exactly the definition of np.func, isn't it?
>
> My understanding of the protocol we came up with in NEP-18 is that every NumPy function (that takes array-like arguments) now has two parts to its implementation:
> 1. The NEP-18 part involving calling the dispatcher function, and checking for/calling __array_function__ attributes on array-like arguments. This part is documented in NEP-18.
> 2. The original function definition, which is called if either (a) no __array_function__ attributes exist, or (b) the only __array_function__ attribute is numpy.ndarray.__array_function__. This part is documented in the docstring of the NumPy function.
>
> "__numpy_implementation__" provides a short-cut to (2) without (1). That's it.

OK, so the semantics are: the same as the normal function, except we
pretend that none of the arguments have an __array_function__
attribute?

That's much clearer to me than how you were phrasing it before :-).

OK, I will make sure something like this ends up in the NEP :)
 
Though now the name "__numpy_implementation__" doesn't seem very
evocative of what it does... numpy's dispatch sequence has changed a
lot in the past (mostly adding new coercion rules), and will probably
change in the future, and "__numpy_implementation__" doesn't give much
guidance about which parts of the dispatch sequence should be skipped
as "dispatch" and which should be included as "implementation". Maybe
something like __skipping_array_function__?

With "__numpy_implementation__" I was hoping to evoke "the implementation used by numpy.ndarray.__array_function__" and "the implementation for NumPy arrays" rather than "the implementation found in the NumPy library." So it would still be appropriate to use on functions defined in SciPy, as long as they are defined on NumPy arrays.

That said, this is clearly going to remain a source of confusion. So let's see if we can do better.

Taking a step back, there will be three generic parts to NumPy functions after NEP-18:
1. Dispatching with __array_function__
2. Coercion to NumPy arrays (sometimes skipped if an object has the necessary duck-typing methods)
3. Implementation (either in C or in terms of other NumPy functions/methods)

Currently, NumPy functions do steps (2) and (3) together. What we're asking for here is a way to continue this behavior in the future, by optionally skipping step (1). But in the future, as Marten notes below, we should not rule out cases where we also want to skip straight to step (3), without step (2).
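
As a sketch of this three-way split (toy code with no NumPy dependency; `ToyArray`, `as_toy_array`, `total`, `Logged`, and the `__toy_function__` protocol name are stand-ins assumed for illustration), steps (2) and (3) are bundled into one callable, and the public function exposes it so that callers can skip step (1):

```python
class ToyArray:
    """Stand-in for numpy.ndarray."""
    def __init__(self, data):
        self.data = list(data)


def as_toy_array(obj):
    # Step 2: coercion (stand-in for np.asarray).
    return obj if isinstance(obj, ToyArray) else ToyArray(obj)


def _total_core(arr):
    # Step 3: the implementation proper; assumes a ToyArray.
    return sum(arr.data)


def _total_coercing(obj):
    # Steps 2 and 3 together, which is what NumPy functions do today.
    return _total_core(as_toy_array(obj))


def total(obj):
    # Step 1: dispatch to an override if the argument's type defines one.
    override = getattr(type(obj), "__toy_function__", None)
    if override is not None:
        return override(obj, total, (obj,), {})
    return _total_coercing(obj)


# The proposed attribute: skip step 1, keep steps 2 and 3.
total.__numpy_implementation__ = _total_coercing


class Logged:
    """Overrides dispatch, then defers to the coercing implementation."""
    calls = 0

    def __init__(self, data):
        self.data = data

    def __toy_function__(self, func, args, kwargs):
        Logged.calls += 1
        unwrapped = [a.data if isinstance(a, Logged) else a for a in args]
        return func.__numpy_implementation__(*unwrapped, **kwargs)
```

Skipping straight to step (3) would correspond to exposing `_total_core` as well, which this sketch deliberately leaves private.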

"__skipping_array_function__" would be a reasonable choice, though it does not evoke the "numpy array specific" aspect that I want to emphasize. Also, it has the unfortunate aspect of being named after what it doesn't do, rather than what it does.

"__numpy_ndarray_implementation__" and "__numpy_array_implementation__" are a bit verbose, but maybe they would be better?

The generic "__wrapped__" seems like a pretty bad choice to me, both because it's not at all descriptive and because it's generically used by functools.wraps -- which means np.ndarray.__array_function__ could inadvertently succeed when called with non-NumPy functions. Let's at least stick to unique names for our protocols :).
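
The collision is easy to demonstrate with the standard library alone: `functools.wraps` sets `__wrapped__` on any decorated function, NumPy-related or not. In the sketch below, `traced`, `unrelated`, and `get_implementation` are hypothetical names.

```python
import functools


def traced(func):
    """An ordinary decorator with no connection to NumPy."""
    @functools.wraps(func)  # sets wrapper.__wrapped__ = func
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@traced
def unrelated(x):
    return x + 1


def get_implementation(func, attribute):
    """Look up a non-dispatched implementation under the given attribute name."""
    return getattr(func, attribute, None)
```

A lookup keyed on `__wrapped__` finds something on `unrelated`, whereas a lookup keyed on a protocol-specific name such as `__numpy_implementation__` returns `None`.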


Re: Adding to the non-dispatched implementation of NumPy methods

Nathaniel Smith
On Wed, Apr 24, 2019 at 9:45 PM Stephan Hoyer <[hidden email]> wrote:

> With "__numpy_implementation__" I was hoping to evoke "the implementation used by numpy.ndarray.__array_function__" and "the implementation for NumPy arrays" rather than "the implementation found in the NumPy library." So it would still be appropriate to use on functions defined in SciPy, as long as they are defined on NumPy arrays.
>
> That said, this is clearly going to remain a source of confusion. So let's see if we can do better.
>
> Taking a step back, there will be three generic parts to NumPy functions after NEP-18:
> 1. Dispatching with __array_function__
> 2. Coercion to NumPy arrays (sometimes skipped if an object has the necessary duck-typing methods)
> 3. Implementation (either in C or in terms of other NumPy functions/methods)
>
> Currently, NumPy functions do steps (2) and (3) together. What we're asking for here is a way to continue this behavior in the future, by optionally skipping step (1). But in the future, as Marten notes below, we should not rule out cases where we also want to skip straight to step (3), without step (2).
>
> "__skipping_array_function__" would be a reasonable choice, though it does not evoke the "numpy array specific" aspect that I want to emphasize. Also, it has the unfortunate aspect of being named after what it doesn't do, rather than what it does.
>
> "__numpy_ndarray_implementation__" and "__numpy_array_implementation__" are a bit verbose, but maybe they would be better?

When you say "numpy array specific" and
"__numpy_(nd)array_implementation__", that sounds to me like you're
trying to say "just step 3, skipping steps 1 and 2"? Step 3 is the one
that operates on ndarrays...

When we have some kind of __asduckarray__ coercion, then that will
complicate things too, because presumably we'll do something like

1. __array_function__ dispatch
2. __asduckarray__ coercion
3. __array_function__ dispatch again
4. ndarray coercion
5. [either "the implementation", or __array_function__ dispatch again,
depending on how you want to think about it]

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: Adding to the non-dispatched implementation of NumPy methods

Stephan Hoyer-2
On Wed, Apr 24, 2019 at 9:56 PM Nathaniel Smith <[hidden email]> wrote:
When you say "numpy array specific" and
"__numpy_(nd)array_implementation__", that sounds to me like you're
trying to say "just step 3, skipping steps 1 and 2"? Step 3 is the one
that operates on ndarrays...

My thinking was that if we implement NumPy functions with duck typing (e.g., `np.stack()` in terms of  `.shape` + `np.concatenate()`), then step (3) could in some sense be the generic "array implementation", not only for NumPy arrays.
 
When we have some kind of __asduckarray__ coercion, then that will
complicate things too, because presumably we'll do something like

1. __array_function__ dispatch
2. __asduckarray__ coercion
3. __array_function__ dispatch again
4. ndarray coercion
5. [either "the implementation", or __array_function__ dispatch again,
depending on how you want to think about it]

I was thinking of something a little simpler: do __asduckarray__ rather than numpy.ndarray coercion inside the implementation of NumPy functions. Then making use of NumPy's implementations would be a matter of calling the NumPy implementation without ndarray coercion from inside __array_function__.

e.g.,

class MyArray:
    def __duck_array__(self):
        return self
    def __array_function__(self, func, types, args, kwargs):
        ...
        if func in {np.stack, np.atleast_1d, ...}:
            # use NumPy's "duck typing" implementations for these functions
            return func.__duck_array_implementation__(*args, **kwargs)
        elif func == np.concatenate:
            # write my own version of np.concatenate
            ...

This would let you make use of duck typing in a controlled way if you use __array_function__. np.stack.__duck_array_implementation__ would look exactly like np.stack, except np.asanyarray() would be replaced by np.asduckarray().

The reason why we need the separate __duck_array_implementation__ and __numpy_array_implementation__/__skipping_array_function__ is because there are also use cases where you *don't* want to worry about how np.stack is implemented under the hood (i.e., in terms of np.concatenate), and want to go straight to the coercive numpy.ndarray implementation. This lets you avoid both the complexity and overhead associated with further dispatch checks.

I don't think we want repeated dispatching with __array_function__. That seems like a recipe for slow performance and confusion.


Re: Adding to the non-dispatched implementation of NumPy methods

Marten van Kerkwijk
It seems we are adding to the wishlist!  I see four so far:
1. Exposed in API, can be overridden with __array_ufunc__
2. One that converts everything to ndarray (or subclass); essentially the current implementation;
3. One that does asduckarray
4. One that assumes all arguments are arrays.

Maybe handiest would be if there is a method to coerce all relevant arguments with a function of one's choice? I.e., in the example of Stephan, one would have
```
if function in JUST_COERCE:
    coerced_args, coerced_kwargs = function.__coerce__(np.asanyarray, *args, **kwargs)
    return function.__implementation__(*coerced_args, **coerced_kwargs)
```
Actually, this might in fact work with the plan proposed here, if we allow for an extra, optional kwarg that contains the coercion function, that is
```
    return function.__implementation__(*args, coercion_function=np.asanyarray, **kwargs)
```

The possible advantage of this over yet more dunder methods is that one can fine-tune the extent to which something has to mimic an array properly (e.g., run `asanyarray` only if `shape` is not present).
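
A toy sketch of the coercion-function variant, with no NumPy dependency. `ToyArray`, `as_toy_array`, `coerce_unless_shaped`, `total_size`, and `Mimic` are hypothetical stand-ins; only the `coercion_function` keyword follows the suggestion above.

```python
class ToyArray:
    """Stand-in for numpy.ndarray: a 1-D container with a shape."""
    def __init__(self, data):
        self.data = list(data)
        self.shape = (len(self.data),)


def as_toy_array(obj):
    # Stand-in for np.asanyarray: always coerce.
    return obj if isinstance(obj, ToyArray) else ToyArray(obj)


def coerce_unless_shaped(obj):
    # Fine-tuned coercion: leave anything that already has a `shape` alone.
    return obj if hasattr(obj, "shape") else as_toy_array(obj)


def total_size(*arrays, coercion_function=as_toy_array):
    """Toy implementation that relies only on `.shape` after coercion."""
    return sum(coercion_function(a).shape[0] for a in arrays)


class Mimic:
    """Array mimic: has `shape`, but is neither a ToyArray nor iterable."""
    shape = (5,)
```

With the default coercion, `total_size(Mimic())` fails because the mimic cannot be converted; passing `coercion_function=coerce_unless_shaped` lets it through untouched.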

It would be nice, though, if we could end up with also option 4 being available, if only because code that just can assume ndarray will be easiest to read.

All the best,

Marten


Re: Adding to the non-dispatched implementation of NumPy methods

Hameer Abbasi
On Thursday, Apr 25, 2019 at 9:45 PM, Marten van Kerkwijk <[hidden email]> wrote:
It seems we are adding to the wishlist!  I see four so far:
1. Exposed in API, can be overridden with __array_ufunc__
2. One that converts everything to ndarray (or subclass); essentially the current implementation;
3. One that does asduckarray
4. One that assumes all arguments are arrays.

Maybe handiest would be if there is a method to coerce all relevant arguments with a function of one's choice? I.e., in the example of Stephan, one would have
```
if function in JUST_COERCE:
    coerced_args, coerced_kwargs = function.__coerce__(np.asanyarray, *args, **kwargs)
    return function.__implementation__(*coerced_args, **coerced_kwargs)
```
Actually, this might in fact work with the plan proposed here, if we allow for an extra, optional kwarg that contains the coercion function, that is
```
    return function.__implementation__(*args, coercion_function=np.asanyarray, **kwargs)
```

The possible advantage of this over yet more dunder methods is that one can fine-tune the extent to which something has to mimic an array properly (e.g., run `asanyarray` only if `shape` is not present).

It would be nice, though, if we could end up with also option 4 being available, if only because code that just can assume ndarray will be easiest to read.

All the best,

Marten

Hi everyone,

Although, in general, I agree with Stephan’s design goals, I agree with Marten that the number of protocols is growing and may get out of hand if not handled properly. There’s even one Marten forgot to mention: __array_dtype__. I have been working on a project that I consider to have all the essential features that Marten proposes, mostly within one framework. It’s called uarray (for universal array) and can be found over at 

It adopts the “separation of implementation from interface” principles from the beginning. Here’s how it works: There are MultiMethods and Backends. A Backend registers implementations for a given MultiMethod. A MultiMethod defines the signature, along with the elements that can be dispatched over, along with their types. To it, NumPy is (and I realise this is going to be controversial, since this is the NumPy mailing list), just another backend.
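
To illustrate the MultiMethod/Backend split described above, here is a toy sketch. It is not uarray's API: `MultiMethod`, `Backend`, `register`, and the `backend` keyword are names assumed for illustration only, and the marking and coercion of dispatchable arguments is left out entirely.

```python
class Backend:
    """A named collection of implementations, keyed by multimethod."""
    def __init__(self, name):
        self.name = name
        self.implementations = {}

    def register(self, multimethod):
        def decorator(func):
            self.implementations[multimethod] = func
            return func
        return decorator


class MultiMethod:
    """Defines an interface; the chosen backend supplies the implementation."""
    def __init__(self, name):
        self.name = name

    def __call__(self, *args, backend, **kwargs):
        try:
            impl = backend.implementations[self]
        except KeyError:
            raise NotImplementedError(
                f"{backend.name} has no implementation of {self.name}"
            ) from None
        return impl(*args, **kwargs)


concatenate = MultiMethod("concatenate")

list_backend = Backend("lists")
tuple_backend = Backend("tuples")


@list_backend.register(concatenate)
def _concat_lists(seqs):
    return [item for seq in seqs for item in seq]


@tuple_backend.register(concatenate)
def _concat_tuples(seqs):
    return tuple(item for seq in seqs for item in seq)
```

In this sketch no backend is privileged: the same `concatenate` call returns a list or a tuple depending only on which backend is passed in.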

Here’s how it addresses Marten’s concerns:
  • Everything is made into a MultiMethod. Then, the multimethod marks objects it’d like to dispatch over. For the status quo, this is arrays. But thinking long-term, we could dispatch over abstract ufuncs and dtypes as well. For ufuncs, ufunc.__call__ and ufunc.reduce are also MultiMethods.
  • Coercion works by extracting marked dispatchables, converting them into native library equivalents and then passing them back into the function. For example, it would convert lists (or anything marked as an array) to arrays. What it could also do is convert dtype=‘int64’ to an actual dtype, and so on.
  • __asduckarray__ is rendered unnecessary… Coercion handles that.
You can check out the usage examples in the tests:


Best Regards,
Hameer Abbasi
