

Hi All,
In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use the new `where` argument for reductions to implement `nansum`, etc., using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
A problem that occurs is that `np.sum` will use a `.sum` method if that is present, and for matrix, the `.sum` method does not take a `where` argument. Since the `where` argument has been introduced only recently, similar problems may happen for array mimics that implement their own `.sum` method.
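The simplification in question can be sketched roughly as follows (illustrative only, not the actual code in the PR; `nansum_sketch` is a made-up name, and it assumes NumPy >= 1.17 where `np.sum` accepts `where=`):

```python
import numpy as np

def nansum_sketch(a, axis=None):
    # Rough sketch of the simplification discussed here (hypothetical
    # helper, not the PR's code): instead of copying zeros over the
    # NaNs, exclude them from the reduction via the `where=` argument.
    a = np.asarray(a)
    return np.sum(a, axis=axis, where=~np.isnan(a))

print(nansum_sketch([1.0, np.nan, 2.0]))  # 3.0
```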
The question now is what to do, with options being:
1. Let's stick with the existing implementation; the speedup is not that great anyway.
2. Use try/except around the new implementation and use the old one if it fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but not matrix (unless someone makes a PR; would we even accept it?).
4. Use the new implementation. `matrix` should be gone anyway and array mimics can either update their `.sum()` method or override `np.nansum` with `__array_function__`.
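For concreteness, options (2) and (3) could look roughly like this (a hypothetical sketch, not a proposed patch):

```python
import warnings

import numpy as np

def nansum(a, axis=None):
    # Hypothetical sketch of options (2)/(3): try the where-based
    # implementation first; if the object's .sum() does not accept
    # `where`, emit a deprecation warning (option 3) and fall back
    # to the old zero-filling approach.
    try:
        return np.sum(a, axis=axis, where=~np.isnan(a))
    except TypeError:
        warnings.warn("support for .sum() methods without a `where` "
                      "argument is deprecated", DeprecationWarning,
                      stacklevel=2)
        arr = np.array(a, subok=True)  # copies by default
        np.copyto(arr, 0, where=np.isnan(np.asarray(arr)))
        return arr.sum(axis=axis)
```

With a plain float ndarray the first branch is taken; with something like np.matrix, whose `.sum()` lacks `where`, the keyword should raise TypeError and the fallback run.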
Personally, I'd prefer (4), but realize that (3) is probably the safer approach, even if it is really annoying to still have to take matrix deficiencies into account.
All the best,
Marten
p.s. One could also avoid the `.sum()` override altogether by doing `np.add.reduce(..., where=...)`, but this would probably break just as much.
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion


On Mon, 2019-06-10 at 13:47 -0400, Marten van Kerkwijk wrote: Hi All,
In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use the new `where` argument for reductions to implement `nansum`, etc., using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
A problem that occurs is that `np.sum` will use a `.sum` method if that is present, and for matrix, the `.sum` method does not take a `where` argument. Since the `where` argument has been introduced only recently, similar problems may happen for array mimics that implement their own `.sum` method.
Hi Marten! I ran into a similar issue with the `initial` kwarg when I implemented it, except at that time I just used np._NoValue.
The question now is what to do, with options being:
1. Let's stick with the existing implementation; the speedup is not that great anyway.
2. Use try/except around the new implementation and use the old one if it fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but not matrix (unless someone makes a PR; would we even accept it?).
4. Use the new implementation. `matrix` should be gone anyway and array mimics can either update their `.sum()` method or override `np.nansum` with `__array_function__`.
Personally, I'd prefer (4), but realize that (3) is probably the safer approach, even if it is really annoying to still have to take matrix deficiencies into account.
If nansum does any other kind of dispatch, that should be kept around in any case. Otherwise it would have failed then and it would fail now. We can catch and raise the right type of exception for backwards compatibility if needed.
All the best,
Marten
p.s. One could also avoid the `.sum()` override altogether by doing `np.add.reduce(..., where=...)`, but this would probably break just as much.


On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk < [hidden email]> wrote: Hi All,
In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use the new `where` argument for reductions to implement `nansum`, etc., using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
A problem that occurs is that `np.sum` will use a `.sum` method if that is present, and for matrix, the `.sum` method does not take a `where` argument. Since the `where` argument has been introduced only recently, similar problems may happen for array mimics that implement their own `.sum` method.
The question now is what to do, with options being:
1. Let's stick with the existing implementation; the speedup is not that great anyway.
2. Use try/except around the new implementation and use the old one if it fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but not matrix (unless someone makes a PR; would we even accept it?).
4. Use the new implementation. `matrix` should be gone anyway and array mimics can either update their `.sum()` method or override `np.nansum` with `__array_function__`.
Personally, I'd prefer (4), but realize that (3) is probably the safer approach, even if it is really annoying to still have to take matrix deficiencies into account.
Honestly, I agree with Tyler's assessment when he closed the PR:
"there's not much evidence of remarkable performance improvement, and
many of the other nan functions would have more complicated
implementation details that are unlikely to make the relative
performance much better. It doesn't seem to be a priority either."
The code also got more rather than less complex. So why do this? Yes, you give a reason why it may help duck arrays, but apparently it also breaks things.
Re matrix: it's not deprecated yet and is still fairly widely used, so yes we'd accept a PR and no we should not break it gratuitously.
Cheers,
Ralf


On Tue, 2019-06-11 at 10:56 +0200, Ralf Gommers wrote:
>
>
> On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
> [hidden email]> wrote:
> > Hi All,
> >
> > In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> > to use the new `where` argument for reductions to implement
> > `nansum`, etc., using simplifications that boil down to
> > `np.sum(..., where=~isnan(...))`.
> >
> > A problem that occurs is that `np.sum` will use a `.sum` method if
> > that is present, and for matrix, the `.sum` method does not take a
> > `where` argument. Since the `where` argument has been introduced
> > only recently, similar problems may happen for array mimics that
> > implement their own `.sum` method.
> >
> > The question now is what to do, with options being:
> > 1. Let's stick with the existing implementation; the speedup is
> > not that great anyway.
> > 2. Use try/except around the new implementation and use the old one
> > if it fails.
> > 3. As (2), but emit a deprecation warning. This will help array
> > mimics, but not matrix (unless someone makes a PR; would we even
> > accept it?);
> > 4. Use the new implementation. `matrix` should be gone anyway and
> > array mimics can either update their `.sum()` method or override
> > `np.nansum` with `__array_function__`.
> >
> > Personally, I'd prefer (4), but realize that (3) is probably the
> > safer approach, even if it is really annoying to still be
> > taking into account matrix deficiencies.
> >
>
> Honestly, I agree with Tyler's assessment when he closed the PR:
> "there's not much evidence of remarkable performance improvement, and
> many of the other nan functions would have more complicated
> implementation details that are unlikely to make the relative
> performance much better. It doesn't seem to be a priority either."
>
> The code also got more rather than less complex. So why do this? Yes
> you give a reason why it may help duck arrays, but it also breaks
> things apparently.
There could probably be reasons for this, since replacing with 0 may
not always work for object arrays. But that is rather hypothetical
(e.g. using add as string concatenation).
Calling `np.add.reduce` instead of the sum method could be slightly
more compatible, but could have its own set of issues.
But I suppose it is true: unless we plan on adding specialized `where`
loops or a nanadd ufunc itself, the gain is likely so small that it
arguably makes more sense to defer this until there is an actual
benefit (and less pain due to adoption of the new protocols).
Best,
Sebastian


OK, fair enough to just drop this! In this particular case, I do not care that much about duck typing, just about making the implementation cleaner (which it is if one doesn't have to worry about diversion via a `.sum()` method).
In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.
- Marten
On Tue, Jun 11, 2019 at 11:53 AM Sebastian Berg <[hidden email]> wrote: On Tue, 2019-06-11 at 10:56 +0200, Ralf Gommers wrote:
>
>
> On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
> [hidden email]> wrote:
> > Hi All,
> >
> > In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> > to use the new `where` argument for reductions to implement
> > `nansum`, etc., using simplifications that boil down to
> > `np.sum(..., where=~isnan(...))`.
> >
> > A problem that occurs is that `np.sum` will use a `.sum` method if
> > that is present, and for matrix, the `.sum` method does not take a
> > `where` argument. Since the `where` argument has been introduced
> > only recently, similar problems may happen for array mimics that
> > implement their own `.sum` method.
> >
> > The question now is what to do, with options being:
> > 1. Let's stick with the existing implementation; the speedup is
> > not that great anyway.
> > 2. Use try/except around the new implementation and use the old one
> > if it fails.
> > 3. As (2), but emit a deprecation warning. This will help array
> > mimics, but not matrix (unless someone makes a PR; would we even
> > accept it?);
> > 4. Use the new implementation. `matrix` should be gone anyway and
> > array mimics can either update their `.sum()` method or override
> > `np.nansum` with `__array_function__`.
> >
> > Personally, I'd prefer (4), but realize that (3) is probably the
> > safer approach, even if it is really annoying to still be
> > taking into account matrix deficiencies.
> >
>
> Honestly, I agree with Tyler's assessment when he closed the PR:
> "there's not much evidence of remarkable performance improvement, and
> many of the other nan functions would have more complicated
> implementation details that are unlikely to make the relative
> performance much better. It doesn't seem to be a priority either."
>
> The code also got more rather than less complex. So why do this? Yes
> you give a reason why it may help duck arrays, but it also breaks
> things apparently.
There could probably be reasons for this, since replacing with 0 may
not always work for object arrays. But that is rather hypothetical
(e.g. using add as string concatenation).
Calling `np.add.reduce` instead of the sum method could be slightly
more compatible, but could have its own set of issues.
But I suppose it is true: unless we plan on adding specialized `where`
loops or a nanadd ufunc itself, the gain is likely so small that it
arguably makes more sense to defer this until there is an actual
benefit (and less pain due to adoption of the new protocols).
Best,
Sebastian
>
> Re matrix: it's not deprecated yet and is still fairly widely used,
> so yes we'd accept a PR and no we should not break it gratuitously.
>
> Cheers,
> Ralf
>


In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.
Should have added: rely on an out-of-date numpy API where we have multiple ways for packages to provide their own overrides. But I guess in this case it is really matrix which is the problem. If so, maybe just adding kwargs to it is something we should consider.
- Marten


On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
This, and the comments Nathaniel made on the array function thread, are
important to take note of. Would it be worth updating NEP 18 with a
list of pitfalls? Or should this be a new informational NEP that
discusses—on a higher level—the benefits, risks, and design
considerations of providing protocols?
Stéfan


On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <[hidden email]> wrote: On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are
important to take note of. Would it be worth updating NEP 18 with a
list of pitfalls? Or should this be a new informational NEP that
discusses—on a higher level—the benefits, risks, and design
considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect an `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue

Code to check this:

>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)              # works
>>> np.sum(x2)              # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:

>>> y = pd.Series(x1)
>>> np.sum(y)               # works
>>> np.sum(y, where=y>3)    # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects whose methods are being forwarded to gain the same keyword.
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
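The forwarding pattern described above can be illustrated with a simplified sketch (in the spirit of `_wrapreduction`, but not NumPy's actual code; `sum_like`, `OldStyle`, and the `_NoValue` stand-in are made-up names for illustration):

```python
import numpy as np

_NoValue = object()  # stand-in for np._NoValue ("keyword not passed")

def sum_like(a, axis=_NoValue, where=_NoValue):
    # Simplified sketch of forwarding logic in the spirit of NumPy's
    # _wrapreduction (not the actual implementation): only pass along
    # keywords the caller actually supplied, so an object whose .sum()
    # predates `where=` keeps working until the new keyword is used.
    kwargs = {k: v for k, v in (("axis", axis), ("where", where))
              if v is not _NoValue}
    if not isinstance(a, np.ndarray) and hasattr(a, "sum"):
        return a.sum(**kwargs)   # raises TypeError for unknown keywords
    return np.asarray(a).sum(**kwargs)

class OldStyle:
    # A mimic whose .sum() predates the `where=` keyword.
    def __init__(self, data):
        self.data = list(data)
    def sum(self, axis=None):
        return sum(self.data)

print(sum_like(np.arange(5)))         # 10
print(sum_like(OldStyle([1, 2, 3])))  # 6
```

Passing `where=` to `sum_like` with an `OldStyle` instance reproduces the TypeError above.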
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that. Because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
Cheers,
Ralf


Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse, given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into it - that suggests that currently they do not work for duck arrays like Dask.
All the best,
Marten
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <[hidden email]> wrote: On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are
important to take note of. Would it be worth updating NEP 18 with a
list of pitfalls? Or should this be a new informational NEP that
discusses—on a higher level—the benefits, risks, and design
considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect an `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue

Code to check this:

>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)              # works
>>> np.sum(x2)              # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:

>>> y = pd.Series(x1)
>>> np.sum(y)               # works
>>> np.sum(y, where=y>3)    # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects whose methods are being forwarded to gain the same keyword.
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that. Because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
Cheers,
Ralf


On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < [hidden email]> wrote: Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse, given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into it - that suggests that currently they do not work for duck arrays like Dask.
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backwards compatibility. From an API perspective, nothing would change - we already cast inputs into base numpy arrays inside the _replace_nan() routine.
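For reference, the zero-filling strategy works roughly like this (an illustrative sketch only; `replace_nan_sketch` is a made-up name, not NumPy's actual `_replace_nan`):

```python
import numpy as np

def replace_nan_sketch(a, val):
    # Rough sketch of the zero-filling strategy the nan-functions use
    # (illustrative only, not NumPy's actual _replace_nan): coerce the
    # input, find the NaNs, and copy `val` over them in a fresh array.
    arr = np.asanyarray(a)
    if arr.dtype.kind not in "fc":   # integer etc. cannot hold NaN
        return arr, None
    mask = np.isnan(arr)
    out = np.array(arr, subok=True, copy=True)
    np.copyto(out, val, where=mask)
    return out, mask

out, mask = replace_nan_sketch([1.0, np.nan, 2.0], 0.0)
print(out.sum())  # 3.0
```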
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <[hidden email]> wrote: On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are
important to take note of. Would it be worth updating NEP 18 with a
list of pitfalls? Or should this be a new informational NEP that
discusses—on a higher level—the benefits, risks, and design
considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect an `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue

Code to check this:

>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)              # works
>>> np.sum(x2)              # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:

>>> y = pd.Series(x1)
>>> np.sum(y)               # works
>>> np.sum(y, where=y>3)    # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects whose methods are being forwarded to gain the same keyword.
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that. Because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
Cheers,
Ralf


On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < [hidden email]> wrote: Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse, given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
In general, I'd say yes. This is a problem of our own making. If an implementation uses np.sum on an argument that can be a subclass or duck array (i.e. we didn't coerce with asarray), then the code is calling out to code outside of numpy. At that point we have effectively made something public. We can still change it, but only if we're sure that we don't break things in the process (i.e. we then insert a new asarray, something you're not happy about in general...).
I don't get the "three ways to override nansum" by the way. There's only one I think: __array_function__. There may be more for the sum() method.
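For what it's worth, the `__array_function__` route looks roughly like this for a toy duck array (a minimal sketch, assuming NumPy >= 1.17 with the protocol enabled; `Duck` is a made-up class, not a complete array mimic):

```python
import numpy as np

class Duck:
    # Toy duck array that overrides np.nansum via __array_function__
    # (minimal sketch only).
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.nansum:
            d = args[0].data
            return d[~np.isnan(d)].sum()
        return NotImplemented

print(np.nansum(Duck([1.0, np.nan, 2.0])))  # 3.0
```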
Another way to say the same thing: if a subclass does everything right, and we decide to add a new keyword to some function in 1.17 without considering those subclasses, then turning around straight after and saying "oh those subclasses rely on our old API, it may be okay to either break them or send them a deprecation warning" (which is I think what you were advocating for - your 4 and 3 options) is not reasonable.
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into it - that suggests that currently they do not work for duck arrays like Dask.
There being an issue with matrix in the PR suggests there is an issue for subclasses at least?
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backwards compatibility. From an API perspective, nothing would change - we already cast inputs into base numpy arrays inside the _replace_nan() routine.
In that case yes, nansum can be rewritten. However, it's only because of the use of (as)array - if it were asanyarray, that would already scupper the plan.
Cheers,
Ralf
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <[hidden email]> wrote: On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are
important to take note of. Would it be worth updating NEP 18 with a
list of pitfalls? Or should this be a new informational NEP that
discusses—on a higher level—the benefits, risks, and design
considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect an `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue

Code to check this:

>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)              # works
>>> np.sum(x2)              # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:

>>> y = pd.Series(x1)
>>> np.sum(y)               # works
>>> np.sum(y, where=y>3)    # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects whose methods are being forwarded to gain the same keyword.
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that. Because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
Cheers,
Ralf


Hi All,
`nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the plan to use `asarray` - sorry to have been sloppy with stating they "convert to array".
And, yes, I'll admit to forgetting that we are introducing `where` only in 1.17 - clearly we cannot rely on other classes to have already implemented it!! (This is the problem with having done it myself - it seems ages ago!)
For the old implementation, there was indeed only one way to override (`__array_function__`) for non-subclasses, but for the new implementation there are three ways, though the first two overlap:
- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`
I think that eventually one would like to move to the case where there is only one (`__array_ufunc__` in this case, since really it is just a combination of two ufuncs, isnan and add.reduce in this case).
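That `__array_ufunc__`-only path would mean a class needs just enough of the protocol for `isnan` and `add.reduce` to work, e.g. (a toy sketch, not a complete implementation; `UFuncDuck` is a made-up name):

```python
import numpy as np

class UFuncDuck:
    # Toy sketch: just enough __array_ufunc__ support (including the
    # "reduce" method) for a where-based nansum, i.e. for np.isnan
    # plus np.add.reduce(..., where=...).
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        arrays = [x.data if isinstance(x, UFuncDuck) else x
                  for x in inputs]
        # drop an `out=None` that was not actually supplied
        if kwargs.get("out") is None:
            kwargs.pop("out", None)
        return getattr(ufunc, method)(*arrays, **kwargs)

d = UFuncDuck([1.0, np.nan, 2.0])
print(np.add.reduce(d, where=~np.isnan(d)))  # 3.0
```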
Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override?
Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold turkey if the new implementation is covered by `__array_ufunc__`?
All the best,
Marten
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < [hidden email]> wrote: Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse, given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
In general, I'd say yes. This is a problem of our own making. If an implementation uses np.sum on an argument that can be a subclass or duck array (i.e. we didn't coerce with asarray), then the code is calling out to code outside of numpy. At that point we have effectively made something public. We can still change it, but only if we're sure that we don't break things in the process (i.e. we then insert a new asarray, something you're not happy about in general...).
I don't get the "three ways to override nansum" by the way. There's only one I think: __array_function__. There may be more for the sum() method.
Another way to say the same thing: if a subclass does everything right, and we decide to add a new keyword to some function in 1.17 without considering those subclasses, then turning around straight after and saying "oh those subclasses rely on our old API, it may be okay to either break them or send them a deprecation warning" (which is I think what you were advocating for - your 4 and 3 options) is not reasonable.
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into them - that suggests that currently they do not work for duck arrays like Dask.
There being an issue with matrix in the PR suggests there is an issue for subclasses at least?
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backwards compatibility. From an API perspective, nothing would change - we already cast inputs into base numpy arrays inside the _replace_nan() routine.
In that case yes, nansum can be rewritten. However, it's only because of the use of (as)array - if it were asanyarray that would already scupper the plan.
Cheers,
Ralf
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt < [hidden email]> wrote: On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are
important to take note of. Would it be worth updating NEP 18 with a
list of pitfalls? Or should this be a new informational NEP that
discusses—on a higher level—the benefits, risks, and design
considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect a `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue
Code to check this:

>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)  # works
>>> np.sum(x2)  # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:

>>> y = pd.Series(x1)
>>> np.sum(y)  # works
>>> np.sum(y, where=y>3)  # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects whose methods get forwarded to have gained the same keyword.
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
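To make that cost concrete, here is a rough sketch of the forwarding pattern being described (hypothetical names - this is not NumPy's actual `_wrapreduction`):

```python
import numpy as np

def forwarding_sum(a, **kwargs):
    # Hypothetical sketch of the forwarding pattern: prefer a .sum()
    # method when one exists; a keyword the method does not accept
    # then surfaces as a TypeError raised by that method, not by numpy.
    if hasattr(a, "sum") and not isinstance(a, np.ndarray):
        return a.sum(**kwargs)
    return np.add.reduce(np.asarray(a), **kwargs)

class OldStyle:
    # Array mimic whose .sum() predates the `where` keyword.
    def __init__(self, data):
        self.data = np.asarray(data)
    def sum(self):
        return self.data.sum()

x = OldStyle([1.0, 2.0, 3.0])
print(forwarding_sum(x))  # 6.0
try:
    forwarding_sum(x, where=np.array([True, False, True]))
except TypeError as exc:
    print(exc)  # sum() got an unexpected keyword argument 'where'
```

The dispatching function itself never sees the mismatch; the error only appears once a caller actually passes the new keyword to an object with an older method signature.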
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that, because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
Cheers,
Ralf
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion


On Thu, Jun 13, 2019 at 2:51 PM Marten van Kerkwijk < [hidden email]> wrote: Hi All,
`nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the plan to use `asarray` - sorry to have been sloppy with stating they "convert to array".
And, yes, I'll admit to forgetting that we are introducing `where` only in 1.17 - clearly we cannot rely on other classes to have already implemented it!! (This is the problem with having done it myself - it seems ages ago!)
For the old implementation, there was indeed only one way to override (`__array_function__`) for non-subclasses, but for the new implementation there are three ways, though the first two overlap:
- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`
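As an illustration of the last route, a toy duck array (hypothetical class, shown here only as a sketch) can already override `np.nansum` through `__array_function__`:

```python
import numpy as np

class LoggedArray:
    # Toy duck array (hypothetical) overriding np.nansum via
    # __array_function__, i.e. the third route in the list above.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.nansum:
            # Unwrap ourselves, then delegate to plain ndarrays.
            unwrapped = [a.data if isinstance(a, LoggedArray) else a
                         for a in args]
            return np.nansum(*unwrapped, **kwargs)
        return NotImplemented

x = LoggedArray([1.0, np.nan, 2.0])
print(np.nansum(x))  # 3.0
```

This requires numpy >= 1.17, where `__array_function__` dispatch is enabled by default.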
I think that eventually one would like to move to the case where there is only one (`__array_ufunc__` in this case, since really it is just a combination of two ufuncs, isnan and add.reduce).
I don't think the other ones are really ways of "overriding nansum". They happen to work, but rely on internal implementation details and then overriding functions inside those details.
It sounds like you're suggesting a new way of declaring "this is a canonical implementation that duck arrays can rely on". I can see that happening, but we probably want some standard way to mark those functions, have a note in the docs, and some tests.
Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override?
Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold turkey if the new implementation is covered by `__array_ufunc__`?
I think __array_function__ is still the best way to do this (that's the only actual override, so most robust and performant likely), so I don't see any reason for a deprecation.
Cheers,
Ralf
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < [hidden email]> wrote: Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse, given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
In general, I'd say yes. This is a problem of our own making. If an implementation uses np.sum on an argument that can be a subclass or duck array (i.e. we didn't coerce with asarray), then the code is calling out to outside of numpy. At that point we have effectively made something public. We can still change it, but only if we're sure that we don't break things in the process (i.e. we then insert a new asarray, something you're not happy about in general ....).
I don't get the "three ways to override nansum" by the way. There's only one I think: __array_function__. There may be more for the sum() method.
Another way to say the same thing: if a subclass does everything right, and we decide to add a new keyword to some function in 1.17 without considering those subclasses, then turning around straight after and saying "oh those subclasses rely on our old API, it may be okay to either break them or send them a deprecation warning" (which is I think what you were advocating for - your 4 and 3 options) is not reasonable.
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into them - that suggests that currently they do not work for duck arrays like Dask.
There being an issue with matrix in the PR suggests there is an issue for subclasses at least?
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backwards compatibility. From an API perspective, nothing would change - we already cast inputs into base numpy arrays inside the _replace_nan() routine.
In that case yes, nansum can be rewritten. However, it's only because of the use of (as)array - if it were asanyarray that would already scupper the plan.
Cheers,
Ralf
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt < [hidden email]> wrote: On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are
important to take note of. Would it be worth updating NEP 18 with a
list of pitfalls? Or should this be a new informational NEP that
discusses—on a higher level—the benefits, risks, and design
considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect a `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue
Code to check this:

>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)  # works
>>> np.sum(x2)  # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:

>>> y = pd.Series(x1)
>>> np.sum(y)  # works
>>> np.sum(y, where=y>3)  # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects whose methods get forwarded to have gained the same keyword.
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that, because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
Cheers,
Ralf


Hi Ralf, others,
Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override?
Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold turkey if the new implementation is covered by `__array_ufunc__`?
I think __array_function__ is still the best way to do this (that's the only actual override, so most robust and performant likely), so I don't see any reason for a deprecation.
Yes, I fear I have to agree for the nanfunctions, at least for now...
But how about `np.sum` itself? Right now, it is overridden by __array_function__ but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.
Would/should there be a point where we just have `sum = np.add.reduce` and drop other overrides? If so, how do we get there?
One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.
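That reversed order might look roughly like this (a hypothetical sketch, not a proposal for the actual `_wrapreduction` code):

```python
import numpy as np

def duck_sum(a, **kwargs):
    # Hypothetical sketch of the reversed dispatch order: try the
    # ufunc path first, and only fall back to a .sum() method for
    # objects that opted out via __array_ufunc__ = None.
    if getattr(a, "__array_ufunc__", True) is not None:
        try:
            return np.add.reduce(a, **kwargs)
        except TypeError:
            pass
    return a.sum(**kwargs)

class OptedOut:
    __array_ufunc__ = None  # refuses ufuncs entirely
    def __init__(self, data):
        self.data = np.asarray(data)
    def sum(self):
        return self.data.sum()

print(duck_sum(np.arange(5)))        # 10
print(duck_sum(OptedOut([1, 2, 3]))) # 6
```

Classes that support `__array_ufunc__` reductions would then get the new keywords for free, while the method fallback keeps old mimics working during a deprecation period.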
All the best,
Marten


On Thu, Jun 13, 2019 at 9:35 AM Marten van Kerkwijk < [hidden email]> wrote: Hi Ralf, others,
Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override?
Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold turkey if the new implementation is covered by `__array_ufunc__`?
I think __array_function__ is still the best way to do this (that's the only actual override, so most robust and performant likely), so I don't see any reason for a deprecation.
Yes, I fear I have to agree for the nanfunctions, at least for now...
But how about `np.sum` itself? Right now, it is overridden by __array_function__ but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.
Would/should there be a point where we just have `sum = np.add.reduce` and drop other overrides? If so, how do we get there?
One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.
Yes, I think we would need to do this sort of thing. It's a bit of trouble, but probably doable with some decorator magic. It would indeed be nice for sum() to eventually just be np.add.reduce, though to be honest I'm not entirely sure it's worth the trouble of a long deprecation cycle - people have been relying on the fallback calling of methods for a long time.


On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer < [hidden email]> wrote: <snip>
But how about `np.sum` itself? Right now, it is overridden by __array_function__ but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.
Would/should there be a point where we just have `sum = np.add.reduce` and drop other overrides? If so, how do we get there?
One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.
Yes, I think we would need to do this sort of thing. It's a bit of trouble, but probably doable with some decorator magic. It would indeed be nice for sum() to eventually just be np.add.reduce, though to be honest I'm not entirely sure it's worth the trouble of a long deprecation cycle - people have been relying on the fallback calling of methods for a long time.
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works.
(And, yes, arguably too late to change it now!)
- Marten


On Thu, Jun 13, 2019 at 7:07 PM Marten van Kerkwijk < [hidden email]> wrote:
On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer < [hidden email]> wrote: <snip>
But how about `np.sum` itself? Right now, it is overridden by __array_function__ but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.
Would/should there be a point where we just have `sum = np.add.reduce` and drop other overrides? If so, how do we get there?
One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.
Yes, I think we would need to do this sort of thing. It's a bit of trouble, but probably doable with some decorator magic. It would indeed be nice for sum() to eventually just be np.add.reduce, though to be honest I'm not entirely sure it's worth the trouble of a long deprecation cycle - people have been relying on the fallback calling of methods for a long time.
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works.
I'm not sure I understand the rationale for this. Design consistency matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and other functions __array_function__. Why keep on tweaking things for little benefit?
Ralf
(And, yes, arguably too late to change it now!)


Hi Ralf,
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works.
I'm not sure I understand the rationale for this. Design consistency matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and other functions __array_function__. Why keep on tweaking things for little benefit?
I'm mostly trying to understand how we would actually change things. I guess your quite logical argument for consistency is that it requires `np.sum is np.add.reduce`. But to do that, one would have to get rid of the `.sum()` method override, and then deprecate using `__array_function__` on `np.sum`.
A perhaps clearer example is `np.isposinf`, where the implementation truly consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess following your consistency logic, one would not remove the `__array_function__` override until it actually became a ufunc itself.
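For concreteness, such a ufunc-only composition might look like this (a sketch only, not NumPy's actual implementation):

```python
import numpy as np

def isposinf_sketch(x, out=None):
    # Sketch: np.isposinf expressed purely as ufunc calls
    # (isinf, signbit, logical_not, logical_and), so in principle
    # __array_ufunc__ alone could cover overriding it.
    return np.logical_and(np.isinf(x), np.logical_not(np.signbit(x)), out)

print(isposinf_sketch(np.array([np.inf, -np.inf, 1.0])))  # [ True False False]
```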
Anyway, I think this discussion has been useful, if only in making it yet more clear how difficult it is to deprecate anything!
All the best,
Marten


On Thu, Jun 13, 2019 at 9:43 PM Marten van Kerkwijk < [hidden email]> wrote: Hi Ralf,
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works.
I'm not sure I understand the rationale for this. Design consistency matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and other functions __array_function__. Why keep on tweaking things for little benefit?
I'm mostly trying to understand how we would actually change things.
I think we do have ways to deprecate things and reason about how to make the tradeoffs to do so. Function overrides are not a whole lot different I think; we can apply the same method (plus there's a special bit in NEP 18 where we reserve the right to change functions into ufuncs and use `__array_ufunc__`).
I guess your quite logical argument for consistency is that it requires `np.sum is np.add.reduce`. But to do that, one would have to get rid of the `.sum()` method override, and then deprecate using `__array_function__` on `np.sum`.
A perhaps clearer example is `np.isposinf`, where the implementation truly consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess following your consistency logic, one would not remove the `__array_function__` override until it actually became a ufunc itself.
Correct.
More importantly, I think we should not even consider *discussing* removing `__array_function__` from np.isposinf (or any similar one-off situation) before there's a new bigger-picture design. This is not about "deprecation is hard", this is about doing things with purpose rather than ad hoc, as well as recognizing that lots of small changes are a major drain on our limited maintainer resources. About the latter issue I wrote a blog post recently; perhaps that clarifies the previous sentence a bit and gives some more insight into my perspective: https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
Cheers,
Ralf
Anyway, I think this discussion has been useful, if only in making it yet more clear how difficult it is to deprecate anything!
All the best,
Marten


Hi Ralf,
Thanks both for the reply and sharing the link. I recognize much (from both sides!).
<snip>
More importantly, I think we should not even consider *discussing* removing `__array_function__` from np.isposinf (or any similar one-off situation) before there's a new bigger-picture design. This is not about "deprecation is hard", this is about doing things with purpose rather than ad hoc, as well as recognizing that lots of small changes are a major drain on our limited maintainer resources. About the latter issue I wrote a blog post recently; perhaps that clarifies the previous sentence a bit and gives some more insight into my perspective: https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
Yes, I definitely did not mean to imply that a goal was to change just `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread was to clean up the whole `nanfunctions` module). Rather, to use them as examples to see what policy there actually is or should be; and I do worry that with __array_function__, however happy I am that it now exists (finally, Quantity can be concatenated!), we're going the Microsoft route of just layering on top of old code if, even for the simplest cases, there is no obvious path for how to remove it.
Anyway, this discussion has probably gotten beyond the point where much is added. I'll close the `nansum` PR.
All the best,
Marten
p.s. I would say that deprecations within numpy currently *are* hard. E.g., the rate at which we add `DEPRECATED` to the C code is substantially larger than that with which we actually remove any long-deprecated behaviour.
