asanyarray vs. asarray


asanyarray vs. asarray

mattip

Was there discussion around which of `asarray` or `asanyarray` to prefer? PR 11162, https://github.com/numpy/numpy/pull/11162, proposes `asanyarray` in place of `asarray` at the entrance to `_quantile_ureduce_func` to preserve ndarray subclasses. Should we be looking into changing all the `asarray` calls into `asanyarray`?
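
For readers less familiar with the two functions, here is a minimal sketch of the difference under discussion. `Tagged` is a hypothetical do-nothing subclass made up for illustration; it does not come from the PR.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical do-nothing ndarray subclass, for illustration only."""


t = np.arange(3).view(Tagged)

base = np.asarray(t)     # strips the subclass: the result is a plain ndarray
same = np.asanyarray(t)  # passes the subclass instance through unchanged

print(type(base).__name__)  # ndarray
print(type(same).__name__)  # Tagged
print(same is t)            # True: no copy and no new object
```

So the question is whether library functions should see `base` (predictable, but the subclass is lost) or `same` (the subclass survives, but the function body must cope with whatever it overrides).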


Matti


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: asanyarray vs. asarray

einstein.edison
Hi all

On Friday, Oct 19, 2018 at 10:28 AM, Matti Picus <[hidden email]> wrote:

Was there discussion around which of `asarray` or `asanyarray` to prefer? PR 11162, https://github.com/numpy/numpy/pull/11162, proposes `asanyarray` in place of `asarray` at the entrance to `_quantile_ureduce_func` to preserve ndarray subclasses. Should we be looking into changing all the `asarray` calls into `asanyarray`?

I suspect that this will cause a large number of problems around np.matrix unless we deprecate that first. The problem with np.matrix is that it’s a subclass, but it’s not substitutable for the base class, and so violates the Liskov substitution principle (the “L” in SOLID).

There are efforts to remove np.matrix, with the largest consumer being scipy.sparse, so unless that’s revamped, deprecating np.matrix is kind of hard to do.
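
A small sketch of what "not substitutable" means in practice, assuming a NumPy release that still ships `np.matrix`. The warning filter is only there because recent releases emit a `PendingDeprecationWarning` when a matrix is constructed.

```python
import warnings

import numpy as np

a = np.array([[1, 2], [3, 4]])
with warnings.catch_warnings():
    # Silence the PendingDeprecationWarning raised on matrix construction.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.matrix(a)

print((a * a).tolist())        # elementwise product: [[1, 4], [9, 16]]
print((m * m).tolist())        # matrix product: [[7, 10], [15, 22]]
print(a[0].shape, m[0].shape)  # (2,) versus (1, 2)
```

Any function body that multiplies its inputs or indexes a row would therefore behave differently if a matrix were passed through `asanyarray`.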



Best Regards,
Hameer Abbasi


Re: asanyarray vs. asarray

Marten van Kerkwijk
There are exceptions for `matrix` in quite a few places, and there is now a warning for `matrix`, so it might not be bad to use `asanyarray` and add an exception for `matrix`. Indeed, I quite like the suggestion by Eric Wieser to just add the exception to `asanyarray` itself; that way, when matrix is truly deprecated, it will be a very easy change.

-- Marten


Re: asanyarray vs. asarray

Stephan Hoyer-2
I don't think it makes much sense to change NumPy's existing usage of asarray() to asanyarray() unless we add subok=True arguments (which default to False). But this ends up cluttering NumPy's public API, which is also undesirable. The preferred way to override NumPy functions going forward should be __array_function__.
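
To make the `subok` option concrete, here is a rough sketch of what such a keyword could look like on a library function. `center` and `Tagged` are hypothetical names invented for this example. The `subok` parameter of `np.array` itself is real, and it is what the sketch relies on.

```python
import numpy as np


def center(x, subok=False):
    """Hypothetical function: subtract the mean, optionally keeping subclasses."""
    # np.array(..., subok=True) passes ndarray subclasses through;
    # with subok=False (the default) the result is always a base ndarray.
    arr = np.array(x, subok=subok)
    return arr - arr.mean()


class Tagged(np.ndarray):
    """Hypothetical do-nothing subclass, for illustration only."""


t = np.array([1.0, 2.0, 3.0]).view(Tagged)
print(type(center(t)).__name__)              # ndarray
print(type(center(t, subok=True)).__name__)  # Tagged
```

The clutter concern is that every public function would need to grow this extra keyword.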


Re: asanyarray vs. asarray

einstein.edison
Hi!

On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer <[hidden email]> wrote:
I don't think it makes much sense to change NumPy's existing usage of asarray() to asanyarray() unless we add subok=True arguments (which default to False). But this ends up cluttering NumPy's public API, which is also undesirable.

Agreed so far.

The preferred way to override NumPy functions going forward should be __array_function__.


I think we should “soft support” (i.e., allow but consider unsupported) the case where one of NumPy’s functions is implemented in terms of others and “passing through” an array results in the correct behaviour for that array.



Best Regards,
Hameer Abbasi


Re: asanyarray vs. asarray

ralfgommers


On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi <[hidden email]> wrote:
Hi!

On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer <[hidden email]> wrote:
I don't think it makes much sense to change NumPy's existing usage of asarray() to asanyarray() unless we add subok=True arguments (which default to False). But this ends up cluttering NumPy's public API, which is also undesirable.

Agreed so far.

I'm not sure I agree. "subok" is very unpythonic; the average numpy library function should work fine for a well-behaved subclass (i.e. most things out there except np.matrix).

The preferred way to override NumPy functions going forward should be __array_function__.


I think we should “soft support” i.e. allow but consider unsupported, the case where one of NumPy’s functions is implemented in terms of others and “passing through” an array results in the correct behaviour for that array.

I don't think we have or want such a concept as "soft support". We intend to not break anything that now has asanyarray, i.e. it's supported and ideally we have regression tests for all such functions. For anything we transition over from asarray to asanyarray, PRs should come with new tests.



On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk <[hidden email]> wrote:
There are exceptions for `matrix` in quite a few places, and there is now a warning for `matrix`, so it might not be bad to use `asanyarray` and add an exception for `matrix`. Indeed, I quite like the suggestion by Eric Wieser to just add the exception to `asanyarray` itself; that way, when matrix is truly deprecated, it will be a very easy change.
I don't quite understand this. Adding exceptions is not deprecation - we then may as well just rip np.matrix out straight away.

What I suggested in the call about this issue is that it's not very effective to treat functions like percentile/quantile one by one without an overarching strategy. A way forward could be for someone to write an overview of which sets of functions now have asanyarray (and actually work with subclasses), which ones we can and want to change now, and which ones we can and want to change after np.matrix is gone. Also, some guidelines for new functions that we add to numpy would be handy. I suspect we've been adding new functions that use asarray rather than asanyarray, which is probably undesired.

Cheers,
Ralf

 


Re: asanyarray vs. asarray

Nathaniel Smith


On Fri, Oct 19, 2018 at 3:28 PM, Ralf Gommers <[hidden email]> wrote:


On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi <[hidden email]> wrote:
Hi!

On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer <[hidden email]> wrote:
I don't think it makes much sense to change NumPy's existing usage of asarray() to asanyarray() unless we add subok=True arguments (which default to False). But this ends up cluttering NumPy's public API, which is also undesirable.

Agreed so far.

I'm not sure I agree. "subok" is very unpythonic; the average numpy library function should work fine for a well-behaved subclass (i.e. most things out there except np.matrix).

Masked arrays also tend to break code that's not expecting them (e.g. on a masked array, arr.sum()/arr.size will silently compute some meaningless nonsense instead of the mean, and there are lots of formulas out there that have some similarities with 'mean'). And people do all kinds of weird things in third-party array subclasses.
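
A quick sketch of the `arr.sum()/arr.size` pitfall, with made-up values:

```python
import numpy as np

arr = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, True])

naive = arr.sum() / arr.size  # sum skips masked entries, size still counts them
print(naive)        # 0.75
print(arr.mean())   # 1.5, the mean of the two unmasked values
print(arr.count())  # 2, the number of unmasked elements
```

Code written for plain ndarrays has no reason to call `count()`, so it gets the wrong answer with no error raised.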

Obviously we can't remove asanyarray or break existing code that assumes particular numpy functions use asanyarray, but fundamentally asanyarray is just not an API that makes sense or can be supported in a general way, and our overall goal is to get people to gradually transition away from using ndarray subclasses in general. That's why we're doing all this work to make duck arrays work. So extending asanyarray support doesn't seem like a good priority to spend our limited resources on, to me.

-n

--
Nathaniel J. Smith -- https://vorpus.org


Re: asanyarray vs. asarray

Stephan Hoyer-2
In reply to this post by ralfgommers
I think we should “soft support” i.e. allow but consider unsupported, the case where one of NumPy’s functions is implemented in terms of others and “passing through” an array results in the correct behaviour for that array.

I don't think we have or want such a concept as "soft support". We intend to not break anything that now has asanyarray, i.e. it's supported and ideally we have regression tests for all such functions. For anything we transition over from asarray to asanyarray, PRs should come with new tests.

The problem with asanyarray() is that there isn't any well-defined subclass API for NumPy, beyond "mostly works like a NumPy array." If every NumPy subclass strictly obeyed the Liskov Substitution Principle, asanyarray() would be fine, but in practice every subclass I've encountered deviates from the behavior of numpy.ndarray in some way.

This means the NumPy codebase has ended up littered with hacks/workarounds to support various specific subclasses, and new subclasses still don't work reliably. This makes it challenging to change existing code. For an example of how bad this has gotten, look at all the work-arounds I had to add to support np.testing.assert_array_equal() on ndarray subclasses in this recent PR:

My hope is that __array_function__ will finally let us put a stop to this by offering a better alternative to subclassing.
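
As a rough illustration of that alternative, here is a minimal sketch of a duck array that overrides one function through the protocol. It assumes a NumPy release in which `__array_function__` dispatch is enabled (by default from 1.17 onward), and `ConstantArray` is a hypothetical class invented for this example.

```python
import numpy as np


class ConstantArray:
    """Hypothetical duck array: n copies of one value, not an ndarray subclass."""

    def __init__(self, n, value):
        self.n = n
        self.value = value

    def __array_function__(self, func, types, args, kwargs):
        # Override only the functions we know how to handle.
        if func is np.sum:
            return self.n * self.value
        return NotImplemented  # NumPy raises TypeError for everything else


c = ConstantArray(4, 2.0)
print(np.sum(c))  # 8.0
```

Functions the class does not handle fail loudly with a `TypeError` rather than silently treating the object as a plain ndarray.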


Re: asanyarray vs. asarray

ralfgommers
In reply to this post by ralfgommers



Thanks Nathaniel and Stephan. Your comments on my other two points are both clear and correct (and have been made a number of times before). I think the "write an overview so we can stop making ad-hoc decisions and having these discussions" is the most important point I was trying to make though. If we had such a doc and it concluded "hence we don't change anything, __array_function__ is the only way to go" then we can just close PRs like https://github.com/numpy/numpy/pull/11162 straight away.

Cheers,
Ralf



Re: asanyarray vs. asarray

Marten van Kerkwijk
Hi All,

It seems there are two extreme possibilities for general functions:
1. Put `asarray` everywhere. The main benefit that I can see is that even if people pass in lists instead of arrays, one is guaranteed to have shape, dtype, etc. But it seems a bit like calling `int` on everything that might get used as an index, instead of letting the actual indexing do the proper thing and call `__index__`.
2. Do not coerce at all, but rather write code assuming something is an array already. This will often, but not always, just work for array mimics, with coercion done only where necessary (e.g., in lower-lying C code such as that of the ufuncs which has a smaller API surface and can be overridden more easily).
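
The `__index__` analogy in the first option can be sketched in plain Python. `Slot` is a hypothetical class made up for illustration. The point is that the indexing machinery does the conversion itself, so callers never wrap the object in `int()`.

```python
class Slot:
    """Hypothetical index-like object, for illustration only."""

    def __init__(self, i):
        self.i = i

    def __index__(self):
        # Called by the indexing machinery, not by the user.
        return self.i


items = ["a", "b", "c"]
print(items[Slot(1)])           # b
print(items[Slot(0):Slot(2)])   # ['a', 'b']
```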

The current __array_function__ work may well provide us with a way to combine both, if we (over time) move the coercion inside `ndarray.__array_function__` so that the actual implementation *can* assume it deals with pure ndarray - then, when relevant, calling that implementation will be what subclasses/duck arrays can happily do (and it is up to them to ensure this works).

Of course, the above does not really answer what to do in the meantime. But perhaps it helps in thinking of what we are actually aiming for.

One last thing: could we please stop bashing subclasses? One can subclass essentially everything in Python, often to great advantage. Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if they cause problems perhaps that should be seen as a sign that ndarray subclassing should be made easier and clearer.

All the best,

Marten



Re: asanyarray vs. asarray

Eric Wieser

Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if they cause problems perhaps that should be seen as a sign that ndarray subclassing should be made easier and clearer.

Both maskedarray and quantity seem like something that would make more sense at the dtype level if our dtype system was easier to extend. It might be good to compile a list of subclassing applications, and split them into “this ought to be a dtype” and “this ought to be a different type of container”.



Re: asanyarray vs. asarray

Charles R Harris


On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser <[hidden email]> wrote:

Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if they cause problems perhaps that should be seen as a sign that ndarray subclassing should be made easier and clearer.

Both maskedarray and quantity seem like something that would make more sense at the dtype level if our dtype system was easier to extend. It might be good to compile a list of subclassing applications, and split them into “this ought to be a dtype” and “this ought to be a different type of container”.


Wes McKinney has been benchmarking masks vs sentinel values for Arrow: http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks are faster. I'm not convinced dtypes are the way to go.

Chuck



Re: asanyarray vs. asarray

Nathaniel Smith
On Fri, Oct 19, 2018 at 7:00 PM, Charles R Harris
<[hidden email]> wrote:

>
> On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser <[hidden email]>
> wrote:
>>
>> Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if
>> they cause problems perhaps that should be seen as a sign that ndarray
>> subclassing should be made easier and clearer.
>>
>> Both maskedarray and quantity seem like something that would make more
>> sense at the dtype level if our dtype system was easier to extend. It might
>> be good to compile a list of subclassing applications, and split them into
>> “this ought to be a dtype” and “this ought to be a different type of
>> container”.
>
> Wes McKinney has been benchmarking masks vs sentinel values for Arrow:
> http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks are
> faster. I'm not convinced dtypes are the way to go.

We need to add better support for both user-defined dtypes and for
user-defined containers in any case. So we're going to support both
missing value strategies regardless, and people will be able to choose
based on engineering trade-offs. A missing value dtype is going to
integrate much more easily into the rest of numpy than a new container
where you have to reimplement indexing etc., but maybe custom
containers can be faster. Okay, cool, they're both on PyPI, pick your
favorite!
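
To make the two missing-value strategies concrete, here is a small sketch using only existing NumPy features: NaN as a sentinel value versus an explicit mask via `np.ma`. It illustrates the semantics only and says nothing about the performance trade-off discussed above.

```python
import numpy as np

# Strategy 1: a sentinel value (NaN) stored in the data itself
values = np.array([1.0, 2.0, np.nan, 4.0])
sentinel_mean = np.nanmean(values)

# Strategy 2: a separate boolean mask; the masked slot can hold any value
masked = np.ma.masked_array([1.0, 2.0, -999.0, 4.0],
                            mask=[False, False, True, False])
masked_mean = masked.mean()
```

Both compute the mean over the three valid entries; the difference is where the "missing" information lives.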

Trying to wedge masks into *ndarray* seems like a non-starter, though,
because it would require auditing and updating basically all code
using the numpy C API.

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: asanyarray vs. asarray

Nathaniel Smith
In reply to this post by Marten van Kerkwijk
On Fri, Oct 19, 2018 at 6:23 PM, Marten van Kerkwijk
<[hidden email]> wrote:

> Hi All,
>
> It seems there are two extreme possibilities for general functions:
> 1. Put `asarray` everywhere. The main benefit that I can see is that even if
> people put in lists instead of arrays, one is guaranteed to have shape,
> dtype, etc. But it seems a bit like calling `int` on everything that might
> get used as an index, instead of letting the actual indexing do the proper
> thing and call `__index__`.
> 2. Do not coerce at all, but rather write code assuming something is an
> array already. This will often, but not always, just work for array mimics,
> with coercion done only where necessary (e.g., in lower-lying C code such as
> that of the ufuncs which has a smaller API surface and can be overridden
> more easily).

Between these two options, Numpy's APIs are very firmly on the side of
"option 1", and this is common in most public APIs I'm familiar with
(e.g. scipy). I guess you could try to reopen the discussion, but
you'd be pushing against 15+ years of precedent there...
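
For reference, a short sketch of what "option 1" means in practice, and of the difference between the two coercion functions this thread is about. It uses only `np.asarray`, `np.asanyarray` and `np.ma`.

```python
import numpy as np

masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

# asarray turns a list into an ndarray, so shape and dtype are guaranteed
from_list = np.asarray([1, 2, 3])

# asarray also strips the subclass: the mask is silently dropped
stripped = np.asarray(masked)

# asanyarray passes ndarray subclasses through unchanged
preserved = np.asanyarray(masked)
```

Here `stripped` is a plain `ndarray` containing the masked value, while `preserved` is the original `MaskedArray` object.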

> The current __array_function__ work may well provide us with a way to
> combine both, if we (over time) move the coercion inside
> `ndarray.__array_function__` so that the actual implementation *can* assume
> it deals with pure ndarray - then, when relevant, calling that
> implementation will be what subclasses/duck arrays can happily do (and it is
> up to them to ensure this works).
>
> Of course, the above does not really answer what to do in the meantime. But
> perhaps it helps in thinking of what we are actually aiming for.

We need some kind of asduckarray(), that coerces lists and similar but
allows duck-arrays to pass through.
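
A rough sketch of what such a function might look like. No `asduckarray` exists in NumPy; the attribute check used here (`shape`, `dtype`, `ndim`) is only one assumed definition of "duck array", and the `DuckArray` class is a toy for demonstration.

```python
import numpy as np

def asduckarray(a):
    """Hypothetical: coerce lists and similar, pass array-likes through."""
    looks_like_array = all(hasattr(a, name) for name in ("shape", "dtype", "ndim"))
    if looks_like_array:
        return a
    return np.asarray(a)

class DuckArray:
    """Toy duck array: not an ndarray subclass, but exposes basic attributes."""
    def __init__(self, data):
        self._data = np.asarray(data)
        self.shape = self._data.shape
        self.dtype = self._data.dtype
        self.ndim = self._data.ndim

duck = DuckArray([1, 2, 3])
passed_through = asduckarray(duck)     # returned unchanged
coerced = asduckarray([1, 2, 3])       # list becomes an ndarray
```

What exactly should count as a duck array is the hard part; the attribute test above is deliberately naive.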

> One last thing: could we please stop bashing subclasses? One can subclass
> essentially everything in python, often to great advantage. Subclasses such
> as MaskedArray and, yes, Quantity, are widely used, and if they cause
> problems perhaps that should be seen as a sign that ndarray subclassing
> should be made easier and clearer.

Who's bashing? I've spent years thinking about this and come to the
conclusion that there are no viable solutions to the problems with
subclassing ndarray, but that's not the same as bashing :-). If you've
thought of something we've missed, you should share it...

(I also know lots of senior Python devs who believe that using
Python's subclassing support is pretty much always a mistake – this
talk is popularly cited: https://www.youtube.com/watch?v=3MNVP9-hglc –
but the issues with ndarray are much more severe than for the average
Python class.)

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: asanyarray vs. asarray

teoliphant
In reply to this post by Marten van Kerkwijk


On Fri, Oct 19, 2018 at 8:24 PM Marten van Kerkwijk <[hidden email]> wrote:
Hi All,

It seems there are two extreme possibilities for general functions:
1. Put `asarray` everywhere. The main benefit that I can see is that even if people put in lists instead of arrays, one is guaranteed to have shape, dtype, etc. But it seems a bit like calling `int` on everything that might get used as an index, instead of letting the actual indexing do the proper thing and call `__index__`.

Yes, actually getting a proper "array protocol" into Python would be a fantastic approach.   We have been working with Lenore Mullin who is a researcher on the mathematics of arrays on what it means to be an array and believe we can come up with an actual array protocol that perhaps could be put into Python itself (though that isn't our immediate goal right now). 

 
2. Do not coerce at all, but rather write code assuming something is an array already. This will often, but not always, just work for array mimics, with coercion done only where necessary (e.g., in lower-lying C code such as that of the ufuncs which has a smaller API surface and can be overridden more easily).

The current __array_function__ work may well provide us with a way to combine both, if we (over time) move the coercion inside `ndarray.__array_function__` so that the actual implementation *can* assume it deals with pure ndarray - then, when relevant, calling that implementation will be what subclasses/duck arrays can happily do (and it is up to them to ensure this works).
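
As a minimal sketch of the `__array_function__` protocol mentioned here (assuming NumPy 1.17 or later, where it is enabled by default): the toy class `Wrapped` below is hypothetical and only handles top-level positional arguments, but it shows how a non-ndarray type can intercept a NumPy function and reuse the stock implementation on plain ndarrays.

```python
import numpy as np

class Wrapped:
    """Toy duck array that intercepts NumPy functions via __array_function__."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap top-level Wrapped arguments, call the regular NumPy
        # implementation on plain ndarrays, then wrap the result again.
        plain = tuple(a.data if isinstance(a, Wrapped) else a for a in args)
        return Wrapped(func(*plain, **kwargs))

w = Wrapped([1, 2, 3])
total = np.sum(w)   # dispatches to Wrapped.__array_function__
```

The implementation called inside the method never sees a `Wrapped` instance, which is the sense in which it "can assume it deals with pure ndarray".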

Also, we could get rid of asarray entirely by changing expectations.  This automatic conversion code throughout NumPy and SciPy is an example of the confusion in both of these libraries between "user-oriented interfaces" and "developer-oriented interfaces".   A developer just wants the library to use duck-typing and then raise errors if you don't provide the right type (i.e. a list instead of an array).  The user-interface could happen in Jupyter, or be isolated to a high-level library or meta-code approach (of which there are several possibilities for Python). 

 

Of course, the above does not really answer what to do in the meantime. But perhaps it helps in thinking of what we are actually aiming for. 

One last thing: could we please stop bashing subclasses? One can subclass essentially everything in python, often to great advantage. Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if they cause problems perhaps that should be seen as a sign that ndarray subclassing should be made easier and clearer.


I agree that we can stop bashing subclasses in general.   The problem with numpy subclasses is that they were made without adherence to SOLID:  https://en.wikipedia.org/wiki/SOLID.  In particular the Liskov substitution principle:  https://en.wikipedia.org/wiki/Liskov_substitution_principle . Much of this is my fault.  Being a scientist/engineer more than a computer scientist, I had no idea what these principles were and did not properly apply them in creating np.matrix which clearly violates the substitution principle. 

We can clean all this and more up.

But, we really need to start talking about NumPy 2.0 to do it.   Now that Python 3.x is really here, we can raise the money for it and get it done.  We don't have to just rely on volunteer time. 

The world will thank us for actually pushing NumPy 2.0.  I know not everyone agrees, but for whatever it's worth, I feel very, very strongly about this, and despite not being very active on this list for the past years, I do have a lot of understanding about how the current code actually works (and where and why its warts are).

-Travis




 
All the best,

Marten


On Fri, Oct 19, 2018 at 7:02 PM Ralf Gommers <[hidden email]> wrote:


On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers <[hidden email]> wrote:


On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi <[hidden email]> wrote:
Hi!

On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer <[hidden email]> wrote:
I don't think it makes much sense to change NumPy's existing usage of asarray() to asanyarray() unless we add subok=True arguments (which default to False). But this ends up cluttering NumPy's public API, which is also undesirable.

Agreed so far.

I'm not sure I agree. "subok" is very unpythonic; the average numpy library function should work fine for a well-behaved subclass (i.e. most things out there except np.matrix).

The preferred way to override NumPy functions going forward should be __array_function__.


I think we should “soft support” i.e. allow but consider unsupported, the case where one of NumPy’s functions is implemented in terms of others and “passing through” an array results in the correct behaviour for that array.

I don't think we have or want such a concept as "soft support". We intend to not break anything that now has asanyarray, i.e. it's supported and ideally we have regression tests for all such functions. For anything we transition over from asarray to asanyarray, PRs should come with new tests.



On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk <[hidden email]> wrote:
There are exceptions for `matrix` in quite a few places, and there is now a warning for `matrix` - it might not be bad to use `asanyarray` and add an exception for `matrix`. Indeed, I quite like the suggestion by Eric Wieser to just add the exception to `asanyarray` itself - that way when matrix is truly deprecated, it will be a very easy change.
I don't quite understand this. Adding exceptions is not deprecation - we then may as well just rip np.matrix out straight away.

What I suggested in the call about this issue is that it's not very effective to treat functions like percentile/quantile one by one without an overarching strategy. A way forward could be for someone to write an overview of which sets of functions now have asanyarray (and actually work with subclasses), which ones we can and want to change now, and which ones we can and want to change after np.matrix is gone. Also, some guidelines for new functions that we add to numpy would be handy. I suspect we've been adding new functions that use asarray rather than asanyarray, which is probably undesired.

Thanks Nathaniel and Stephan. Your comments on my other two points are both clear and correct (and have been made a number of times before). I think the "write an overview so we can stop making ad-hoc decisions and having these discussions" is the most important point I was trying to make though. If we had such a doc and it concluded "hence we don't change anything, __array_function__ is the only way to go" then we can just close PRs like https://github.com/numpy/numpy/pull/11162 straight away.

Cheers,
Ralf


Re: asanyarray vs. asarray

Chris Barker - NOAA Federal
On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant <[hidden email]> wrote:
 
I agree that we can stop bashing subclasses in general. The problem with numpy subclasses is that they were made without adherence to SOLID: https://en.wikipedia.org/wiki/SOLID. In particular the Liskov substitution principle: https://en.wikipedia.org/wiki/Liskov_substitution_principle

...
 
did not properly apply them in creating np.matrix which clearly violates the substitution principle. 

So -- could a matrix subclass be made "properly"? or is that an example of something that should not have been a subclass?

-CHB


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]


Re: asanyarray vs. asarray

ralfgommers


On Mon, Oct 29, 2018 at 4:31 PM Chris Barker <[hidden email]> wrote:
On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant <[hidden email]> wrote:
 
I agree that we can stop bashing subclasses in general. The problem with numpy subclasses is that they were made without adherence to SOLID: https://en.wikipedia.org/wiki/SOLID. In particular the Liskov substitution principle: https://en.wikipedia.org/wiki/Liskov_substitution_principle

...
 
did not properly apply them in creating np.matrix which clearly violates the substitution principle. 

So -- could a matrix subclass be made "properly"? or is that an example of something that should not have been a subclass?

The latter - changing the behavior of multiplication breaks the principle.

Ralf



Re: asanyarray vs. asarray

Eric Wieser

The latter - changing the behavior of multiplication breaks the principle.

But this is not the main reason for deprecating matrix - almost all of the problems I’ve seen have been caused by the way that matrices behave when sliced. The way that m[i][j] and m[i,j] are different is just one example of this, the fact that they must be 2d is another.
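
A short demonstration of the slicing difference described above, using `np.matrix` as it currently exists. The variable names are just for the example.

```python
import warnings
import numpy as np

arr = np.array([[1, 2], [3, 4]])
with warnings.catch_warnings():
    # np.matrix emits a PendingDeprecationWarning on construction
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix(arr)

# Selecting a row drops a dimension for ndarray but not for matrix
row_shapes = (arr[0].shape, mat[0].shape)

# arr[0][1] and arr[0, 1] agree; for a matrix, mat[0] is still 2-d,
# so the second index applies to axis 0 again and goes out of bounds
try:
    mat[0][1]
    chained_ok = True
except IndexError:
    chained_ok = False
```

So code written for ndarrays that relies on `x[i][j]` fails outright when handed a matrix, even though `x[i, j]` works for both.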

Matrices behaving differently on multiplication isn’t super different in my mind to how string arrays fail to multiply at all.

Eric


Re: asanyarray vs. asarray

Eric Moore-11


On Tue, Oct 30, 2018 at 12:49 AM Eric Wieser <[hidden email]> wrote:

The latter - changing the behavior of multiplication breaks the principle.

But this is not the main reason for deprecating matrix - almost all of the problems I’ve seen have been caused by the way that matrices behave when sliced. The way that m[i][j] and m[i,j] are different is just one example of this, the fact that they must be 2d is another.

Matrices behaving differently on multiplication isn’t super different in my mind to how string arrays fail to multiply at all.


The difference is that string arrays are not numeric. This is an issue since people want to pass a matrix into places that want to multiply element-wise, which then breaks that code unless special provisions are taken. Numerical codes don’t work on string arrays anyway.
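
To illustrate the failure mode: the function `weighted` below is a made-up stand-in for library code that assumes `*` is element-wise. Passing an `np.matrix` does not raise; it silently returns a different result.

```python
import warnings
import numpy as np

def weighted(x, w):
    """Stand-in for numerical code that expects element-wise multiplication."""
    return x * w

arr = np.array([[1, 2], [3, 4]])
with warnings.catch_warnings():
    # np.matrix emits a PendingDeprecationWarning on construction
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix(arr)

elementwise = weighted(arr, arr)   # [[1, 4], [9, 16]]
matmul = weighted(mat, mat)        # [[7, 10], [15, 22]], a matrix product
```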

Eric






Re: asanyarray vs. asarray

Stephan Hoyer-2
In reply to this post by Eric Wieser
On Mon, Oct 29, 2018 at 9:49 PM Eric Wieser <[hidden email]> wrote:

The latter - changing the behavior of multiplication breaks the principle.

But this is not the main reason for deprecating matrix - almost all of the problems I’ve seen have been caused by the way that matrices behave when sliced. The way that m[i][j] and m[i,j] are different is just one example of this, the fact that they must be 2d is another.

Matrices behaving differently on multiplication isn’t super different in my mind to how string arrays fail to multiply at all.

Eric

It's certainly fine for arithmetic to work differently on an element-wise basis or even to error. But np.matrix changes the shape of results from various ndarray operations (e.g., both multiplication and indexing), which is more than any dtype can do.
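
A small sketch of the shape changes in question, comparing an ndarray and an `np.matrix` holding the same data. Only reductions and `ravel` are shown; the variable names are illustrative.

```python
import warnings
import numpy as np

arr = np.array([[1, 2], [3, 4]])
with warnings.catch_warnings():
    # np.matrix emits a PendingDeprecationWarning on construction
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    mat = np.matrix(arr)

# A reduction over one axis drops that axis for ndarray, but matrix stays 2-d
sum_shapes = (arr.sum(axis=0).shape, mat.sum(axis=0).shape)

# Flattening gives a 1-d array for ndarray, but a single-row matrix for matrix
ravel_shapes = (arr.ravel().shape, mat.ravel().shape)
```

The element type is the same in both cases; only the container changes the result shapes, which no dtype could do.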

The Liskov substitution principle (LSP) suggests that the set of reasonable ndarray subclasses are exactly those that could also in principle correspond to a new dtype. Of np.ndarray subclasses in wide-spread use, I think only the various "array with units" types come close to satisfying this criterion. They only fall short insofar as they present a misleading dtype (without unit information).

The main problem with subclassing for numpy.ndarray is that it guarantees too much: a large set of operations/methods along with a specific memory layout exposed as part of its public API. Worse, ndarray itself is a little quirky (e.g., with indexing, and its handling of scalars vs. 0d arrays). In practice, it's basically impossible to layer on complex behavior with these exact semantics, so only extremely minimal ndarray subclasses don't violate LSP.
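
As an illustration of an "extremely minimal" subclass, here is a sketch following the standard `__new__` / `__array_finalize__` pattern from the NumPy subclassing documentation. The class `Tagged` and its `unit` attribute are hypothetical; it only carries a label along and does not change any ndarray behaviour. It also shows the "misleading dtype" point: the unit is invisible in `dtype`.

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical minimal subclass: an ndarray with a 'unit' label."""
    def __new__(cls, data, unit=None):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices; copy the label from the parent if any
        if obj is None:
            return
        self.unit = getattr(obj, "unit", None)

t = Tagged([1.0, 2.0, 3.0], unit="m")
tail = t[1:]                 # slicing keeps the subclass and the label
plain = np.asarray(t)        # asarray drops both
```

Indexing, shapes and arithmetic are all inherited untouched, so code written for ndarrays should keep working with `Tagged`.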

Once we have more easily extended dtypes, I suspect most of the good use cases for subclassing will have gone away.
