Proposal to accept NEP-18, __array_function__ protocol

Re: Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith
On Tue, Aug 21, 2018 at 9:39 AM, Stephan Hoyer <[hidden email]> wrote:

> On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <[hidden email]> wrote:
>>
>> On Wed, Aug 15, 2018 at 9:45 AM, Stephan Hoyer <[hidden email]> wrote:
>> > This avoids a classic subclassing problem that has plagued NumPy for
>> > years,
>> > where overriding the behavior of method A causes apparently unrelated
>> > method
>> > B to break, because it relied on method A internally. In NumPy, this
>> > constrained our implementation of np.median(), because it needed to call
>> > np.mean() in order for subclasses implementing units to work properly.
>>
>> I don't think I follow... if B uses A internally, then overriding A
>> shouldn't cause B to break, unless the overridden A is buggy.
>
>
> Let me try another example with arrays with units. My understanding of the
> contract provided by unit implementations is that their behavior should never
> deviate from NumPy unless an operation raises an error. (This is more
> explicit for arrays with units because they raise errors for operations with
> incompatible units, but practically speaking almost all duck arrays will
> have at least some unsupported operations in NumPy's giant API.)
>
> It is quite possible that NumPy functions could be (re)written in a way that
> is incompatible with some unit implementations but is perfectly valid for
> "full" duck arrays. We actually see this even within NumPy already -- for
> example, see this recent PR adding support for the datetime64 dtype to
> percentile:
> https://github.com/numpy/numpy/pull/11627

I clicked the link, but I don't see anything about units?

Of course units are a tricky example to make inferences from, because
they aren't a good fit for the duck array concept in general. (In
terms of numpy's core semantics, data-with-units is a special dtype,
not a special container type.)

From your mention of "full" duck arrays I guess you're thinking of
this distinction?:
http://www.numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html#principle-1-focus-on-full-duck-arrays-but-dont-rule-out-partial-duck-arrays

You're right: if numpy changes the implementation of some high-level
function to use protocol-A instead of protocol-B, and there's some
partial-duck-array that only implements protocol-B, then it gets
broken. Of course, in general __array_function__ has the same problem:
if sklearn changes their implementation of some function to call numpy
function A instead of numpy function B, and there's a
partial-duck-array that only implements numpy function B, then sklearn
is broken. I think that as duck arrays roll out, we're just going to
have to get used to dealing with breakage like this sometimes.

The advantage of __array_function__ is that we get to ignore these
issues within numpy itself. The advantage of having focused-protocols
is that they make it easier to implement full duck arrays, and they
give us a vocabulary for talking about degrees of partiality. For
example, with __array_concatenate__, a duck array either supports all
the concatenation/stacking operations or none of them – so sklearn
never has to worry that switching between np.row_stack and np.stack
will cause issues.

> A lesser case of this is changes in NumPy causing performance issues for
> users of duck arrays, which is basically inevitable if we share
> implementations.

NumPy (and Python in general) is never going to make everything 100%
optimized all the time. Over and over we choose to accept small
inefficiencies in order to improve maintainability. How big are these
inefficiencies – 1% overhead, 10% overhead, 10x overhead...? Do they
show up everywhere, or just for a few key functions? What's the
maintenance cost of making NumPy's whole API overrideable, in terms of
making it harder for us to evolve numpy? What about for users dealing
with a proliferation of subtly incompatible implementations?

You may be right that the tradeoffs work out so that every API needs
to be individually overridable and the benefits are worth it, but we
at least need to be asking these questions.

>> And when we fix a bug in row_stack, this means we also have to fix it
>> in all the copy-paste versions, which won't happen, so np.row_stack
>> has different semantics on different objects, even if they started out
>> matching. The NDArrayOperatorsMixin reduces the number of duplicate
>> copies of the same code that need to be updated, but 2 copies is still
>> a lot worse than 1 copy :-).
>
>
> I see your point, but in all seriousness if we encounter a bug in np.row_stack
> at this point we might just call it a feature instead.

Yeah, you're right, row_stack is a bad example :-). But of course the
point is that it's literally any bug-fix or added feature in numpy's
public API.

Here's a better, more concrete example: back in 2015, you added
np.stack (PR #5605), which was a great new feature. Its implementation
was entirely in terms of np.concatenate and other basic APIs like
.ndim, asanyarray, etc.
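
A simplified sketch of that reduction (not the exact numpy source, which also validates shapes and handles the out argument):

    import numpy as np

    def stack(arrays, axis=0):
        # insert a new axis into each input, then concatenate along it
        expanded = [np.expand_dims(np.asanyarray(a), axis) for a in arrays]
        return np.concatenate(expanded, axis=axis)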

In the smallish-set-of-designed-protocols world, as soon as that's
merged into numpy, you're done: it works on sparse arrays, dask
arrays, tensorflow tensors, etc. People can use it as soon as they
upgrade their numpy.

In the __array_function__ world, merging into numpy is only the
beginning: now you have to go make new PRs to sparse, dask,
tensorflow, etc., get them merged, released, etc. Downstream projects
may refuse to use it until it's supported in multiple projects that
have their own release cycles, etc.

Or another example: at a workshop a few years ago, Matti put up some
of the source code to numpypy to demonstrate what it looked like. I
immediately spotted a subtle bug, because I happened to know that it
was one we'd found and fixed recently. (IIRC it was the thing where
arr[...] should return a view of arr, not arr itself.) Of course
indexing for duck arrays is its own mess that's somewhat orthogonal to
__array_function__, but the basic point is that numpy has a lot of
complex error-prone semantics, and we are still actively finding and
fixing issues in numpy's own implementations.

>>
>> > 1. The details of how NumPy implements a high-level function in terms of
>> > overloaded functions now becomes an implicit part of NumPy’s public API. For
>> > example, refactoring stack to use np.block() instead of np.concatenate()
>> > internally would now become a breaking change.
>>
>> The way I'm imagining this would work is, we guarantee not to take a
>> function that used to be implemented in terms of overridable
>> operations, and refactor it so it's implemented in terms of
>> overridable operations. So long as people have correct implementations
>> of __array_concatenate__ and __array_block__, they shouldn't care
>> which one we use. In the interim period where we have
>> __array_concatenate__ but there's no such thing as __array_block__,
>> then that refactoring would indeed break things, so we shouldn't do
>> that :-). But we could fix that by adding __array_block__.
>
>
> ""we guarantee not to take a function that used to be implemented in terms
> of overridable operations, and refactor it so it's implemented in terms of
> overridable operations"
> Did you miss a "not" in here somewhere, e.g., "refactor it so it's NOT
> implemented"?

Yeah, sorry.

> If we ever tried to do something like this, I'm pretty sure that it just
> wouldn't happen -- unless we also change NumPy's extremely conservative
> approach to breaking third-party code. np.block() is much more complex to
> implement than np.concatenate(), and users would resist being forced to
> handle that complexity if they don't need it. (Example: TensorFlow has a
> concatenate function, but not block.)

I agree, we probably wouldn't do this particular change.

>> > 2. Array libraries may prefer to implement high level functions
>> > differently than NumPy. For example, a library might prefer to implement a
>> > fundamental operation like mean() directly rather than relying on sum()
>> > followed by division. More generally, it’s not clear yet what exactly
>> > qualifies as core functionality, and figuring this out could be a large
>> > project.
>>
>> True. And this is a very general problem... for example, the
>> appropriate way to implement logistic regression is very different
>> in-core versus out-of-core. You're never going to be able to take code
>> written for ndarray, drop in an arbitrary new array object, and get
>> optimal results in all cases -- that's just way too ambitious to hope
>> for. There will be cases where reducing to operations like sum() and
>> division is fine. There will be cases where you have a high-level
>> operation like logistic regression, where reducing to sum() and
>> division doesn't work, but reducing to slightly-higher-level
>> operations like np.mean also doesn't work, because you need to redo
>> the whole high-level operation. And then there will be cases where
>> sum() and division are too low-level, but mean() is high-level enough
>> to make the critical difference. It's that last one where it's
>> important to be able to override mean() directly. Are there a lot of
>> cases like this?
>
>
> mean() is not entirely hypothetical. TensorFlow and Eigen actually do
> implement mean separately from sum, though to be honest it's not entirely
> clear to me why:
> https://github.com/tensorflow/tensorflow/blob/1c1dad105a57bb13711492a8ba5ab9d10c91b5df/tensorflow/core/kernels/reduction_ops_mean.cc
> https://eigen.tuxfamily.org/dox/unsupported/TensorFunctors_8h_source.html
>
> I do think this probably will come up with some frequency for other
> operations, but the bigger answer here really is consistency -- it allows
> projects and their users to have very clearly defined dependencies on
> NumPy's API. They don't need to worry about any implementation details from
> NumPy leaking into their override of a function.

When you say "consistency" here that means: "they can be sure that
when they disagree with the numpy devs about the
semantics/implementation of a numpy API, then the numpy API will
consistently act the way they want, not the way the numpy devs want".
Right?

This is a very double-edged sword :-).

>> > 3. We don’t yet have an overloading system for attributes and methods on
>> > array objects, e.g., for accessing .dtype and .shape. This should be the
>> > subject of a future NEP, but until then we should be reluctant to rely on
>> > these properties.
>>
>> This one I don't understand. If you have a duck-array object, and you
>> want to access its .dtype or .shape attributes, you just... write
>> myobj.dtype or myobj.shape? That doesn't need a NEP though so I must
>> be missing something :-).
>
>
> We don't have np.asduckarray() yet or whatever we'll end up calling our
> proposed casting function from NEP 22, so we don't have a fully fleshed out
> mechanism for NumPy to declare "this object needs to support .shape and
> .dtype, or I'm going to cast it into something that does".

That's true, but it's just as big a problem for NEP 18, because
__array_function__ is never going to do much if you've already coerced
the thing to an ndarray. Some kind of asduckarray solution is
basically a prerequisite to any other duck array features.
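
For illustration, the kind of helper NEP 22 gestures at might look something like this (the name and the opt-in check are placeholder assumptions, not a settled design):

    import numpy as np

    def asduckarray(obj):
        # hypothetical: trust objects that opt into the protocol to
        # provide .shape, .dtype, etc.; coerce everything else
        if hasattr(type(obj), '__array_function__'):
            return obj
        return np.asarray(obj)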

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith
In reply to this post by Stephan Hoyer-2
On Tue, Aug 21, 2018 at 6:12 PM, Stephan Hoyer <[hidden email]> wrote:

> On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <[hidden email]> wrote:
>>
>> >> My suggestion: at numpy import time, check for an envvar, like say
>> >> NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1. If it's not set, then all the
>> >> __array_function__ dispatches turn into no-ops. This lets interested
>> >> downstream libraries and users try this out, but makes sure that we
>> >> won't have a hundred thousand end users depending on it without
>> >> realizing.
>> >>
>> >>
>> >>
>> >> - makes it easy for end-users to check how much overhead this adds (by
>> >> running their code with it enabled vs disabled)
>> >> - if/when we decide to commit to supporting it for real, we just
>> >> remove the envvar.
>> >
>> >
>> > I'm slightly concerned that the cost of reading an environment variable
>> > with
>> > os.environ could exaggerate the performance cost of __array_function__.
>> > It
>> > takes about 1 microsecond to read an environment variable on my laptop,
>> > which is comparable to the full overhead of __array_function__.
>>
>> That's why I said "at numpy import time" :-). I was imagining we'd
>> check it once at import, and then from then on it'd be stashed in some
>> C global, so after that the overhead would just be a single
>> predictable branch 'if (array_function_is_enabled) { ... }'.
>
>
> Indeed, I missed the "at numpy import time" bit :).
>
> In that case, I'm concerned that it isn't always possible to set environment
> variables once before importing NumPy. The environment variable solution
> works great if users have full control of their own Python binaries, but
> that isn't always the case today in this era of server-less infrastructure
> and online notebooks.
>
> One example offhand is Google's Colaboratory
> (https://research.google.com/colaboratory), a web based Jupyter notebook.
> NumPy is always loaded when a notebook is opened, as you can check from
> inspecting sys.modules. Now, I work with the developers of Colaboratory, so
> we could probably figure out a work-around together, but I'm pretty sure
> this would also come up in the context of other tools.

I mean, the idea of the envvar is to be a temporary measure to enable
devs to experiment with a provisional feature, while being awkward
enough that people don't build lots of stuff assuming it's there. It
doesn't have to be 100% supported in every environment.
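
Concretely, the import-time check could be as simple as this (a sketch; the variable name is illustrative):

    import os

    # evaluated once when numpy is imported; dispatch then branches on
    # this constant instead of re-reading the environment per call
    ARRAY_FUNCTION_ENABLED = (
        os.environ.get('NUMPY_EXPERIMENTAL_ARRAY_FUNCTION', '0') == '1')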

> Another problem is unit testing. Does pytest use a separate Python process
> for running the tests in each file? I don't know and that feels like an
> implementation detail that I shouldn't have to know :). Yes, in principle I
> could use a subprocess in my __array_function__ for unit tests, but that
> would be really awkward.

Set the envvar before invoking pytest?

For numpy itself we'll need to write a few awkward tests involving
subprocesses to make sure the envvar parsing is working properly, but
I don't think this is a big deal. As long as we only have 1-2 places
that __array_function__ dispatch funnels through, we just need to make
sure that they work properly with/without the envvar; no need to test
every API separately. Or if it is an issue we can have some private
API that's only available to the numpy test suite...

>> > So we may
>> > want to switch to an explicit Python API instead, e.g.,
>> > np.enable_experimental_array_function().
>>
>> If we do this, then libraries that want to use __array_function__ will
>> just call it themselves at import time. The point of the env-var is
>> that our policy is not to break end-users, so if we want an API to be
>> provisional and experimental then it's end-users who need to be aware
>> of that before using it. (This is also an advantage of checking the
>> envvar only at import time: it means libraries can't easily just
>> setenv() to enable the functionality behind users' backs.)
>
>
> I'm in complete agreement that only authors of end-user applications should
> invoke this option, but just because something is technically possible
> doesn't mean that people will actually do it or that we need to support that
> use case :).

I didn't say "authors of end-user applications", I said "end-users" :-).

That said, I dunno. My intuition is that if we have a function call
like this then libraries that define __array_function__ will merrily
call it in their package __init__ and it accomplishes nothing, but
maybe I'm being too cynical and untrusting.

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: Proposal to accept NEP-18, __array_function__ protocol

einstein.edison
In reply to this post by Nathaniel Smith
Hi Nathaniel and Stephan,

Since this conversation is getting a bit lengthy and I see a lot of repeated stuff, I’ll summarise the arguments for everyone’s benefit and then present my own viewpoints:

Nathaniel:
  • Undue maintenance burden on NumPy, since semantics have to match exactly
  • Implementations of functions may change, which may break downstream library compatibility
  • There may be time taken in merging this everywhere, so why not take time to define proper protocols?
  • Hide this entire interface behind an environment variable, possibly to be removed later.
Stephan:
  • Semantics don’t have to match exactly; that isn’t the intent of most duck arrays.
  • This won’t happen given NumPy’s conservativeness.
  • The protocols will just be copies of __array_function__, but less capable
  • Provide an interface that only end-users may turn on.

My viewpoints:
  • I don’t think any Duck array implementers intend to copy semantics on that level. Dask, which is the most complete one, doesn’t have views, only copies. Many other semantics simply don’t match. The intent is to allow for code that expresses, well, intent (no pun intended) instead of relying heavily on semantics, but that can use arbitrary duck-array implementations instead of just ndarray.
  • Most of the implementations in NumPy are pretty stable, and the only thing that’s likely to happen here is bug fixes. And we are free to fix those bugs; I doubt implementation-specific bugs will be copied. However, these first two points are for/against duck arrays in general, and not specific to this protocol, so IMO that discussion is orthogonal to this one.
  • I agree with Stephan here: Defining a minimum API for NumPy that will complete duck arrays will, in every case, leave a lot of functions that cannot be overridden, as they simply cannot be expressed in terms of the protocols we have added so far. This will lead to more protocols being produced, and so on ad infinitum. We have to consider the burden that such a design would place on the maintainers of NumPy as well… I personally feel that the number of such protocols we’d need is large enough that this line of action is more burdensome, rather than less. I prefer an approach with __array_function__ + a mailing list ping before adding a function.
  • May I propose an alternative that was already discussed, and one that I think everyone will be okay with: We put all overridable functions inside a new submodule, numpy.api, that will initially be a shallow-ish copy of the numpy module (sketched below). I say ish because all modules inside NumPy will need to be shallow-copied as well. If we need to add __array_function__, we can always do that there. Normal users are using “regular” NumPy unless they know they’re using the API, but it is separately accessible. As far as hiding it completely goes: We have to realise, the Python computation landscape is fragmenting. The slower we are, the more fragmented it will become. NumPy already isn’t “the standard” for machine learning.
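
A rough sketch of the numpy.api idea (purely illustrative; the real thing would also need to copy submodules, as noted above):

    import types
    import numpy as np

    # shallow copy of the top-level namespace; overridable wrappers
    # could then replace selected functions on this module only
    api = types.ModuleType('numpy.api')
    api.__dict__.update(np.__dict__)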

Regards,
Hameer Abbasi


Re: Proposal to accept NEP-18, __array_function__ protocol

ralfgommers


On Wed, Aug 22, 2018 at 4:22 AM Hameer Abbasi <[hidden email]> wrote:
> May I propose an alternative that was already discussed, and one that I think everyone will be okay with:

That's a dangerous assumption on this list :)

> We put all overridable functions inside a new submodule, numpy.api, that will initially be a shallow-ish copy of the numpy module.

This is not desirable. There are projects (e.g. statsmodels) that have added a .api submodule before. It's generally considered not a good idea; it's not very Pythonic. Everything one can import that doesn't have an underscore is normally part of the API of a package. In this particular case, I definitely prefer an envvar, and relying on what is documented as part of __array_function__, rather than a new namespace.

Cheers,
Ralf


Re: Proposal to accept NEP-18, __array_function__ protocol

Stephan Hoyer-2
In reply to this post by Nathaniel Smith
On Tue, Aug 21, 2018 at 6:47 PM Nathaniel Smith <[hidden email]> wrote:
On Tue, Aug 21, 2018 at 9:39 AM, Stephan Hoyer <[hidden email]> wrote:
> It is quite possible that NumPy functions could be (re)written in a way that
> is incompatible with some unit implementations but is perfectly valid for
> "full" duck arrays. We actually see this even within NumPy already -- for
> example, see this recent PR adding support for the datetime64 dtype to
> percentile:
> https://github.com/numpy/numpy/pull/11627

> I clicked the link, but I don't see anything about units?

To clarify: np.datetime64 arrays can be considered a variant of NumPy arrays that support units. Namely, they use a dtype for representing time units.

I expect that the issues we've encountered with datetime64 will be indicative of some of the sort of issues that authors of unit-aware arrays will encounter.


Re: Proposal to accept NEP-18, __array_function__ protocol

Stephan Hoyer-2
In reply to this post by Nathaniel Smith
On Tue, Aug 21, 2018 at 6:57 PM Nathaniel Smith <[hidden email]> wrote:
> I mean, the idea of the envvar is to be a temporary measure to enable
> devs to experiment with a provisional feature, while being awkward
> enough that people don't build lots of stuff assuming it's there. It
> doesn't have to be 100% supported in every environment.

My understanding of the idea of the envvar is to obtain informed consent from NumPy users, e.g., "I understand that this is an unsupported experimental feature that may be removed in the future without warning."

It's pretty important for me personally that it's possible to use this in a flexible set of environments, and in particular to have something that works in my preferred notebook environment. How else are we going to test this?

Every limitation that we put into the experimental version of this feature decreases the likelihood that it gets used enough to know if it's even a viable solution. If it's too awkward, nobody's even going to bother testing it, and this whole effort will fall flat on its face.
 
>> I'm in complete agreement that only authors of end-user applications should
>> invoke this option, but just because something is technically possible
>> doesn't mean that people will actually do it or that we need to support that
>> use case :).

> I didn't say "authors of end-user applications", I said "end-users" :-).

These are mostly the same for NumPy, but I do disagree with you here. Ultimately we have to trust application developers to make the right choices for their tools. If they are willing to accept the maintenance burden of either (1) potentially being stuck on NumPy 1.16 forever or (2) needing to rewrite their code, that's their tradeoff to make. It's a little preposterous to force this decision onto end-users, who may not even know a tool is written in NumPy.
 
> That said, I dunno. My intuition is that if we have a function call
> like this then libraries that define __array_function__ will merrily
> call it in their package __init__ and it accomplishes nothing, but
> maybe I'm being too cynical and untrusting.

People can do lots of dumb things in Python (e.g., monkeypatching) -- the language doesn't stop them. Fortunately this mostly isn't a problem.


Re: Proposal to accept NEP-18, __array_function__ protocol

einstein.edison
Hi everyone,

On 23. Aug 2018, at 17:35, Stephan Hoyer <[hidden email]> wrote:

> People can do lots of dumb things in Python (e.g., monkeypatching) -- the language doesn't stop them. Fortunately this mostly isn't a problem.

I might add that most duck array authors are highly unlikely to be newcomers to the Python space. We should just put a big warning there when it's enabled, and that'll be enough to scare away most devs from doing it by default.


Best Regards,
Hameer Abbasi
Sent from my iPhone


Re: Proposal to accept NEP-18, __array_function__ protocol

Stephan Hoyer-2
In reply to this post by Nathaniel Smith
RE: the types argument

On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <[hidden email]> wrote:
> This is much more of a detail as compared to the rest of the
> discussion, so I don't want to quibble too much about it. (Especially
> since if we keep things really-provisional, we can change our mind
> about the argument later :-).) Mostly I'm just confused, because there
> are lots of __dunder__ functions in Python (and NumPy), and none of
> them take a special 'types' argument... so what's special about
> __array_function__ that makes it necessary/worthwhile?

What's special about __array_function__ is that it's a hook that lets you override an entire API through a single interface. Unlike protocols like __add__, implementers of __array_function__ don't know exactly which arguments could have implemented the operation. 
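
For concreteness, the dispatch pattern looks roughly like this (a sketch of the pattern, not the final API; MyArray and HANDLED_FUNCTIONS are illustrative names):

    import numpy as np

    HANDLED_FUNCTIONS = {}  # maps numpy functions to our implementations

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            # 'types' gives one central place to reject unknown argument
            # types, instead of repeating the check in every override
            if not all(issubclass(t, (MyArray, np.ndarray)) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)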
 
> Any implementation of, say, concatenate-via-array_function is going to
> involve iterating through all the arguments and looking at each of
> them to figure out what kind of object it is and how to handle it,
> right? That's true whether or not they've done a "pre-check" using the
> types set, so in theory it's just as easy to return NotImplemented at
> that point. But I guess your point in the last paragraph is that this
> means there will be lots of chances to mess up the
> NotImplemented-returning code in particular, especially since it's
> less likely to be tested than the happy path, which seems plausible.
> So basically the point of the types set is to let people factor out
> that little bit of lots of functions into one common place?

It's also a pragmatic choice: libraries like dask.array and autograd.numpy have already implemented NumPy's API without overrides. These projects follow the current numpy convention: non-native array objects are coerced into native arrays (i.e., dask or autograd arrays). They don't do any type checking.

I doubt there would be much appetite for writing alternative versions of these APIs that return NotImplemented instead -- especially while this feature remains experimental.
 
> I guess some careful devs might be unhappy with paying extra so that other
> lazier devs can get away with being lazy, but maybe it's a good
> tradeoff for us (esp. since as numpy devs, we'll be getting the bug
> reports regardless :-)).

The only extra price we pay is converting these types into a Python data structure and passing them into the __array_function__ method call. We already had to collect them for __array_function__ itself to identify unique types to call -- so this is a pretty minimal extra cost.
 
> If that's the goal, then it does make me wonder if there might be a
> more direct way to accomplish it -- like, should we let classes define
> an __array_function_types__ attribute that numpy would check before
> even trying to dispatch to __array_function__?

This could potentially work, but now the __array_function__ protocol itself becomes more complex and out of sync with __array_ufunc__. It's a much smaller amount of additional complexity to add an additional passed argument.


Re: Proposal to accept NEP-18, __array_function__ protocol

einstein.edison


On 23. Aug 2018, at 18:37, Stephan Hoyer <[hidden email]> wrote:

> This could potentially work, but now the __array_function__ protocol itself becomes more complex and out of sync with __array_ufunc__. It's a much smaller amount of additional complexity to add an additional passed argument.

I might add that if it’s a mandatory part of the protocol, then not all things will work. For example, if XArray and Dask want to support sparse arrays, they’ll need to add an explicit dependency.


Re: Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith
In reply to this post by einstein.edison
On Thu, Aug 23, 2018 at 9:02 AM,  <[hidden email]> wrote:
> I might add that most duck array authors are highly unlikely to be newcomers
> to the Python space. We should just put a big warning there while enabling
> and that’ll be enough to scare away most devs from doing it by default.

That's a reasonable idea... a Big Obnoxious Warning(tm) when it's
enabled, or on first use, would achieve a lot of the same purpose.
E.g.

if this_is_the_first_array_function_usage():
    sys.stderr.write(
        "WARNING: this program uses NumPy's experimental "
        "'__array_function__' feature.\n"
        "It may change or be removed without warning, which might "
        "break this program.\n"
        "For details see "
        "http://www.numpy.org/neps/nep-0018-array-function-protocol.html\n"
    )

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: Proposal to accept NEP-18, __array_function__ protocol

einstein.edison

Hi everyone,

On Fri, Aug 24, 2018 at 9:38 AM Nathaniel Smith <[hidden email]> wrote:
> That's a reasonable idea... a Big Obnoxious Warning(tm) when it's
> enabled, or on first use, would achieve a lot of the same purpose.

I was thinking of a FutureWarning... That's essentially what it's for. Writing to stderr looks un-pythonic to me.
 

Best Regards,
Hameer Abbasi


Re: Proposal to accept NEP-18, __array_function__ protocol

Stephan Hoyer-2
In reply to this post by einstein.edison
On Thu, Aug 23, 2018 at 1:06 PM Hameer Abbasi <[hidden email]> wrote:
> I might add that if it’s a mandatory part of the protocol, then not all things will work. For example, if XArray and Dask want to support sparse arrays, they’ll need to add an explicit dependency.

I don't follow -- can you please elaborate?

If you don't want to do anything with the 'types' argument, you can simply ignore it.

The problem of identifying whether arguments have valid types or not remains unchanged from the situation with __add__ or __array_ufunc__. 'types' just gives you another optional tool to help solve it.


Re: Proposal to accept NEP-18, __array_function__ protocol

einstein.edison


On Fri, Aug 24, 2018 at 5:55 PM Stephan Hoyer <[hidden email]> wrote:
> On Thu, Aug 23, 2018 at 1:06 PM Hameer Abbasi <[hidden email]> wrote:
>> I might add that if it’s a mandatory part of the protocol, then not all things will work. For example, if XArray and Dask want to support sparse arrays, they’ll need to add an explicit dependency.
>
> I don't follow -- can you please elaborate?

If we make specifying __array_function_types__ a mandatory part of the protocol -- and make it a whitelist -- then XArray or Dask would need to import sparse in order to specify that they accept mixing sparse arrays with native arrays (i.e. to add sparse.SparseArray to __array_function_types__). Which is basically what I mean. It might be a 'soft' dependency, but there will be a dependency nonetheless.
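
For concreteness, the hypothetical attribute being discussed (never part of any accepted design) would look something like:

    import numpy as np

    class MyArray:
        # a whitelist of every type this class is willing to see in a
        # dispatched call; accepting sparse arrays would mean importing
        # sparse just to extend this tuple
        __array_function_types__ = (np.ndarray,)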
 


Re: Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith
On Fri, Aug 24, 2018, 09:07 Hameer Abbasi <[hidden email]> wrote:


> If we make specifying __array_function_types__ a mandatory part of the protocol -- and make it a whitelist -- then XArray or Dask would need to import sparse in order to specify that they accept mixing sparse arrays with native arrays. It might be a 'soft' dependency, but there will be a dependency nonetheless.

Oh yeah, if we did this then we definitely wouldn't want to make it mandatory. Some `__array_function__` implementations might want to do checking another way, or support different types in different overloaded functions, or be able to handle arbitrary types.

-n


Re: Proposal to accept NEP-18, __array_function__ protocol

Stephan Hoyer-2
In reply to this post by einstein.edison
On Fri, Aug 24, 2018 at 1:36 AM Hameer Abbasi <[hidden email]> wrote:

> I was thinking of a FutureWarning... That's essentially what it's for.
> Writing to stderr looks un-pythonic to me.

Issuing a FutureWarning seems roughly appropriate here. The Python 3.7 docs write:
"Base category for warnings about deprecated features when those warnings are intended for end users of applications that are written in Python."

Writing to sys.stderr directly is generally considered poor practice for Python libraries.

In my experience FutureWarning does a good job of satisfying the goals of being a "Big Obnoxious Warning" while still being silence-able and testable with standard tools.
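
A minimal sketch of that approach, assuming a hypothetical explicit opt-in function:

    import warnings

    def enable_experimental_array_function():
        # warn loudly, but through the standard machinery, so the
        # warning stays silence-able and testable
        warnings.warn(
            "NumPy's __array_function__ protocol is experimental and may "
            "change or be removed without notice.",
            FutureWarning, stacklevel=2)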


Re: Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith
On Fri, Aug 24, 2018 at 1:46 PM, Stephan Hoyer <[hidden email]> wrote:

> Issuing a FutureWarning seems roughly appropriate here. The Python 3.7 docs
> write:
> "Base category for warnings about deprecated features when those warnings
> are intended for end users of applications that are written in Python."
>
> Writing to sys.stderr directly is generally considered poor practice for
> Python libraries.
>
> In my experience FutureWarning does a good job of satisfying the goals of
> being a "Big Obnoxious Warning" while still being silence-able and testable
> with standard tools.

Yeah, the reason warnings are normally recommended is because
normally, you want to make it easy to silence. But this is the rare
case where I didn't want to make it easy to silence, so I didn't
suggest using a warning :-).

Calling warnings.warn (or the C equivalent) is also very expensive,
even if the warning ultimately isn't displayed. I guess we could do
our own tracking of whether we've displayed the warning yet, and only
even attempt to issue it once, but that partially defeats the purpose
of using warnings in the first place.
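
For concreteness, that once-only tracking might look something like this sketch (all names here are hypothetical):

import warnings

_warned_about_array_function = False

def _maybe_warn_array_function():
    # Pay the cost of warnings.warn() at most once per process;
    # afterwards this is just a cheap boolean check.
    global _warned_about_array_function
    if not _warned_about_array_function:
        _warned_about_array_function = True
        warnings.warn(
            "NumPy's '__array_function__' protocol is experimental "
            "and may change or be removed without notice.",
            FutureWarning,
            stacklevel=3,
        )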

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to accept NEP-18, __array_function__ protocol

einstein.edison
On 25. Aug 2018, at 00:13, Nathaniel Smith <[hidden email]> wrote:


> Yeah, the reason warnings are normally recommended is because
> normally, you want to make it easy to silence. But this is the rare
> case where I didn't want to make it easy to silence, so I didn't
> suggest using a warning :-).

I really doubt anyone is going to silence a FutureWarning and then come complaining that a feature was removed.


> Calling warnings.warn (or the C equivalent) is also very expensive,
> even if the warning ultimately isn't displayed. I guess we could do
> our own tracking of whether we've displayed the warning yet, and only
> even attempt to issue it once, but that partially defeats the purpose
> of using warnings in the first place.

How about calling it at enable-time once? That’s why I suggested that in the first place.



Best regards,
Hameer Abbasi
Sent from my iPhone

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to accept NEP-18, __array_function__ protocol

Stephan Hoyer-2
In reply to this post by Nathaniel Smith
On Fri, Aug 24, 2018 at 3:14 PM Nathaniel Smith <[hidden email]> wrote:
> Yeah, the reason warnings are normally recommended is because
> normally, you want to make it easy to silence. But this is the rare
> case where I didn't want to make it easy to silence, so I didn't
> suggest using a warning :-).
>
> Calling warnings.warn (or the C equivalent) is also very expensive,
> even if the warning ultimately isn't displayed. I guess we could do
> our own tracking of whether we've displayed the warning yet, and only
> even attempt to issue it once, but that partially defeats the purpose
> of using warnings in the first place.

I thought the suggestion was to issue a warning when np.enable_experimental_array_function() is called. I agree that it's a non-starter to issue it every time an __array_function__ method is called -- warnings are way too slow for that.
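
As a sketch, that enable-time approach could be as simple as the following (the function name follows the proposal; the body is hypothetical):

import warnings

def enable_experimental_array_function():
    # Warn once, loudly, at opt-in time, so per-call dispatch stays
    # cheap and the notice is still hard to miss.
    warnings.warn(
        "You have enabled NumPy's experimental '__array_function__' "
        "protocol. It may change or be removed without notice, which "
        "could break this program.",
        FutureWarning,
        stacklevel=2,
    )
    # ... flip the (hypothetical) internal dispatch switch here ...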

People can redirect stderr, so we're really not stopping anyone from silencing things by doing it in a non-standard way. We're just making it annoying and non-standard. Developers could even run Python in a subprocess and filter out all the warnings -- there's really nothing we can do to stop determined abusers of this feature.

I get that you want to make this annoying and non-standard, but this is too extreme for me. Do you seriously imagine that we'll consider ourselves beholden in the future to users who didn't take us at our word?

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to accept NEP-18, __array_function__ protocol

Nathaniel Smith
On Fri, Aug 24, 2018 at 4:00 PM, Stephan Hoyer <[hidden email]> wrote:

> On Fri, Aug 24, 2018 at 3:14 PM Nathaniel Smith <[hidden email]> wrote:
>>
>> Yeah, the reason warnings are normally recommended is because
>> normally, you want to make it easy to silence. But this is the rare
>> case where I didn't want to make it easy to silence, so I didn't
>> suggest using a warning :-).
>>
>> Calling warnings.warn (or the C equivalent) is also very expensive,
>> even if the warning ultimately isn't displayed. I guess we could do
>> our own tracking of whether we've displayed the warning yet, and only
>> even attempt to issue it once, but that partially defeats the purpose
>> of using warnings in the first place.
>
>
> I thought the suggestion was to issue a warning when
> np.enable_experimental_array_function() is called. I agree that it's a
> non-starter to issue it every time an __array_function__ method is called --
> warnings are way too slow for that.

If our protection against uninformed usage is a Big Obnoxious
Warning(tm), then I was imagining that we could simplify by dropping
enable_experimental_array_function entirely. Doesn't make a big
difference either way though.

> People can redirect stderr, so we're really not stopping anyone from
> silencing things by doing it in a non-standard way. We're just making it
> annoying and non-standard. Developers could even run Python in a subprocess
> and filter out all the warnings -- there's really nothing we can do to stop
> determined abusers of this feature.
>
> I get that you want to make this annoying and non-standard, but this is too
> extreme for me. Do you seriously imagine that we'll consider ourselves
> beholden in the future to users who didn't take us at our word?

Let's break that question down into two parts:

1. if we do find ourselves in a situation where changing this would
break lots of users, will we consider ourselves beholden to them?
2. is it plausible that we'll find ourselves in that situation?

For the first question, I think the answer is... yes? We constantly
bend over backwards to try to avoid breaking users. Our deprecation
policy explicitly says that it doesn't matter what we say in the docs,
the only thing that matters is whether a change breaks users. And to
make things more complicated, it's easy to imagine scenarios where the
people being broken aren't the ones who had a chance to read the docs
– e.g. if a major package starts relying on __array_function__, then
it's all *their* users who we'd be breaking, even though they had
nothing to do with it. If any of
{tensorflow/astropy/dask/sparse/sklearn/...} did start relying on
__array_function__ for normal functionality, then *of course* that
would come up in future discussions about changing __array_function__,
and *of course* it would make us reluctant to do that. As it should,
because breaking users is bad, we should try to avoid ending up in
situations where that's what we have to do, even if we have a NEP to
point to to justify it.

But... maybe it's fine anyway, because this situation will never come
up? Obviously I hope that our downstreams are all conscientious, and
friendly, and take good care of their users, and would never create a
situation like that. I'm sure XArray won't :-). But... people are
busy, and distracted, and have pressures to get something shipped, and
corners get cut. Companies *care* about what's right, but they mostly
only *do* the minimum they have to. (Ask anyone who's tried to get
funding for OSS...) Academics *care* about what's right, but they just
don't have time to care much. So yeah... if there's a quick way to
shut up the warning and make things work (or seem to work,
temporarily), versus doing things right by talking to us, then I do
think people might take the quick hack.

The official Tensorflow wheels flat out lie about being manylinux
compatible, and the Tensorflow team has never talked to anyone about
how to fix this; they just upload them to PyPI and leave others to
deal with the fallout [1]. That may well be what's best for their
users, I don't know. But stuff like this is normal, it happens all the
time, and if someone does it with __array_function__ then we have no
leverage.

-n

[1] https://github.com/tensorflow/tensorflow/issues/8802#issuecomment-401703703

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Proposal to accept NEP-18, __array_function__ protocol

mattip
On 29/08/18 10:37, Nathaniel Smith wrote:
> it's easy to imagine scenarios where the
> people being broken aren't the ones who had a chance to read the docs
> – e.g. if a major package starts relying on __array_function__, then
> it's all *their* users who we'd be breaking, even though they had
> nothing to do with it.
This is a packaging problem. This proposal is intended for use by other
"major packages", not so much for end-users. We would have much more
trouble if we were proposing a broad change to something like indexing
or the random number module (see those NEPs). If we break one of those
major packages, it is on them to pin the version of NumPy they can work
with. In my opinion very few end users will be implementing their own
ndarray classes with `__array_function__`. While we will get issue
reports, we can handle them much as we do the MKL or OpenBLAS ones -
pinpoint the problem and urge users to complain to those packages.
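
As an illustration, a hypothetical downstream package relying on the
protocol could pin the NumPy range it was actually tested against
(the version numbers below are purely illustrative):

from setuptools import setup

setup(
    name="my-duck-array-package",  # hypothetical downstream package
    # Pin to the NumPy versions this package's __array_function__
    # usage was tested against; this range is illustrative only.
    install_requires=["numpy>=1.16,<1.17"],
)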

Other than adding a warning, I am not sure what the concrete proposal is
here. To not accept the NEP?
Matti
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion