Experimental `like=` attribute for array creation functions


Sebastian Berg
Hi all,

As a heads up: Peter Entschev has a PR open to add `like=` to most
array creation functions. My current plan is to merge it soon as a preliminary API and bring it up again before the actual release (in a few months).  This allows array-likes to override array creation, e.g. it will allow:

   
    arr = np.asarray([3], like=dask_array)
    type(arr) is dask.array.Array
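
To make the dispatch above concrete without requiring Dask, here is a minimal sketch. It assumes NumPy 1.20 or later; `MyArray` is a hypothetical stand-in for a downstream array type, not part of NumPy or any real library.

```python
import numpy as np


class MyArray:
    """Hypothetical minimal array-like; not a real library type."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # `like=` is not forwarded in kwargs, so the remaining arguments
        # can be passed straight to the plain NumPy function.
        # Matching on the name rather than on identity is an assumption
        # made here to keep the sketch independent of the NumPy version.
        if func.__name__ == "asarray":
            return MyArray(np.asarray(*args, **kwargs))
        return NotImplemented


reference = MyArray([1.0, 2.0])
arr = np.asarray([3], like=reference)
print(type(arr).__name__)
```

The reference object only decides which implementation creates the array; the data still comes from the first argument.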

This was proposed in NEP 35:

https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html

Although that has not been accepted as of now, the PR is:

https://github.com/numpy/numpy/pull/16935


This was discussed in a smaller group, and is an attempt to see how we
can make the array-function protocol viable to allow packages such as
sklearn to work with non-NumPy arrays.

As of now, this would be experimental and we can revisit it before the
actual NumPy release.  We should probably discuss accepting NEP 35
more. At this time, I hope that we can put in the functionality to
facilitate this discussion, rather than the other way around.

If anyone feels nervous about this step, I would be happy to document
that we will not include it in the next release unless the NEP is
accepted first, or at least hide it behind an environment variable.

Cheers,

Sebastian


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Experimental `like=` attribute for array creation functions

Hameer Abbasi
Hi,

We should have a higher-bandwidth meeting/communication for all stakeholders, and particularly some library authors, to see what would be good for them.

We should definitely have language in the NEP that says it won’t be in a release unless the NEP is accepted.

Best regards,
Hameer Abbasi

On Monday, Aug 10, 2020 at 5:31 PM, Sebastian Berg <[hidden email]> wrote:
> [...]

Re: Experimental `like=` attribute for array creation functions

Sebastian Berg
On Mon, 2020-08-10 at 17:35 +0200, Hameer Abbasi wrote:
> Hi,
>
> We should have a higher-bandwidth meeting/communication for all
> stakeholders, and particularly some library authors, to see what
> would be good for them.
>
> We should definitely have language in the NEP that says it won’t be
> in a release unless the NEP is accepted.

In that case, I think the important part is to have language right now
in the implementation, although that can refer to the NEP itself of
course.
You can't expect everyone who may be tempted to use it to actually read
the NEP draft, at least not without pointing it out.

I will say that I think it is not very high risk, because, annoying or
not, the argument could be deprecated again with a short transition
phase. Admittedly, that argument only works if we have a replacement
solution.

Cheers,

Sebastian



Re: Experimental `like=` attribute for array creation functions

ralfgommers


On Mon, Aug 10, 2020 at 8:37 PM Sebastian Berg <[hidden email]> wrote:
> On Mon, 2020-08-10 at 17:35 +0200, Hameer Abbasi wrote:
> > Hi,
> >
> > We should have a higher-bandwidth meeting/communication for all
> > stakeholders, and particularly some library authors, to see what
> > would be good for them.

I'm not sure that helps. There has been little progress since the last meeting, and I think the plan is unchanged: we need implementations of all the options on offer, and then we need to try them out in PRs for scikit-learn, SciPy and perhaps another package whose maintainers are interested, to test like= and __array_module__ in realistic situations.

> > We should definitely have language in the NEP that says it won’t be
> > in a release unless the NEP is accepted.
>
> In that case, I think the important part is to have language right now
> in the implementation, although that can refer to the NEP itself of
> course.
> You can't expect everyone who may be tempted to use it to actually read
> the NEP draft, at least not without pointing it out.

Agreed. I think the decision is made on this list, not in the NEP, and to make sure we don't forget, we need an issue opened with the 1.20 milestone.

Cheers,
Ralf



Re: Experimental `like=` attribute for array creation functions

Ilhan Polat
For what it's worth, as a potential consumer in SciPy: neither the NEP nor the PR really says anything about how regular users of NumPy will benefit from this. If only third parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.

Let me clarify,

- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and, by its nature, open to any other interpretation. It does not signal anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks, because you would also expect the shape and ndim to be mimicked by the "like"d argument, but it turns out it acts more like "typeof=" and not "like=" at all. If we follow the semantics, it reads as "make your argument an array like the other thing", but what it actually does is "make your argument an array with the other thing's type", which might not be an array after all.

- Again, if this is meant for downstream libraries (that's what I got out of the PR discussion; CuPy, Dask, and JAX were the only examples I could find), then hiding it in another function and writing in capital letters "this is not meant for NumPy users" would be a much more convenient way to separate the target audience from regular users. numpy.astypedarray([[some data], [...]], type_of=x), or whatever else it may be called, would be quite clean and to the point, with no ambiguous keywords.
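
The distinction drawn in the first point can be checked with plain NumPy arrays. This is a small sketch assuming NumPy 1.20 or later; the variable names are illustrative only.

```python
import numpy as np

reference = np.zeros((2, 3), dtype=np.float32)

# The *_like functions copy shape and dtype from the reference array.
mimic = np.zeros_like(reference)

# `like=` only selects which library creates the array; shape and
# dtype still come from the data argument, not from the reference.
created = np.asarray([1, 2, 3], like=reference)

print(mimic.shape, mimic.dtype)
print(created.shape, created.dtype.kind)
```

Here `created` has three integer elements even though `reference` is a 2x3 float32 array, which is the "typeof"-style behaviour described above.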

I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. No offense, but all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.

Finally, as a minor point: I know we are mostly (ex-)academics, but this necessity of formal language in NEPs is self-imposed (PEPs are probably to blame) and not really helping. It could be a bit more descriptive, in my external opinion.

best,
ilhan

On Tue, Aug 11, 2020 at 12:18 AM Ralf Gommers <[hidden email]> wrote:
> [...]

Re: Experimental `like=` attribute for array creation functions

Juan Nunez-Iglesias-2
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.

I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
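
As an illustration of that kind of analysis code, here is a hedged sketch. `add_bias` is a hypothetical helper, not something from NumPy or the NEP, and it assumes NumPy 1.20 or later. Only the NumPy path is exercised; with a CuPy or Dask input the `like=` call would be handed to that library instead, which is not run here.

```python
import numpy as np


def add_bias(x, bias=(0.5, 1.0, 1.5)):
    """Add a fixed bias vector to `x` (hypothetical helper)."""
    # Create the helper array through whatever library `x` belongs to,
    # so that the addition below stays within a single array type.
    bias_arr = np.asarray(bias, like=x)
    return x + bias_arr


result = add_bias(np.array([1.0, 2.0, 3.0]))
print(result)
```

Without `like=`, the helper array would always be a NumPy array, regardless of what type `x` is.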

I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.

Food for thought.

Juan.

On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
> [...]

Re: Experimental `like=` attribute for array creation functions

Peter Andreas Entschev
> I am not sure adding a new keyword to an already confusing function is the right thing to do.

Could you clarify what is the confusing function in question?

> This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation.

To be fair, the usage is the same. Therefore
empty_like(downstream_array, ...) and empty(downstream_array.shape, ...,
like=downstream_array) should have the exact same behavior, which is
arguably redundant now.

> It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all.

I understand this can be confusing. Naming was one of the hardest
discussions, as there's no clear, unambiguous name to use for this
keyword; "like=" was simply the name that came closest to converging
during discussions. While I think "typeof=" is perhaps a better name
than "like=", it could very easily be confused with "dtype=", and that
would possibly just shift the confusion.

> Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users.

The problem with this approach is that the __array_function__ protocol
relies on downstream libraries implementing functions with the same
signature (for example, Dask and CuPy both implement an "array"
function that matches NumPy's). The purpose of __array_function__ and
NEP-35 is to introduce only minimal changes to both NumPy's API and
downstream libraries. Of course, adding new functions for such cases
would work, but IMO it would defeat the purpose of __array_function__
in general, as it would require a considerable amount of work in
downstream libraries. We discussed this previously and decided that
an argument is better than many new functions [1].
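
The reliance on matching signatures can be sketched roughly as follows. `ToyArray`, `_HANDLERS` and `_mirror` are hypothetical names for illustration; this is not how Dask or CuPy are actually implemented. It assumes NumPy 1.20 or later.

```python
import numpy as np

# Registry mapping NumPy function names to this toy library's versions.
_HANDLERS = {}


class ToyArray:
    """Hypothetical downstream array type wrapping a NumPy array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Look up a same-named function and forward the arguments as-is.
        handler = _HANDLERS.get(func.__name__)
        if handler is None:
            return NotImplemented
        return handler(*args, **kwargs)


def _mirror(name):
    """Register a function taking the same arguments as np.<name>."""
    numpy_func = getattr(np, name)

    def toy_func(*args, **kwargs):
        return ToyArray(numpy_func(*args, **kwargs))

    _HANDLERS[name] = toy_func
    return toy_func


for _name in ("array", "asarray", "zeros", "ones"):
    _mirror(_name)

ref = ToyArray([0])
ones = np.ones((2, 2), like=ref)
arr = np.array([1, 2], dtype=float, like=ref)
print(type(ones).__name__, type(arr).__name__)
```

Because the arguments are forwarded unchanged, this only works if the downstream function accepts the same arguments as its NumPy counterpart, which is why a shared keyword needs less downstream work than a set of new functions.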

> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.

This is what I intended to do in the Usage Guidance [2] section. Could
you elaborate on what more information you'd want to see there? Or is
it just a matter of reorganizing the NEP a bit to try and summarize
such things right at the top?

> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.

TBH, I don't really know how to solve that point, so if you have any
specific suggestions, that's certainly welcome. I understand the
frustration for a reader trying to understand all the details, with
many being only described in NEP-18 [3], but we also strive to avoid
rewriting things that are written elsewhere, which would also
overburden those who are aware of what's being discussed.

> I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.

To be clear, I have no strong opinion on renaming it; I'm fine either
way. But I think it's unrealistic to expect to find a single name that
is short, unambiguous and properly descriptive. If the preference now
shifts towards the "typeof=" name, we can change it, but "like=" was
really named after "empty_like" and similar functions.

> I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.

I guess this comes down to a somewhat loose definition of "library".
To some extent, if you really need "like=", it means that you're
writing your own functions around the NumPy API (and that IMO is a
library, even if you call it something else), rather than just writing
your application on top of the existing NumPy API. I'm also happy to
rephrase that in the NEP if people feel it should be done.

> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.

This is a good point. We do always notify people of new NEPs over the
mailing list, as per NEP-0 [4], and this was done for NEP-35 [5]
(originally NEP-33, but renumbered due to other open NEPs at that
time). Unfortunately, not many concerns were raised back then.

Best,
Peter

[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html


On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:

>
> I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>
> I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>
> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>
> Food for thought.
>
> Juan.
>
> On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>
> For what is worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only and only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>
> Let me clarify,
>
> - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>
> - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>
> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>
> best,
> ilhan
>
> On Tue, Aug 11, 2020 at 12:18 AM Ralf Gommers <[hidden email]> wrote:
>>
>>
>>
>> On Mon, Aug 10, 2020 at 8:37 PM Sebastian Berg <[hidden email]> wrote:
>>>
>>> On Mon, 2020-08-10 at 17:35 +0200, Hameer Abbasi wrote:
>>> > Hi,
>>> >
>>> > We should have a higher-bandwidth meeting/communication for all
>>> > stakeholders, and particularly some library authors, to see what
>>> > would be good for them.
>>
>>
>> I'm not sure that helps. At this point there's little progress since the last meeting, and I think the plan is unchanged: we need implementations of all the options on offer, and then try them out in PRs for scikit-learn, SciPy and perhaps another package whose maintainers are interested, to test like=, __array_module__ in realistic situations.
>>
>>
>>> >
>>> > We should definitely have language in the NEP that says it won’t be
>>> > in a release unless the NEP is accepted.
>>>
>>> In that case, I think the important part is to have language right now
>>> in the implementation, although that can refer to the NEP itself of
>>> course.
>>> You can't expect everyone who may be tempted to use it to actually read
>>> the NEP draft, at least not without pointing it out.
>>
>>
>> Agreed, I think the decision is on this list not in the NEP, and to make sure we won't forget we need an issue opened with the 1.20 milestone.
>>
>> Cheers,
>> Ralf
>>
>>>
>>> I will say that I think it is not very high risk, because I think
>>> annoying or not, the argument could be deprecated again with a
>>> short transition phase. Admittedly, that argument only works if we have
>>> a replacement solution.
>>>
>>> Cheers,
>>>
>>> Sebastian
>>>
>>>
>>> >
>>> > Best regards,
>>> > Hameer Abbasi
>>> >

Re: Experimental `like=` attribute for array creation functions

ralfgommers
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.

To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.


On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:

> > I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>
> This is what I intended to do in the Usage Guidance [2] section. Could
> you elaborate on what more information you'd want to see there? Or is
> it just a matter of reorganizing the NEP a bit to try and summarize
> such things right at the top?

We adapted the NEP template [6] several times last year to try and improve this. And specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure hasn't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].

If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.

At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?

I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?

 
> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>
> TBH, I don't really know how to solve that point, so if you have any
> specific suggestions, that's certainly welcome. I understand the
> frustration for a reader trying to understand all the details, with
> many being only described in NEP-18 [3], but we also strive to avoid
> rewriting things that are written elsewhere, which would also
> overburden those who are aware of what's being discussed.
>
> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.

Some variant of this proposal would be my preference.

Cheers,
Ralf


[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html

Re: Experimental `like=` attribute for array creation functions

Ilhan Polat
To maybe lighten up the discussion a bit and to make my outsider confusion more tangible, let me start by apologizing for diving in head first without weighing the past baggage :-) I always forget how much effort goes into these things, and for outsiders like me it's a matter of dipping a finger in and tasting it just before starting to complain about how much salt is missing etc. What I was mentioning about NEPs wasn't related specifically to this one, by the way. It's the general feeling that I have.

First, let me explain what I mean by the distinction between NumPy users and downstream libraries. This is very much related to how data-science and huge-array users are magnetizing every tool out there in the Python world, which is fine, though the majority of number-crunchers have nothing to do with any of GPU/Parallelism/ClusterUsage etc. Hence when I mention NumPy users, think of people who use NumPy in its own right, with no duck-typing and nothing related to subclassing. Just straightforward array creation and lots of ops on these arrays. For those people (I'm one of them), this option brings in a keyword that we would never use. And it gets into many major functions (linspace and others mentioned somewhere). So it has a very appealing name but has nothing to do with me, in an already very crowded namespace and keyword catalogue. That's basically a UX issue to be addressed (under the assumption that users like me are the majority): either make its name as esoteric as possible so that I naturally stay away from it, or put it somewhere I don't see it. This has absolutely nothing to do with looking down on the downstream libraries. They are flat-out amazing and the more we can support them the merrier.
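To make the naming concern concrete, here is a minimal sketch of the difference between the existing `*_like` functions and the proposed `like=` keyword. It assumes a NumPy version in which the keyword from the PR is available (1.20 or later); the variable names are only for illustration.

```python
import numpy as np

# An integer template with shape (2, 3).
template = np.arange(6).reshape(2, 3)

# zeros_like mimics the template: same shape, same dtype.
a = np.zeros_like(template)

# like= does not mimic shape or dtype. It only selects which array
# *type* is created; shape and dtype still come from the usual arguments.
b = np.zeros(4, like=template)

print(a.shape, a.dtype == template.dtype)
print(b.shape, b.dtype)
```

With a plain ndarray as the reference, `like=` changes nothing visible, which is why the keyword is irrelevant to code that only ever handles NumPy arrays.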

Using yet another metaphor, I was hoping that NumPy would have a loading dock for heavy duty deliveries for downstream projects or specialized array creations, which would not disturb the regular customer entrance. Because if I look at this page https://numpy.org/doc/stable/reference/routines.array-creation.html, there are a lot of functions and I think most of them are candidates to gain this keyword.  I wish I could comment on a viable alternative but I really cannot understand the _array_xxxx_ discussions since they fly way over my head no matter how many times I tried. So that's why I naively mentioned the "np.astypedarray" or "np.asarray_but_not_numpy_array" or whatever. Now I see that it is even more complicated and I generated extra noise. So you can just ignore my previous suggestions. Except that I want to draw attention to the UX problem and I'd like to leave it at that.

The other point is about the NEP stuff. I think I need to elaborate. If the NEPs are meant for internal NumPy discussions, then by all means, crank up the pointer*-meter to 11 and dive into it, totally fine with me. But if you also want to get feedback from outside, then probably a few lines of code examples for mere mortals would go a long way. Also it would make the discussion much more streamlined in my humble opinion. What I was trying to get at was that almost all NEPs read like a legal document that I want to agree to as soon as possible, because they often come with little or no code in them. In NEP 35 for example, there are nice code blocks in function dispatching but I guess it's not meant for me. Because it is only decorating asarray with some black magic happening there somehow (I guess). So I can't even comprehend what the proposition would mean for the regular, friendly, anti-duck users. But I am pretty sure it is about dispatching something because the word is repeated ~20 times :-)  Thus the feedback would be limited. That was also what I meant there. But again I totally understand the complexity of these issues. So I'm not expecting to understand all details of NumPy machinery in a single NEP.
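As an illustration of the kind of short example being asked for, the following is a rough sketch of what "dispatching" means here, assuming a NumPy version that ships the `like=` keyword (1.20 or later). `TaggedArray` and `as_same_kind` are hypothetical names invented for this sketch; they stand in for a downstream array type such as a Dask or CuPy array and for a library function, and are not taken from the NEP or the PR.

```python
import numpy as np


class TaggedArray:
    """Hypothetical duck array: a thin wrapper around an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook when a TaggedArray is passed as `like=`.
        # The `like` keyword itself is removed before the hook is called.
        # Matching on the function name is a simplification for this sketch.
        if func.__name__ == "asarray":
            return TaggedArray(np.asarray(*args, **kwargs))
        # Anything else is unsupported; NumPy then raises a TypeError.
        return NotImplemented


def as_same_kind(values, reference):
    """Convert `values` to whatever array type `reference` is."""
    return np.asarray(values, like=reference)


plain = as_same_kind([1, 2, 3], np.empty(0))         # reference is an ndarray
tagged = as_same_kind([1, 2, 3], TaggedArray([0]))   # reference is a duck array

print(type(plain).__name__)
print(type(tagged).__name__)
```

The point of the sketch is that `as_same_kind` never names the downstream library: the type of `reference` decides which implementation creates the new array, and code that only passes ndarrays behaves exactly as before.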

But anyways, hope this clarifies a few things that I failed to convey in my previous mail.
ilhan

Re: Experimental `like=` attribute for array creation functions

Peter Andreas Entschev
In reply to this post by ralfgommers
> We adapted the NEP template [6] several times last year to try and improve this. And specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure hasn't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>
> If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.

I'm more than happy to edit the NEP and try to clarify all the
concerns. However, it gets pretty difficult to do so when I as an
author don't understand where the difficulty is. Ilhan, Juan and Ralf
have now pointed out things that are missing/unclear, but no comment was
made in that regard when I sent the NEP, my point being: I couldn't
fix what I didn't know was a problem to others.

> At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it.

Honestly, I don't really understand this. From my perspective, there
are two ways to deal with such things:

1. Templates are to be taken mainly as _guidelines_ rather than
_hardlines_, and the current text of NEP-35 definitely falls in the
first category;
2. Templates are _hardlines_ and to be guided/enforced by maintainers
at some point (maybe before merging the PR?).

If 2 is the desired case for NumPy, which sounds a lot like what is
wanted from NEP-35 and other NEPs generally, maintainers should let
the authors know as early as possible that something isn't following
the template's hardlines and it should be corrected. I don't mean any
of this to absolve myself of any responsibility, but I would like to
express my frustration that a 10-month-old NEP is only now getting so
much pushback for being unclear, when its implementation is nearing
completion.

> I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that.

I don't quite understand this either, why would that leave master in
an unreleasable state?

Best,
Peter

On Thu, Aug 13, 2020 at 2:21 PM Ralf Gommers <[hidden email]> wrote:

>
> Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
>
> To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
>
>
> On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:
>>
>>
>> > I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>>
>> This is what I intended to do in the Usage Guidance [2] section. Could
>> you elaborate on what more information you'd want to see there? Or is
>> it just a matter of reorganizing the NEP a bit to try and summarize
>> such things right at the top?
>
>
> We adapted the NEP template [6] several times last year to try and improve this. And specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure hasn't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>
> If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
>
> At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
>
> I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
>
>
>>
>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>>
>> TBH, I don't really know how to solve that point, so if you have any
>> specific suggestions, that's certainly welcome. I understand the
>> frustration for a reader trying to understand all the details, with
>> many being only described in NEP-18 [3], but we also strive to avoid
>> rewriting things that are written elsewhere, which would also
>> overburden those who are aware of what's being discussed.
>>
>>
>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>
>
> Some variant of this proposal would be my preference.
>
> Cheers,
> Ralf
>
>>
>> [1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
>> [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
>> [3] https://numpy.org/neps/nep-0018-array-function-protocol.html
>> [4] https://numpy.org/neps/nep-0000.html#nep-workflow
>> [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
>
>
> [6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
> [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
> [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
>
>
>>
>>
>>
>> On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:
>> >
>> > I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>> >
>> > I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>> >
>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>> >
>> > Food for thought.
>> >
>> > Juan.
>> >
>> > On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>> >
>> > For what it's worth, as a potential consumer in SciPy: neither the NEP nor the PR really says anything about how the regular users of NumPy will benefit from this. If only third parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>> >
>> > Let me clarify,
>> >
>> > - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>> >
>> > - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>> >
>> > I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. No offense intended, but all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.
>> >
>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>

Re: Experimental `like=` attribute for array creation functions

ralfgommers


On Thu, Aug 13, 2020 at 2:47 PM Peter Andreas Entschev <[hidden email]> wrote:
> We adapted the NEP template [6] several times last year to try and improve this. We also specified there that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>
> If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.

I'm more than happy to edit the NEP and try to clarify all the
concerns.

Thanks Peter. Let me reiterate, you did a lot of things right, have been happy to adapt when given feedback, and your willingness to go back and fix things up now is much appreciated (and I'm happy to help). No criticism of your work or attitude intended, on the contrary.
 
However, it gets pretty difficult to do so when I as an
author don't understand where the difficulty is. Ilhan, Juan and Ralf
now pointed out things that are missing/unclear, but no comment was
made in that regard when I sent the NEP, my point being: I couldn't
fix what I didn't know was a problem to others.

Yes of course, I totally understand that.


> At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it.

Honestly, I don't really understand this. From my perspective, there
are two ways to deal with such things:

1. Templates are to be taken mainly as _guidelines_ rather than
_hardlines_, and the current text of NEP-35 definitely falls in the
first category;
2. Templates are _hardlines_ and to be guided/enforced by maintainers
at some point (maybe before merging the PR?).

If 2 is the desired case for NumPy, which sounds a lot like what is
wanted from NEP-35 and other NEPs generally, maintainers should let
the authors know as early as possible that something isn't following
the template's hardlines and it should be corrected.

Yes agreed, maintainers should do this. It was always meant as something in between, "please follow but deviate if needed". If essential elements are missing, I think that should be flagged earlier going forward.

As a concrete example: Stephan (the main author of __array_function__) was still fuzzy on the functions covered and whether it solves array coercion, in the last 24 hours*. You answered by pointing to concrete code in Dask and Xarray. That code, why it doesn't work well now but will work with like=, should be at the top of the NEP as concrete problem statement / code examples. It's quite unfortunate that no maintainer explicitly requested this many months ago.
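
To make the kind of problem statement asked for here concrete, the following is a minimal sketch, assuming a NumPy version that includes the experimental `like=` argument. `DuckArray` and `as_same_kind` are hypothetical names invented for illustration (a stand-in for a Dask or CuPy array and for a downstream helper); this is not the code from Dask or Xarray referred to above.

```python
import numpy as np


class DuckArray:
    """Hypothetical array-like, standing in for a Dask or CuPy array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.asarray is handled in this sketch; everything else is declined.
        if func is np.asarray:
            # NumPy drops `like` before calling this hook, so the remaining
            # arguments can be forwarded as-is.
            return DuckArray(np.asarray(*args, **kwargs))
        return NotImplemented


def as_same_kind(values, reference):
    """Hypothetical library helper: coerce `values` to the array type of `reference`."""
    return np.asarray(values, like=reference)


duck = DuckArray([1, 2, 3])

# Without like=, a plain list always becomes a NumPy array,
# whatever array type the caller is working with.
print(type(np.asarray([3])).__name__)                # ndarray

# With like=, the reference's __array_function__ decides the result type.
print(type(as_same_kind([3], duck)).__name__)        # DuckArray
print(type(as_same_kind([3], np.ones(2))).__name__)  # ndarray
```

The helper never imports or names the downstream array type; it only passes along an object it was given, which is the property that makes library code reusable across array implementations.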


I don't mean any of this to absolve myself of responsibility, but I
would like to express my frustration that a 10-month-old NEP is only
now getting so much pushback for being unclear, when its
implementation is nearing completion.

Totally understandable. I think part of the problem is that people only weigh in when they see concrete "this part is for you, and here's how you use it to solve problem X".

As for me personally, if I'm saying things now that I didn't manage to respond to earlier (specific to your NEP), I apologize. 10 months ago I was in the middle of an intercontinental move and a new-ish job getting busier fast. Again, apologies and no criticism of your work.
 

> I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that.

I don't quite understand this either, why would that leave master in
an unreleasable state?

That's what Sebastian proposed yesterday: let's merge right now, open issues for all the things being brought up right now, and deal with them pre-1.20-release. I'm saying I'm fine with that, but then we actually need to go back and finalize the discussions before the next release.

Cheers,
Ralf





Best,
Peter


Re: Experimental `like=` attribute for array creation functions

Peter Andreas Entschev
In reply to this post by Ilhan Polat
Ilhan,

Thanks, that does clarify things.

I think the main point -- and correct me here if I'm still wrong -- is
that we want the NEP to have some very clear example of when/why/how
to use it, preferably as early in the text as possible, maybe just
below the Abstract, in a Motivation and Scope section, as suggested by
the NEP template [6] that Ralf pointed to earlier. That is a totally
valid ask, and I'll try to address it as soon as possible (hopefully
today or tomorrow).

To the point of whether NEPs are to be read by users: I normally don't
expect users to have to read and understand NEPs other than out of
pure curiosity. If we need them to do so, then there's definitely a
big problem in the API. This may sound at odds with what I said before
about the "like=" name, but that's really the piece of the NumPy API
that I, with a somewhat reasonable understanding of arrays, don't
quite get or like. For instance, "asarray" and "like" sound like
exactly the same thing, but they're not in the NumPy context, and on
the other hand it's quite difficult to find a reasonable name to
clarify that. And once more, to be perfectly honest, I do like the
"typeof=" suggestion more than "like="; I'm just afraid it could be
mistaken for the "dtype=" keyword somehow and thus still not solve the
clarity problem. Going back to whether users read NEPs, I would really
expect the function's docstring to be clear enough to keep users away
from the keyword while still giving them an understanding of why it
exists. The current docstring is in [9]; please do comment on it if
you have ideas for making it more accessible to users.

You also mentioned you'd like the name to be as esoteric as possible.
Do you have any suggestions for an esoteric name that is hopefully
unambiguous too? Naming has definitely been very much on the table
since the NEP was written, but the consensus was more that "like=" is
similar enough, in both application and name, to "empty_like" and
derived functions, which is why we stuck with it.
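
For reference, the naming overlap under discussion can be sketched as follows, assuming a NumPy version that includes the experimental `like=` argument. The variable names are illustrative only. With a plain ndarray as the reference, `like=` has no visible effect on the result type; it only changes the result for array types that implement `__array_function__`.

```python
import numpy as np

ref = np.zeros((2, 3), dtype=np.float32)

# The *_like functions copy shape and dtype from the reference.
a = np.ones_like(ref)
print(a.shape, a.dtype)        # (2, 3) float32

# like= copies neither: shape and dtype still come from the data.
b = np.asarray([1, 2, 3], like=ref)
print(b.shape, b.dtype.kind)   # (3,) i

# dtype= remains a separate, independent keyword.
c = np.asarray([1, 2, 3], dtype=np.float32, like=ref)
print(c.shape, c.dtype)        # (3,) float32
```

This is the behaviour behind the "typeof=" suggestion: the reference contributes only its type, never its shape or dtype.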

Best,
Peter

[9] https://github.com/numpy/numpy/pull/16935/files#diff-e5969453e399f2d32519d305b2582da9R16-R22

On Thu, Aug 13, 2020 at 3:43 PM Ilhan Polat <[hidden email]> wrote:

>
> To maybe lighten up the discussion a bit and to make my outsider confusion more tangible, let me start by apologizing for diving head first without weighing the past luggage :-) I always forget how much effort goes into these things and for outsiders like me, it's a matter of dipping the finger and tasting it just before starting to complain how much salt is missing etc. What I was mentioning about NEPs wasn't only related specifically to this one by the way. It's the generic feeling that I have.
>
> First, let me explain what I mean by the distinction between NumPy users and downstream developers. This is very much related to how data-science and huge-array users are magnetizing every tool out there in the Python world, which is fine, though the majority of number-crunchers have nothing to do with any of GPU/Parallelism/ClusterUsage etc. Hence when I mention NumPy users, think of people who use NumPy in its own right, with no duck-typing and nothing related to subclassing. Just straightforward array creation and lots of ops on these arrays. For those people (I'm one of them), this option brings in a keyword that we would never use. And it gets into many major functions (linspace and others mentioned somewhere). So it has a very appealing name but has nothing to do with me in an already very crowded namespace and keyword catalogue. That's basically a UX issue to be addressed (under the assumption that users like me are the majority): either make its name as esoteric as possible so I naturally stay away from it, or hide it so I don't see it. This has absolutely nothing to do with looking down on the downstream libraries. They are flat-out amazing and the more we can support them the merrier.
>
> Using yet another metaphor, I was hoping that NumPy would have a loading dock for heavy duty deliveries for downstream projects or specialized array creations that wouldn't disturb the regular customer entrance. Because if I look at this page https://numpy.org/doc/stable/reference/routines.array-creation.html, there are a lot of functions and I think most of them are candidates to gain this keyword. I wish I could comment on a viable alternative but I really cannot understand the _array_xxxx_ discussions since they fly way over my head no matter how many times I tried. So that's why I naively mentioned the "np.astypedarray" or "np.asarray_but_not_numpy_array" or whatever. Now I see that it is even more complicated and I generated extra noise. So you can just ignore my previous suggestions. Except that I want to draw attention to the UX problem and I'd like to leave it at that.
>
> The other point is about the NEP stuff. I think I need to elaborate. If the NEPs are meant for internal NumPy discussions, then by all means, crank up the pointer*-meter to 11 and dive into it, totally fine with me. But if you also want to get feedback from outside, then probably a few lines of code examples for mere mortals would go a long way. Also it would make the discussion much more streamlined in my humble opinion. What I was trying to get at was that almost all NEPs read like a legal document that I want to agree to as soon as possible, because they often come with little or no code in them. In NEP 35 for example, there are nice code blocks in function dispatching but I guess it's not meant for me. Because it is only decorating asarray with some black magic happening there somehow (I guess). So I can't even comprehend what the proposition would mean for the regular, friendly, anti-duck users. But I am pretty sure it is about dispatching something because the word is repeated ~20 times :-) Thus the feedback would be limited. That was also what I meant there. But again I totally understand the complexity of these issues. So I'm not expecting to understand all details of NumPy machinery in a single NEP.
>
> But anyways, hope this clarifies a few things that I failed to convey in my previous mail.
> ilhan
>
>
>
> On Thu, Aug 13, 2020 at 2:23 PM Ralf Gommers <[hidden email]> wrote:
>>
>> Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
>>
>> To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
>>
>>
>> On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:
>>>
>>>
>>> > I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. No offense intended, but all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.
>>>
>>> This is what I intended to do in the Usage Guidance [2] section. Could
>>> you elaborate on what more information you'd want to see there? Or is
>>> it just a matter of reorganizing the NEP a bit to try and summarize
>>> such things right at the top?
>>
>>
>> We adapted the NEP template [6] several times last year to try and improve this. We also specified there that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>>
>> If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
>>
>> At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
>>
>> I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
>>
>>
>>>
>>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>>>
>>> TBH, I don't really know how to solve that point, so if you have any
>>> specific suggestions, that's certainly welcome. I understand the
>>> frustration for a reader trying to understand all the details, with
>>> many being only described in NEP-18 [3], but we also strive to avoid
>>> rewriting things that are written elsewhere, which would also
>>> overburden those who are aware of what's being discussed.
>>>
>>>
>>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>>
>>
>> Some variant of this proposal would be my preference.
>>
>> Cheers,
>> Ralf
>>
>>>
>>> [1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
>>> [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
>>> [3] https://numpy.org/neps/nep-0018-array-function-protocol.html
>>> [4] https://numpy.org/neps/nep-0000.html#nep-workflow
>>> [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
>>
>>
>> [6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
>> [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
>> [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
>>
>>
>>>
>>>
>>>
>>> On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:
>>> >
>>> > I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>>> >
>>> > I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>>> >
>>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>>> >
>>> > Food for thought.
>>> >
>>> > Juan.
>>> >
>>> > On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>>> >
>>> > For what it's worth, as a potential consumer in SciPy, neither the NEP nor the PR really says anything about how regular users of NumPy will benefit from this. If only third parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>>> >
>>> > Let me clarify,
>>> >
>>> > - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>>> >
>>> > - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>>> >
>>> > I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because, no offense, all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.
>>> >
>>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>>

Re: Experimental `like=` attribute for array creation functions

Peter Andreas Entschev
In reply to this post by ralfgommers
Ralf,

I know none of it is a criticism of my work or directly of anybody
else's work. I was just making a couple of general points (or
questions really):

1. What is accepted as a reasonably clear NEP? The discussion seems to
suggest that a NEP _must_ follow the template.
2. Should the NEP template be followed as a hardline? Personally, I
think that would be fine in general: diverging should be an option
only when additional information is necessary, and providing less
should not be acceptable.

And to be perfectly clear, none of what I said is a criticism of
anybody in particular. It is frustration that the process itself
seems unclear to both authors and maintainers, hence my two points
above. I apologize if any of what I said so far has been taken as a
personal criticism of someone; it was definitely not meant that way.

Finally, I like Juan's earlier suggestion of a proofreading step by
someone not involved in the discussion, although I'm not sure that's
achievable in practice. However, I think that discussion is a bit
off-topic here, so I'll try to address the unclear parts of this NEP
in a PR, and we could continue the general discussion of the NEP
process in a different thread if people wish to do so.

Best,
Peter

On Thu, Aug 13, 2020 at 4:13 PM Ralf Gommers <[hidden email]> wrote:

>
>
>
> On Thu, Aug 13, 2020 at 2:47 PM Peter Andreas Entschev <[hidden email]> wrote:
>>
>> > We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>> >
>> > If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
>>
>> I'm more than happy to edit the NEP and try to clarify all the
>> concerns.
>
>
> Thanks Peter. Let me reiterate, you did a lot of things right, have been happy to adapt when given feedback, and your willingness to go back and fix things up now is much appreciated (and I'm happy to help). No criticism of your work or attitude intended, on the contrary.
>
>>
>> However, it gets pretty difficult to do so when I as an
>> author don't understand where the difficulty is. Ilhan, Juan and Ralf
>> now pointed out things that are missing/unclear, but no comment was
>> made in that regard when I sent the NEP, my point being: I couldn't
>> fix what I didn't know was a problem to others.
>
>
> Yes of course, I totally understand that.
>
>>
>> > At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it.
>>
>> Honestly, I don't really understand this. From my perspective, there
>> are two ways to deal with such things:
>>
>> 1. Templates are to be taken mainly as _guidelines_ rather than
>> _hardlines_, and the current text of NEP-35 definitely falls in the
>> first category;
>> 2. Templates are _hardlines_ and to be guided/enforced by maintainers
>> at some point (maybe before merging the PR?).
>>
>> If 2 is the desired case for NumPy, which sounds a lot like what is
>> wanted from NEP-35 and other NEPs generally, maintainers should let
>> the authors know as early as possible that something isn't following
>> the template's hardlines and it should be corrected.
>
>
> Yes agreed, maintainers should do this. It was always meant as something in between, "please follow but deviate if needed". If essential elements are missing, I think that should be flagged earlier going forward.
>
> As a concrete example: Stephan (the main author of __array_function__) was still fuzzy on the functions covered and whether it solves array coercion, in the last 24 hours*. You answered by pointing to concrete code in Dask and Xarray. That code, why it doesn't work well now but will work with like=, should be at the top of the NEP as concrete problem statement / code examples. It's quite unfortunate that no maintainer explicitly requested this many months ago.
>
> * https://github.com/numpy/numpy/pull/16935#issuecomment-673379038
>
>> I don't mean any of this to absolve myself of any responsibility, but I would like to
>> express my frustration that a 10-month-old NEP is only now getting so
>> much pushback for being unclear, when its implementation is nearing
>> completion.
>
>
> Totally understandable. I think part of the problem is that people only weigh in when they see concrete "this part is for you, and here's how you use it to solve problem X".
>
> As for me personally, if I'm saying things now that I didn't manage to respond to earlier (specific to your NEP), I apologize. 10 months ago I was in the middle of an intercontinental move and a new-ish job getting busier fast. Again, apologies and no criticism of your work.
>
>>
>>
>> > I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that.
>>
>> I don't quite understand this either, why would that leave master in
>> an unreleasable state?
>
>
> That's what Sebastian proposed yesterday: let's merge right now, open issues for all the things being brought up right now, and deal with them pre-1.20-release. I'm saying I'm fine with that, but then we actually need to go back and finalize the discussions before the next release.
>
> Cheers,
> Ralf
>
>
>
>
>>
>> Best,
>> Peter
>>
>> On Thu, Aug 13, 2020 at 2:21 PM Ralf Gommers <[hidden email]> wrote:
>> >
>> > Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
>> >
>> > To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
>> >
>> >
>> > On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:
>> >>
>> >>
>> >> > I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because, no offense, all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.
>> >>
>> >> This is what I intended to do in the Usage Guidance [2] section. Could
>> >> you elaborate on what more information you'd want to see there? Or is
>> >> it just a matter of reorganizing the NEP a bit to try and summarize
>> >> such things right at the top?
>> >
>> >
>> > We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>> >
>> > If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
>> >
>> > At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
>> >
>> > I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
>> >
>> >
>> >>
>> >> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>> >>
>> >> TBH, I don't really know how to solve that point, so if you have any
>> >> specific suggestions, that's certainly welcome. I understand the
>> >> frustration for a reader trying to understand all the details, with
>> >> many being only described in NEP-18 [3], but we also strive to avoid
>> >> rewriting things that are written elsewhere, which would also
>> >> overburden those who are aware of what's being discussed.
>> >>
>> >>
>> >> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>> >
>> >
>> > Some variant of this proposal would be my preference.
>> >
>> > Cheers,
>> > Ralf
>> >
>> >>
>> >> [1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
>> >> [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
>> >> [3] https://numpy.org/neps/nep-0018-array-function-protocol.html
>> >> [4] https://numpy.org/neps/nep-0000.html#nep-workflow
>> >> [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
>> >
>> >
>> > [6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
>> > [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
>> > [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
>> >
>> >
>> >>
>> >>
>> >>
>> >> On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:
>> >> >
>> >> > I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>> >> >
>> >> > I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>> >> >
>> >> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>> >> >
>> >> > Food for thought.
>> >> >
>> >> > Juan.
>> >> >
>> >> > On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>> >> >
>> >> > For what it's worth, as a potential consumer in SciPy, neither the NEP nor the PR really says anything about how regular users of NumPy will benefit from this. If only third parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>> >> >
>> >> > Let me clarify,
>> >> >
>> >> > - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>> >> >
>> >> > - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>> >> >
>> >> > I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because, no offense, all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.
>> >> >
>> >> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>> >

Re: Experimental `like=` attribute for array creation functions

Sebastian Berg
In reply to this post by Peter Andreas Entschev
On Thu, 2020-08-13 at 15:47 +0200, Peter Andreas Entschev wrote:

> > We adapted the NEP template [6] several times last year to try and
> > improve this, and specified in there as well that NEP content sent
> > to the mailing list should only contain the sections: Abstract,
> > Motivation and Scope, Usage and Impact, and Backwards
> > compatibility. This is to ensure we fully understand the "why" and
> > "what" before the "how". Unfortunately that template and procedure
> > haven't been exercised much yet, only in NEP 38 [7] and partially
> > in NEP 41 [8].
> >
> > If we have long-time maintainers of SciPy (Ilhan and myself),
> > scikit-image (Juan) and CuPy (Leo, on the PR review) all saying
> > they don't understand the goals, relevance, target audience, or how
> > they're supposed to use a new feature, that indicates that the
> > people doing the writing and having the discussion are doing
> > something wrong at a very fundamental level.
>
> I'm more than happy to edit the NEP and try to clarify all the
> concerns. However, it gets pretty difficult to do so when I as an
> author don't understand where the difficulty is. Ilhan, Juan and Ralf
> now pointed out things that are missing/unclear, but no comment was
> made in that regard when I sent the NEP, my point being: I couldn't
> fix what I didn't know was a problem to others.
>
> > At this point I'm pretty disappointed in and tired of how we write
> > and discuss NEPs on technical topics like dispatching, dtypes and
> > the like. People literally refuse to write down concrete
> > motivations, goals and non-goals, code that's problematic now and
> > will be better/working post-NEP and usage examples before launching
> > into extensive discussion of the gory details of the internals. I'm
> > not sure what to do about it.
>
> Honestly, I don't really understand this. From my perspective, there
> are two ways to deal with such things:
>
> 1. Templates are to be taken mainly as _guidelines_ rather than
> _hardlines_, and the current text of NEP-35 definitely falls in the
> first category;
> 2. Templates are _hardlines_ and to be guided/enforced by maintainers
> at some point (maybe before merging the PR?).
>
> If 2 is the desired case for NumPy, which sounds a lot like what is
> wanted from NEP-35 and other NEPs generally, maintainers should let
> the authors know as early as possible that something isn't following
> the template's hardlines and it should be corrected. I don't mean any
> of this to absolve myself of any responsibility, but I would like to
> express my frustration that a 10-month-old NEP is only now getting so
> much pushback for being unclear, when its implementation is nearing
> completion.
>
> > I want to make an exception for merging the current NEP, for which
> > the plan is to merge it as experimental to try in downstream PRs
> > and get more experience. That does mean that master will be in an
> > unreleasable state by the way, which is unusual and it'd be nice to
> > get Chuck's explicit OK for that.
>
> I don't quite understand this either, why would that leave master in
> an unreleasable state?
>
Well, a few points have not been fully discussed yet. The name is one
that has not received much attention so far, maybe because nobody had
many concerns about it, or maybe because it was just lower on the
priority list.

To be clear: I am fully prepared to pull this out of master before
release or, more likely, to disable it in release versions. An
alternative could be an environment variable (an env variable will not
stop actual adoption, but we may be fine with that).
Unless NEP 35 is accepted, that probably has to be the default.
Fortunately, there is still some time until the next release.

- Sebastian


> Best,
> Peter
>
> On Thu, Aug 13, 2020 at 2:21 PM Ralf Gommers <[hidden email]>
> wrote:
> > Thanks for raising these concerns Ilhan and Juan, and for answering
> > Peter. Let me give my perspective as well.
> >
> > To start with, this is not specifically about Peter's NEP and PR.
> > NEP 35 simply follows the pattern set by previous PRs, and given
> > its tight scope is less difficult to understand than other NEPs on
> > such technical topics. Peter has done a lot of things right, and is
> > close to the finish line.
> >
> >
> > On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <
> > [hidden email]> wrote:
> > >
> > > > I think arriving at an agreement would be much faster if there
> > > > were an executive summary of who this is intended for and what
> > > > the regular usage is. Because, no offense, all I see is
> > > > "dispatch", "__array_function__" and a lot of technical details
> > > > of which I am absolutely ignorant.
> > >
> > > This is what I intended to do in the Usage Guidance [2] section.
> > > Could
> > > you elaborate on what more information you'd want to see there?
> > > Or is
> > > it just a matter of reorganizing the NEP a bit to try and
> > > summarize
> > > such things right at the top?
> >
> > We adapted the NEP template [6] several times last year to try and
> > improve this, and specified in there as well that NEP content sent
> > to the mailing list should only contain the sections: Abstract,
> > Motivation and Scope, Usage and Impact, and Backwards
> > compatibility. This is to ensure we fully understand the "why" and
> > "what" before the "how". Unfortunately that template and procedure
> > haven't been exercised much yet, only in NEP 38 [7] and partially
> > in NEP 41 [8].
> >
> > If we have long-time maintainers of SciPy (Ilhan and myself),
> > scikit-image (Juan) and CuPy (Leo, on the PR review) all saying
> > they don't understand the goals, relevance, target audience, or how
> > they're supposed to use a new feature, that indicates that the
> > people doing the writing and having the discussion are doing
> > something wrong at a very fundamental level.
> >
> > At this point I'm pretty disappointed in and tired of how we write
> > and discuss NEPs on technical topics like dispatching, dtypes and
> > the like. People literally refuse to write down concrete
> > motivations, goals and non-goals, code that's problematic now and
> > will be better/working post-NEP and usage examples before launching
> > into extensive discussion of the gory details of the internals. I'm
> > not sure what to do about it. Completely separate API and behavior
> > proposals from implementation proposals? Make separate "API" and
> > "internals" teams with the likes of Juan, Ilhan and Leo on the API
> > team which then needs to approve every API change in new NEPs?
> > Offer to co-write NEPs if someone is willing but doesn't understand
> > how to go about it? Keep the current structure/process but veto
> > further approvals until NEP authors get it right?
> >
> > I want to make an exception for merging the current NEP, for which
> > the plan is to merge it as experimental to try in downstream PRs
> > and get more experience. That does mean that master will be in an
> > unreleasable state by the way, which is unusual and it'd be nice to
> > get Chuck's explicit OK for that. But after that, I think we need a
> > change here. I would like to hear what everyone thinks is the shape
> > that change should take - any of my above suggestions, or something
> > else?
> >
> >
> > > > Finally as a minor point, I know we are mostly (ex-)academics
> > > > but this necessity of formal language on NEPs is self-imposed
> > > > (probably PEPs are to blame) and not quite helping. It can be a
> > > > bit more descriptive in my external opinion.
> > >
> > > TBH, I don't really know how to solve that point, so if you have
> > > any
> > > specific suggestions, that's certainly welcome. I understand the
> > > frustration for a reader trying to understand all the details,
> > > with
> > > many being only described in NEP-18 [3], but we also strive to
> > > avoid
> > > rewriting things that are written elsewhere, which would also
> > > overburden those who are aware of what's being discussed.
> > >
> > >
> > > > I also share Ilhan’s concern (and I mentioned this in a
> > > > previous NEP discussion) that NEPs are getting pretty
> > > > inaccessible. In a sense these are difficult topics and readers
> > > > should be expected to have *some* familiarity with the topics
> > > > being discussed, but perhaps more effort should be put into the
> > > > context/motivation/background of a NEP before accepting it. One
> > > > way to ensure this might be to require a final proofreading
> > > > step by someone who has not been involved at all in the
> > > > discussions, like peer review does for papers.
> >
> > Some variant of this proposal would be my preference.
> >
> > Cheers,
> > Ralf
> >
> > > [1]
> > > https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
> > > [2]
> > > https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
> > > [3] https://numpy.org/neps/nep-0018-array-function-protocol.html
> > > [4] https://numpy.org/neps/nep-0000.html#nep-workflow
> > > [5]
> > > https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
> >
> > [6]
> > https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
> > [7]
> > https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
> > [8]
> > https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
> >
> >
> > >
> > >
> > > On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <
> > > [hidden email]> wrote:
> > > > I’ve generally been on the “let the NumPy devs worry about it”
> > > > side of things, but I do agree with Ilhan that `like=` is
> > > > confusing and `typeof=` would be a much more appropriate name
> > > > for that parameter.
> > > >
> > > > I do think library writers are NumPy users and so I wouldn’t
> > > > really make that distinction, though. Users writing their own
> > > > analysis code could very well be interested in writing code
> > > > using numpy functions that will transparently work when the
> > > > input is a CuPy array or whatever.
> > > >
> > > > I also share Ilhan’s concern (and I mentioned this in a
> > > > previous NEP discussion) that NEPs are getting pretty
> > > > inaccessible. In a sense these are difficult topics and readers
> > > > should be expected to have *some* familiarity with the topics
> > > > being discussed, but perhaps more effort should be put into the
> > > > context/motivation/background of a NEP before accepting it. One
> > > > way to ensure this might be to require a final proofreading
> > > > step by someone who has not been involved at all in the
> > > > discussions, like peer review does for papers.
> > > >
> > > > Food for thought.
> > > >
> > > > Juan.
> > > >
> > > > On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]>
> > > > wrote:
> > > >
> > > > For what it's worth, as a potential consumer in SciPy, neither
> > > > the NEP nor the PR really says anything about how regular
> > > > users of NumPy will benefit from this. If only third parties
> > > > are going to benefit from it, I am not sure adding a new
> > > > keyword to an already confusing function is the right thing
> > > > to do.
> > > >
> > > > Let me clarify,
> > > >
> > > > - This is already a very (I mean extremely very) easy keyword
> > > > name to confuse with ones_like, zeros_like and by its nature
> > > > any other interpretation. It is not signalling anything about
> > > > the functionality that is being discussed. I would seriously
> > > > consider reserving such obvious names for really obvious tasks.
> > > > Because you would also expect the shape and ndim would be
> > > > mimicked by the "like"d argument but it turns out it is acting
> > > > more like "typeof=" and not "like=" at all. Because if we
> > > > follow the semantics it reads as "make your argument asarray
> > > > like the other thing" but it is actually doing, "make your
> > > > argument an array with the other thing's type" which might not
> > > > be an array after all.
> > > >
> > > > - Again, if this is meant for downstream libraries (because
> > > > that's what I got out of the PR discussion, cupy, dask, and JAX
> > > > were the only examples I could read) then hiding it in another
> > > > function and writing with capital letters "this is not meant
> > > > for numpy users" would be a much more convenient way to
> > > > separate the target audience and regular users.
> > > > numpy.astypedarray([[some data], [...]], type_of=x) or whatever
> > > > else it may be would be quite clean and to the point with no
> > > > ambiguous keywords.
> > > >
> > > > I think arriving at an agreement would be much faster if there
> > > > were an executive summary of who this is intended for and what
> > > > the regular usage is. Because, no offense, all I see is
> > > > "dispatch", "__array_function__" and a lot of technical details
> > > > of which I am absolutely ignorant.
> > > >
> > > > Finally as a minor point, I know we are mostly (ex-)academics
> > > > but this necessity of formal language on NEPs is self-imposed
> > > > (probably PEPs are to blame) and not quite helping. It can be a
> > > > bit more descriptive in my external opinion.
> >

Re: Experimental `like=` attribute for array creation functions

Ilhan Polat
In reply to this post by Peter Andreas Entschev
Yes, the underlying gory details should be spelled out of course but if it is also modifying/adding to API then it is best to sound the horn and invite zombies to take a stab at it. Often people arrive with interesting use-cases that you wouldn't have thought about.

And I am very familiar with the pushback feeling you are having right now, probably internally shouting "where have you been all this time, you slackers?". As you might have seen from my questions here and on the Cython list, when I am done with some new feature for SciPy, it is also going to be a very, very long and tiring process. I am really not looking forward to it :-)  but I guess it is part of the deal. Maybe it is some comfort that when more people start to flock over, it means the work has morphed into a finished product that people can shoot at. But I honestly thought this was a new NEP; that's a mistake on my part.

For the like, typeof and other candidates, by esoteric I mean foreign enough to most users. We already have a nice candidate I think; ehm... "dispatch" or "dispatch_like" or something like that, nobody sober enough would confuse this with any other. And since this won't be typed in daily usage, or so I understood, I guess it is ok to make it verbose. But still take it as an initial guess and feel free to dismiss.

I would still be in platonic love with a "numpy.DIY" or "numpy.hermes" namespace with a nice "bring your own __array_function__" service.
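
As a rough illustration of what "bringing your own `__array_function__`" means for a creation function, here is a minimal sketch. It assumes NumPy 1.20 or later, where the `like=` keyword is available; `TaggedArray` is a hypothetical toy class invented for this example and is not part of NumPy, Dask, or CuPy. The function is matched by name rather than by identity because the object NumPy passes as `func` may differ between versions.

```python
import numpy as np


class TaggedArray:
    """Hypothetical minimal array-like, used only for illustration."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy strips `like=` before calling this hook, so the remaining
        # arguments can be forwarded to the plain NumPy function.
        if func.__name__ == "asarray":
            return TaggedArray(np.asarray(*args, **kwargs))
        # Anything else is unsupported by this toy class.
        return NotImplemented


ref = TaggedArray([0.0])

# The reference object decides which type creates the result.
out = np.asarray([1, 2, 3], like=ref)
print(type(out).__name__)
print(out.data)
```

Code that never passes `like=` is unaffected by any of this, which seems to be the sense in which the keyword is aimed at library authors rather than everyday users.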








On Thu, Aug 13, 2020 at 4:16 PM Peter Andreas Entschev <[hidden email]> wrote:
Ilhan,

Thanks, that does clarify things.

I think the main point -- and correct me here if I'm still wrong -- is
that we want the NEP to have some very clear example of when/why/how
to use it, preferably as early in the text as possible, maybe just
below the Abstract, in a Motivation and Scope section, as the NEP
Template [6] pointed out by Ralf earlier suggests. That is a
totally valid ask, and I'll try to address it as soon as possible
(hopefully today or tomorrow).

To the point of whether NEPs are to be read by users, I normally don't
expect users to be required to read and understand those NEPs other
than by pure curiosity. If we need them to do so, then there's
definitely a big problem in the API. This may sound counterintuitive
with what I said before about the "like=" name, but that's really the
piece of the NumPy API that I, with a somewhat reasonable understanding
of arrays, don't quite get or like; for instance "asarray" and "like"
sound exactly the same thing, but they're not in the NumPy context,
and on the other hand it's quite difficult to find a reasonable name
to clarify that. And once more, I do like the "typeof=" suggestion
more than "like=" to be perfectly honest, I'm just afraid it could be
mistaken for the "dtype=" keyword somehow and thus still not solve the
clarity problem. Going back to users reading NEPs or not, I would
really expect that the docstring from the function is sufficiently
clear to keep users off of it, but still give them an understanding of
why that exists, the current docstring is in [9], please do comment on
it if you have ideas of how to make it more accessible to users.

You also mentioned you'd like that the name is as esoteric as
possible, do you have any suggestions for an esoteric name that is
hopefully unambiguous too? Naming has definitely been very much on the
table since the NEP was written, but the consensus was more that
"like=" is reasonably similar enough in both application and the name
itself to "empty_like" and derived functions, that's why we just stuck
to it.
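
The naming concern can be made concrete with a small sketch, assuming NumPy 1.20 or later (the `like=` keyword is not available in earlier releases). It contrasts `zeros_like`, which mimics the shape of its reference, with `like=`, which only uses the reference to choose the array type. The variable names are illustrative only.

```python
import numpy as np

ref = np.zeros((2, 2))

# zeros_like copies the shape (and dtype) of the reference array.
a = np.zeros_like(ref)

# like= only selects which array type creates the result;
# the shape comes from the data, not from the reference.
b = np.asarray([1, 2, 3], like=ref)

print(a.shape)  # (2, 2)
print(b.shape)  # (3,)
print(type(b).__name__)
```

With a plain NumPy array as the reference, the result is an ordinary `ndarray`, so the difference only becomes visible in the type when the reference comes from another library.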

Best,
Peter

[9] https://github.com/numpy/numpy/pull/16935/files#diff-e5969453e399f2d32519d305b2582da9R16-R22

On Thu, Aug 13, 2020 at 3:43 PM Ilhan Polat <[hidden email]> wrote:
>
> To maybe lighten up the discussion a bit and to make my outsider confusion more tangible, let me start by apologizing for diving head first without weighing the past luggage :-) I always forget how much effort goes into these things and for outsiders like me, it's a matter of dipping the finger and tasting it just before starting to complain how much salt is missing etc. What I was mentioning about NEPs wasn't only related specifically to this one by the way. It's the generic feeling that I have.
>
> First let me start with what I mean by the distinction between NumPy users and downstreamers. This is very much related to how data-science and huge-array users are magnetizing every tool out there in the Python world, which is fine, though the majority of number-crunchers have nothing to do with any of GPU/Parallelism/ClusterUsage etc. Hence when I mention NumPy users, think of people who use NumPy in its own right with no duck-typing and nothing related to subclassing. Just straightforward array creation and lots of ops on these arrays. For those people (I'm one of them), this option brings in a keyword that we would never use. And it gets into many major functions (linspace and others mentioned somewhere). So it has a very appealing name but has nothing to do with me in an already very crowded namespace and keyword catalogue. That's basically a UX issue to be addressed (under the assumption that users like me are the majority). Either make its name as esoteric as possible so I naturally stay away from it, or put it where I don't see it. This has absolutely nothing to do with looking down on the downstream libraries. They are flat-out amazing and the more we can support them the merrier.
>
> Using yet another metaphor, I was hoping that NumPy would have a loading dock for heavy duty deliveries for downstream projects or specialized array creations and wouldn't disturb the regular customer entrance. Because if I look at this page https://numpy.org/doc/stable/reference/routines.array-creation.html, there are a lot of functions and I think most of them are candidates to gain this keyword.  I wish I could comment on a viable alternative but I really cannot understand the _array_xxxx_ discussions since they fly way over my head no matter how many times I tried. So that's why I naively mentioned the "np.astypedarray" or "np.asarray_but_not_numpy_array" or whatever. Now I see that it is even more complicated and I generated extra noise. So you can just ignore my previous suggestions. Except that I want to draw attention to the UX problem and I'd like to leave it at that.
>
> The other point is about the NEP stuff. I think I need to elaborate. If the NEPs are meant for internal NumPy discussions, then by all means, crank up the pointer*-meter to 11 and dive into it, totally fine with me. But if you also want to get feedback from outside, then probably a few lines of code examples for mere mortals would go a long way. Also it would make the discussion much more streamlined in my humble opinion. What I was trying to get at was that almost all NEPs read like a legal document that I want to agree to as soon as possible. Because they often come without any or minimal amount of code in it. In NEP 35 for example, there are nice code blocks in function dispatching but I guess it's not meant for me. Because it is only decorating asarray with some black magic happening there somehow (I guess). So I can't even comprehend what the proposition would mean for the regular, friendly, anti-duck users. But I am pretty sure it is about dispatching something because the word is repeated ~20 times :-)  Thus the feedback would be limited. That was also what I meant there. But again I totally understand the complexity of these issues. So I'm not expecting to understand all details of NumPy machinery in a single NEP.
>
> But anyways, hope this clarifies a few things that I failed to convey in my previous mail.
> ilhan
>
>
>
> On Thu, Aug 13, 2020 at 2:23 PM Ralf Gommers <[hidden email]> wrote:
>>
>> Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
>>
>> To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
>>
>>
>> On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:
>>>
>>>
>>> > I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>>>
>>> This is what I intended to do in the Usage Guidance [2] section. Could
>>> you elaborate on what more information you'd want to see there? Or is
>>> it just a matter of reorganizing the NEP a bit to try and summarize
>>> such things right at the top?
>>
>>
>> We adapted the NEP template [6] several times last year to try and improve this. And specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure hasn't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>>
>> If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
>>
>> At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
>>
>> I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
>>
>>
>>>
>>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>>>
>>> TBH, I don't really know how to solve that point, so if you have any
>>> specific suggestions, that's certainly welcome. I understand the
>>> frustration for a reader trying to understand all the details, with
>>> many being only described in NEP-18 [3], but we also strive to avoid
>>> rewriting things that are written elsewhere, which would also
>>> overburden those who are aware of what's being discussed.
>>>
>>>
>>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>>
>>
>> Some variant of this proposal would be my preference.
>>
>> Cheers,
>> Ralf
>>
>>>
>>> [1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
>>> [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
>>> [3] https://numpy.org/neps/nep-0018-array-function-protocol.html
>>> [4] https://numpy.org/neps/nep-0000.html#nep-workflow
>>> [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
>>
>>
>> [6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
>> [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
>> [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
>>
>>
>>>
>>>
>>>
>>> On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:
>>> >
>>> > I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>>> >
>>> > I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>>> >
>>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>>> >
>>> > Food for thought.
>>> >
>>> > Juan.
>>> >
>>> > On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>>> >
>>> > For what is worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only and only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>>> >
>>> > Let me clarify,
>>> >
>>> > - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>>> >
>>> > - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>>> >
>>> > I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>>> >
>>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>>

Re: Experimental `like=` attribute for array creation functions

Stephan Hoyer-2
In reply to this post by ralfgommers
On Thu, Aug 13, 2020 at 5:22 AM Ralf Gommers <[hidden email]> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.

To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.


On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:

> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.

This is what I intended to do in the Usage Guidance [2] section. Could
you elaborate on what more information you'd want to see there? Or is
it just a matter of reorganizing the NEP a bit to try and summarize
such things right at the top?

We adapted the NEP template [6] several times last year to try and improve this. And specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure hasn't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].

If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.

At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?

I think the NEP template is great, and we should try to be more diligent about following it!

My own NEP 37 (__array_module__) is probably a good example of poor presentation due to not following the template structure. It goes pretty deep into low-level motivation and some implementation details before usage examples.

Speaking just for myself, I would have appreciated a friendly nudge to use the template. Certainly I think it would be fine to require using the template for newly submitted NEPs. I did not remember about it when I started drafting NEP 37, and it definitely would have helped. I may still try to do a revision at some point to use the template structure.
 
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?

 
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.

TBH, I don't really know how to solve that point, so if you have any
specific suggestions, that's certainly welcome. I understand the
frustration for a reader trying to understand all the details, with
many being only described in NEP-18 [3], but we also strive to avoid
rewriting things that are written elsewhere, which would also
overburden those who are aware of what's being discussed.


> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.

Some variant of this proposal would be my preference.

Cheers,
Ralf


[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html






On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:
>
> I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>
> I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>
> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>
> Food for thought.
>
> Juan.
>
> On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>
> For what is worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only and only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>
> Let me clarify,
>
> - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>
> - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>
> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.

Re: Experimental `like=` attribute for array creation functions

Juan Nunez-Iglesias-2
Hello everyone again!

A few clarifications about my proposal of external peer review:

- Yes, all this work is public and announced on the mailing list. However, I don’t think there’s a single person in this discussion or even this whole ecosystem that does not have a more immediately-pressing and also virtually infinite to-do list, so it’s unreasonable to expect that generally they would do more than glance at the stuff in the mailing list. In the peer review analogy, the mailing list is like the arXiv or bioRxiv stream: yep, anyone can see the stuff on there and comment, but most people just don’t have the time or attention to grab onto that. The only reason I stopped to comment here is Sebastian’s “Imma merge, YOLO!”, which had me raising my eyebrows real high. 😂 Especially for something that would expand the NumPy API!

- So, my proposal is that there needs to be an *editor* of NEPs who takes responsibility, once they are themselves satisfied with the NEP, for seeking out external reviewers and pinging them individually and asking them if they would be ok to review.

- A good friend who does screenwriting once told me, “don’t use all your proofreaders at once”. You want to get feedback, improve things, then feedback from a *totally independent* new person who can see the document with fresh eyes.

Obviously, all of the above slows things down. But “alone we go fast, together we go far”. The point of a NEP is to document critical decisions for the long term health of the project. If the documentation is insufficient, it defeats the whole purpose. Might as well just implement stuff and skip the whole NEP process. (Side note: Stephan, I for one would definitely appreciate an update to existing NEPs if there’s obvious ways they can be improved!)

I do think that NEP templates should be strict, and I don’t think that is incompatible with plain, jargon-free text. The NEP template and guidelines should specify that, and that the motivation should be understandable by a casual NumPy user — the kind described by Ilhan, for whom bare NumPy actually meets all their needs. Maybe they’ve also used PyTorch but they’ve never really had cause to mix them or write a program that worked with both kinds of arrays.

Ditto for backwards compatibility — everyone should be clear when their existing code is going to be broken. Actually NEP18 broke so much of my code, but its Backward compatibility section basically says all good! https://numpy.org/neps/nep-0018-array-function-protocol.html#backward-compatibility 

Anywho, as always, none of this is criticism to work done — I thank you all, and am eternally grateful for all the hard work everyone is doing to keep the ecosystem from fragmenting. I’m just hoping that this discussion can improve the process going forward!

And, yes, apologies to Peter, I know from repeated personal experience how frustrating it can be to have last-minute drive-by objections after months of consensus building! But I think in the end every time that happened the end result was better — I hope the same is true here! And yes, I’ll reiterate Ralf’s point: my concerns are about the NEP process itself rather than this one. I’ll summarise my proposal:

- strict NEP template. NEPs with missing sections will not be accepted.
- sections Abstract, Motivation, and Backwards Compatibility should be understandable at a high level by casual users with ~zero background on the topic
- enforce the above with at least two independent rounds of coordinated peer review.

Thank you,

Juan.

On 14 Aug 2020, at 5:29 am, Stephan Hoyer <[hidden email]> wrote:

On Thu, Aug 13, 2020 at 5:22 AM Ralf Gommers <[hidden email]> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.

To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.


On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:

> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.

This is what I intended to do in the Usage Guidance [2] section. Could
you elaborate on what more information you'd want to see there? Or is
it just a matter of reorganizing the NEP a bit to try and summarize
such things right at the top?

We adapted the NEP template [6] several times last year to try and improve this. We also specified in there that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].

If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level. 

At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?

I think the NEP template is great, and we should try to be more diligent about following it!

My own NEP 37 (__array_module__) is probably a good example of poor presentation due to not following the template structure. It goes pretty deep into low-level motivation and some implementation details before usage examples.

Speaking just for myself, I would have appreciated a friendly nudge to use the template. Certainly I think it would be fine to require using the template for newly submitted NEPs. I had forgotten about it when I started drafting NEP 37, and it definitely would have helped. I may still try to do a revision at some point to use the template structure.
 
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?

 
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.

TBH, I don't really know how to solve that point, so if you have any
specific suggestions, that's certainly welcome. I understand the
frustration for a reader trying to understand all the details, with
many being only described in NEP-18 [3], but we also strive to avoid
rewriting things that are written elsewhere, which would also
overburden those who are aware of what's being discussed.


> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.

Some variant of this proposal would be my preference.

Cheers,
Ralf


[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html






On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:

>
> I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>
> I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>
> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>
> Food for thought.
>
> Juan.
>
> On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>
> For what it's worth, as a potential consumer in SciPy, it really doesn't say anything (both in the NEP and the PR) about how the regular users of NumPy will benefit from this. If only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>
> Let me clarify,
>
> - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and, by its nature, any other interpretation. It does not signal anything about the functionality being discussed. I would seriously consider reserving such obvious names for really obvious tasks. You would also expect the shape and ndim to be mimicked by the "like"d argument, but it turns out it acts more like "typeof=" and not "like=" at all. If we follow the semantics, it reads as "make your argument asarray like the other thing", but what it actually does is "make your argument an array with the other thing's type", which might not be an array after all.
>
> - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>
> I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. With no offense intended, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
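
For readers following the naming objection quoted above, here is a minimal sketch of how `like=` differs from the `*_like` functions. It assumes NumPy 1.20 or later (where the `like=` keyword is available) and uses a hypothetical `Tagged` class as a stand-in for a Dask or CuPy array. `Tagged`, and the fact that it handles only two functions, are assumptions made for illustration and are not part of NumPy or NEP 35.

```python
import numpy as np


class Tagged:
    """Hypothetical minimal duck array (illustration only)."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.shape = self.data.shape

    def __array_function__(self, func, types, args, kwargs):
        # Defensive: NumPy is expected to strip `like` before dispatching,
        # so this filter should normally be a no-op.
        kwargs = {k: v for k, v in kwargs.items() if k != "like"}
        if func is np.asarray:
            # Build the data with plain NumPy, then wrap it in our type.
            return Tagged(np.asarray(*args, **kwargs))
        if func is np.zeros_like:
            ref = args[0]
            return Tagged(np.zeros_like(ref.data, *args[1:], **kwargs))
        return NotImplemented


ref = Tagged(np.ones((2, 3)))

# `like=` only selects which library creates the array; the shape comes
# from the input data ([3]), not from `ref`.
a = np.asarray([3], like=ref)

# `zeros_like` mimics the reference's shape as well as its type.
b = np.zeros_like(ref)

# With a plain ndarray as reference, the result is a plain ndarray.
c = np.asarray([3], like=np.empty(0))

print(type(a).__name__, a.shape)
print(type(b).__name__, b.shape)
print(type(c).__name__, c.shape)
```

In this sketch `a` takes only its type from `ref`, while `b` takes both type and shape, which is the distinction behind the `typeof=` suggestion.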

Re: Experimental `like=` attribute for array creation functions

Peter Andreas Entschev
Hi all,

This thread has IMO drifted very far from its original purpose, so I decided to start a new thread specifically for the general NEP procedure discussion. Please check your mail for the subject "NEP Procedure Discussion".

On the topic of this thread, I'll try to rewrite NEP-35 to make it more accessible and ping back here once I have a PR for that. Is there anything else that's pressing here? If there is and I missed/forgot about it, please let me know.

Best,
Peter


Re: Experimental `like=` attribute for array creation functions

ralfgommers


On Fri, Aug 14, 2020 at 12:23 PM Peter Andreas Entschev <[hidden email]> wrote:
> Hi all,
>
> This thread has IMO drifted very far from its original purpose, due to that I decided to start a new thread specifically for the general NEP procedure discussed, please check your mail for "NEP Procedure Discussion" subject.

Thanks Peter. For future reference: it's better to just edit the thread subject rather than start over completely, since people want to reply to previous content. I will now copy the comments I'd like to reply to over to the other thread by hand.

> On the topic of this thread, I'll try to rewrite NEP-35 to make it more accessible and ping back here once I have a PR for that.

Thanks!

Cheers,
Ralf

>
> - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>
> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Experimental `like=` attribute for array creation functions

Peter Andreas Entschev
As discussed, I've opened a PR
https://github.com/numpy/numpy/pull/17093 that attempts to clarify some
of the writing and to follow the NEP Template. As suggested in the
template, please find below the top part of NEP-35 (up to and
including the Backward Compatibility section). Please feel free to
comment, suggest improvements, or point out what may still be
unclear. Personally, I would prefer comments directly on the PR if
possible.

===========================================================
NEP 35 — Array Creation Dispatching With __array_function__
===========================================================

:Author: Peter Andreas Entschev <[hidden email]>
:Status: Draft
:Type: Standards Track
:Created: 2019-10-15
:Updated: 2020-08-17
:Resolution:

Abstract
--------

We propose the introduction of a new keyword argument ``like=`` to all array
creation functions. This argument permits the creation of an array based on
a non-NumPy reference array passed via that argument. The result is an array
defined by the downstream library implementing that type, which also implements
the ``__array_function__`` protocol. With this we address one of that
protocol's shortcomings, as described by NEP 18 [1]_.

Motivation and Scope
--------------------

Many libraries implement the NumPy API, such as Dask for graph
computing, CuPy for GPGPU computing, xarray for N-D labeled arrays, etc. All
the libraries mentioned have another thing in common: they have also adopted
the ``__array_function__`` protocol. The protocol defines a mechanism allowing a
user to directly use the NumPy API as a dispatcher based on the input array
type. In essence, dispatching means users are able to pass a downstream array,
such as a Dask array, directly to one of NumPy's compute functions, and NumPy
will be able to automatically recognize that and send the work back to Dask's
implementation of that function, which will define the return value. For
example:

.. code:: python

    import dask.array
    import numpy as np
    x = dask.array.arange(5)    # Creates dask.array
    np.sum(x)                   # Returns dask.array

Note above how we called Dask's implementation of ``sum`` via the NumPy
namespace by calling ``np.sum``, and the same would apply if we had a CuPy
array or any other array from a library that adopts ``__array_function__``.
This allows writing code that is agnostic to the implementation library, thus
users can write their code once and still be able to use different array
implementations according to their needs.
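
To make the dispatch mechanism more concrete, here is a minimal sketch that
uses a hypothetical ``ToyArray`` class (not part of NumPy, Dask or CuPy). It
implements ``__array_function__`` by unwrapping itself, calling NumPy's own
function and re-wrapping the result. It only handles top-level positional
arguments and assumes NumPy 1.17 or later, where the protocol is enabled by
default.

```python
import numpy as np


class ToyArray:
    """Hypothetical container used only to illustrate dispatching."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy hands us the public function that was called (e.g. np.sum).
        # Unwrap ToyArray arguments, run NumPy's implementation on the
        # underlying data, and wrap the result again.
        unwrapped = [a.data if isinstance(a, ToyArray) else a for a in args]
        return ToyArray(func(*unwrapped, **kwargs))


x = ToyArray([0, 1, 2, 3, 4])
total = np.sum(x)  # called through the NumPy namespace, handled by ToyArray
```

The call goes through ``np.sum``, but the return type is decided by
``ToyArray``, which is the behavior the Dask example above relies on.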

Unfortunately, ``__array_function__`` has limitations, one of them being array
creation functions. In the example above, NumPy was able to call Dask's
implementation because the input array was a Dask array. The same is not true
for array creation functions: in the example, the input of ``arange`` is simply
the integer ``5``, which provides no information about the array type that
should be the result. That is where a reference array passed by the ``like=``
argument proposed here can help, as it provides NumPy with the information
required to create the expected type of array.

The new ``like=`` keyword proposed is solely intended to identify the downstream
library to dispatch to. The object is used only as a reference, meaning that
no modifications, copies or processing will be performed on that object.
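
As a small illustration of "reference only", the sketch below contrasts
``like=`` with ``np.zeros_like``. It assumes a NumPy version in which the
proposed ``like=`` argument is available, and it uses a plain NumPy array as
the reference so that it runs without any downstream library. The variable
names are made up for the example.

```python
import numpy as np

ref = np.ones((4, 4))
ref_before = ref.copy()

# like= only selects which library creates the array; shape and values
# come from the data passed as the first argument.
created = np.array([1, 2, 3], like=ref)

# zeros_like, by contrast, mimics the shape and dtype of its argument.
mimicked = np.zeros_like(ref)
```

Here ``created`` has shape ``(3,)`` while ``mimicked`` has shape ``(4, 4)``,
and ``ref`` itself is left untouched.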

We expect that this functionality will be mostly useful to library developers,
allowing them to create new arrays for internal usage based on arrays passed
by the user, preventing unnecessary creation of NumPy arrays that will
ultimately lead to an additional conversion into a downstream array type.

Support for Python 2.7 has been dropped since NumPy 1.17, therefore we make use
of the keyword-only argument standard described in PEP-3102 [2]_ to implement
``like=``, thus preventing it from being passed by position.
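
The keyword-only behavior can be sketched in plain Python. The function
``make_array`` below is hypothetical and exists only to show the PEP 3102
syntax; it is not NumPy's implementation.

```python
def make_array(data, *, like=None):
    """Hypothetical creation function: `like` may only be passed by keyword."""
    target = "downstream" if like is not None else "numpy"
    return target, list(data)


by_keyword = make_array([1, 2], like=object())

try:
    # Passing the reference positionally is rejected by Python itself.
    make_array([1, 2], object())
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```

Everything after the bare ``*`` in the signature must be named at the call
site, so a reference array can never be mistaken for another positional
argument.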

.. _neps.like-kwarg.usage-and-impact:

Usage and Impact
----------------

To understand the intended use for ``like=``, and before we move to more complex
cases, consider the following illustrative example consisting only of NumPy and
CuPy arrays:

.. code:: python

    import numpy as np
    import cupy

    def my_pad(arr, padding):
        padding = np.array(padding, like=arr)
        return np.concatenate((padding, arr, padding))

    my_pad(np.arange(5), [-1, -1])    # Returns np.ndarray
    my_pad(cupy.arange(5), [-1, -1])  # Returns cupy.core.core.ndarray

Note in the ``my_pad`` function above how ``arr`` is used as a reference to
dictate what array type ``padding`` should have, before concatenating the arrays
to produce the result. On the other hand, if ``like=`` wasn't used, the NumPy
case would still work, but CuPy wouldn't allow this kind of automatic
conversion, ultimately raising a
``TypeError: Only cupy arrays can be concatenated`` exception.
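
For readers without a GPU, the same behavior can be approximated with a toy
array type. The ``StrictArray`` class below is hypothetical: it is a stand-in
that refuses to be mixed with NumPy arrays, similar in spirit to the CuPy
error quoted above, and it is not CuPy's implementation. The sketch assumes a
NumPy version in which the proposed ``like=`` argument is available.

```python
import numpy as np


class StrictArray:
    """Hypothetical array type that refuses implicit mixing with NumPy."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        def unwrap(obj):
            if isinstance(obj, StrictArray):
                return obj.data
            if isinstance(obj, np.ndarray):
                raise TypeError("Only StrictArray objects can be combined")
            if isinstance(obj, (list, tuple)):
                return type(obj)(unwrap(item) for item in obj)
            return obj

        # Keyword arguments are passed through untouched in this sketch.
        return StrictArray(func(*unwrap(args), **kwargs))


def my_pad(arr, padding):
    padding = np.array(padding, like=arr)  # same type as arr
    return np.concatenate((padding, arr, padding))


def my_pad_without_like(arr, padding):
    padding = np.array(padding)  # always a NumPy array
    return np.concatenate((padding, arr, padding))


padded = my_pad(StrictArray(np.arange(5)), [-1, -1])

try:
    my_pad_without_like(StrictArray(np.arange(5)), [-1, -1])
except TypeError as exc:
    mixing_error = exc
else:
    mixing_error = None
```

With ``like=``, ``padding`` is created as a ``StrictArray`` and the
concatenation succeeds. Without it, ``padding`` is a NumPy array and the toy
type rejects the mix.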

Now we should look at how a library like Dask could benefit from ``like=``.
Before we get to that, it's important to understand a bit about Dask basics
and how it ensures correctness with ``__array_function__``. Note that Dask can
compute different sorts of objects, like dataframes, bags and arrays. Here we
will focus strictly on arrays, which are the objects we can use
``__array_function__`` with.

Dask uses a graph computing model, meaning it breaks down a large problem into
many smaller problems and merges their results to reach the final result. To
break the problem down into smaller ones, Dask also breaks arrays into smaller
arrays that it calls "chunks". A Dask array can thus consist of one or more
chunks and they may be of different types. However, in the context of
``__array_function__``, Dask only allows chunks of the same type, for example,
a Dask array can be formed of several NumPy arrays or several CuPy arrays, but
not a mix of both.

To avoid mismatched types during compute, Dask keeps an attribute ``_meta`` as
part of its array throughout computation. This attribute is used both to predict
the output type at graph creation time and to create any intermediary arrays
that are necessary within some function's computation. Going back to our
previous example, we can use ``_meta`` information to identify what kind of
array we would use for padding, as seen below:

.. code:: python

    import numpy as np
    import cupy
    import dask.array as da
    from dask.array.utils import meta_from_array

    def my_pad(arr, padding):
        padding = np.array(padding, like=meta_from_array(arr))
        return np.concatenate((padding, arr, padding))

    # Returns dask.array<concatenate, shape=(9,), dtype=int64,
    # chunksize=(5,), chunktype=numpy.ndarray>
    my_pad(da.arange(5), [-1, -1])

    # Returns dask.array<concatenate, shape=(9,), dtype=int64,
    # chunksize=(5,), chunktype=cupy.ndarray>
    my_pad(da.from_array(cupy.arange(5)), [-1, -1])

Note how ``chunktype`` in the return value above changes from
``numpy.ndarray`` in the first ``my_pad`` call to ``cupy.ndarray`` in the
second.

To enable proper identification of the array type we use Dask's utility function
``meta_from_array``, which was introduced as part of the work to support
``__array_function__``, allowing Dask to handle ``_meta`` appropriately. That
function is primarily targeted at the library's internal usage to ensure chunks
are created with correct types. Without the ``like=`` argument, it would be
impossible to ensure ``my_pad`` creates a padding array with a type matching
that of the input array, which would cause a ``TypeError`` exception to
be raised by CuPy, as discussed above for the CuPy-only case.
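
The idea of carrying a type reference next to the chunks can be sketched
without Dask. The ``ChunkedArray`` class and ``pad_chunks`` function below are
hypothetical and greatly simplified. They are not Dask's implementation and
only borrow the ``_meta`` name from the description above. The sketch assumes
a NumPy version in which the proposed ``like=`` argument is available.

```python
import numpy as np


class ChunkedArray:
    """Hypothetical stand-in for a chunked array with same-typed chunks."""

    def __init__(self, chunks):
        chunks = list(chunks)
        if len({type(c) for c in chunks}) != 1:
            raise TypeError("all chunks must share one array type")
        self.chunks = chunks
        # Zero-length array of the chunk type, kept only as a type reference.
        self._meta = chunks[0][:0]


def pad_chunks(arr, padding):
    # Create the padding with the same array type as the chunks.
    pad = np.array(padding, like=arr._meta)
    return ChunkedArray([pad] + arr.chunks + [pad])


padded_chunks = pad_chunks(
    ChunkedArray([np.arange(3), np.arange(3, 5)]), [-1, -1]
)
```

Because the padding is created from ``_meta``, the new chunks pass the
same-type check in ``ChunkedArray`` regardless of which array library backs
the chunks.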

Backward Compatibility
----------------------

This proposal does not raise any backward compatibility issues within NumPy,
given that it only introduces a new keyword argument to existing array creation
functions with a default ``None`` value, thus not changing current behavior.

On Sun, Aug 16, 2020 at 1:41 PM Ralf Gommers <[hidden email]> wrote:

>
>
>
> On Fri, Aug 14, 2020 at 12:23 PM Peter Andreas Entschev <[hidden email]> wrote:
>>
>> Hi all,
>>
>> This thread has IMO drifted very far from its original purpose, due to that I decided to start a new thread specifically for the general NEP procedure discussed, please check your mail for "NEP Procedure Discussion" subject.
>
>
> Thanks Peter. For future reference: better to just edit the thread subject, but not start over completely - people want to reply to previous content. I will copy over comments I'd like to reply to to the other thread by hand now.
>
>>
>> On the topic of this thread, I'll try to rewrite NEP-35 to make it more accessible and ping back here once I have a PR for that.
>
>
> Thanks!
>
> Cheers,
> Ralf
>
>> Is there anything else that's pressing here? If there is and I missed/forgot about it, please let me know.
>>
>> Best,
>> Peter
>>
>> On Fri, Aug 14, 2020 at 5:00 AM Juan Nunez-Iglesias <[hidden email]> wrote:
>>>
>>> Hello everyone again!
>>>
>>> A few clarifications about my proposal of external peer review:
>>>
>>> - Yes, all this work is public and announced on the mailing list. However, I don’t think there’s a single person in this discussion or even this whole ecosystem that does not have a more immediately-pressing and also virtually infinite to-do list, so it’s unreasonable to expect that generally they would do more than glance at the stuff in the mailing list. In the peer review analogy, the mailing list is like the arXiv or Biorxiv stream — yep, anyone can see the stuff on there and comment, but most people just don’t have the time or attention to grab onto that. The only reason I stopped to comment here is Sebastian’s “Imma merge, YOLO!”, which had me raising my eyebrows real high. Especially for something that would expand the NumPy API!
>>>
>>> - So, my proposal is that there needs to be an *editor* of NEPs who takes responsibility, once they are themselves satisfied with the NEP, for seeking out external reviewers and pinging them individually and asking them if they would be ok to review.
>>>
>>> - A good friend who does screenwriting once told me, “don’t use all your proofreaders at once”. You want to get feedback, improve things, then feedback from a *totally independent* new person who can see the document with fresh eyes.
>>>
>>> Obviously, all of the above slows things down. But “alone we go fast, together we go far”. The point of a NEP is to document critical decisions for the long term health of the project. If the documentation is insufficient, it defeats the whole purpose. Might as well just implement stuff and skip the whole NEP process. (Side note: Stephan, I for one would definitely appreciate an update to existing NEPs if there’s obvious ways they can be improved!)
>>>
>>> I do think that NEP templates should be strict, and I don’t think that is incompatible with plain, jargon-free text. The NEP template and guidelines should specify that, and that the motivation should be understandable by a casual NumPy user — the kind described by Ilhan, for whom bare NumPy actually meets all their needs. Maybe they’ve also used PyTorch but they’ve never really had cause to mix them or write a program that worked with both kinds of arrays.
>>>
>>> Ditto for backwards compatibility — everyone should be clear when their existing code is going to be broken. Actually NEP18 broke so much of my code, but its Backward compatibility section basically says all good! https://numpy.org/neps/nep-0018-array-function-protocol.html#backward-compatibility
>>>
>>> Anywho, as always, none of this is criticism to work done — I thank you all, and am eternally grateful for all the hard work everyone is doing to keep the ecosystem from fragmenting. I’m just hoping that this discussion can improve the process going forward!
>>>
>>> And, yes, apologies to Peter, I know from repeated personal experience how frustrating it can be to have last-minute drive-by objections after months of consensus building! But I think in the end every time that happened the end result was better — I hope the same is true here! And yes, I’ll reiterate Ralf’s point: my concerns are about the NEP process itself rather than this one. I’ll summarise my proposal:
>>>
>>> - strict NEP template. NEPs with missing sections will not be accepted.
>>> - sections Abstract, Motivation, and Backwards Compatibility should be understandable at a high level by casual users with ~zero background on the topic
>>> - enforce the above with at least two independent rounds of coordinated peer review.
>>>
>>> Thank you,
>>>
>>> Juan.
>>>
>>> On 14 Aug 2020, at 5:29 am, Stephan Hoyer <[hidden email]> wrote:
>>>
>>> On Thu, Aug 13, 2020 at 5:22 AM Ralf Gommers <[hidden email]> wrote:
>>>>
>>>> Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
>>>>
>>>> To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
>>>>
>>>>
>>>> On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <[hidden email]> wrote:
>>>>>
>>>>>
>>>>> > I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>>>>>
>>>>> This is what I intended to do in the Usage Guidance [2] section. Could
>>>>> you elaborate on what more information you'd want to see there? Or is
>>>>> it just a matter of reorganizing the NEP a bit to try and summarize
>>>>> such things right at the top?
>>>>
>>>>
>>>> We adapted the NEP template [6] several times last year to try and improve this. And specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure hasn't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
>>>>
>>>> If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
>>>>
>>>> At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
>>>
>>>
>>> I think the NEP template is great, and we should try to be more diligent about following it!
>>>
>>> My own NEP 37 (__array_module__) is probably a good example of poor presentation due to not following the template structure. It goes pretty deep into low-level motivation and some implementation details before usage examples.
>>>
>>> Speaking just for myself, I would have appreciated a friendly nudge to use the template. Certainly I think it would be fine to require using the template for newly submitted NEPs. I did not remember about it when I started drafting NEP 37, and it definitely would have helped. I may still try to do a revision at some point to use the template structure.
>>>
>>>>
>>>> I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
>>>>
>>>>
>>>>>
>>>>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>>>>>
>>>>> TBH, I don't really know how to solve that point, so if you have any
>>>>> specific suggestions, that's certainly welcome. I understand the
>>>>> frustration for a reader trying to understand all the details, with
>>>>> many being only described in NEP-18 [3], but we also strive to avoid
>>>>> rewriting things that are written elsewhere, which would also
>>>>> overburden those who are aware of what's being discussed.
>>>>>
>>>>>
>>>>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>>>>
>>>>
>>>> Some variant of this proposal would be my preference.
>>>>
>>>> Cheers,
>>>> Ralf
>>>>
>>>>>
>>>>> [1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
>>>>> [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
>>>>> [3] https://numpy.org/neps/nep-0018-array-function-protocol.html
>>>>> [4] https://numpy.org/neps/nep-0000.html#nep-workflow
>>>>> [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
>>>>
>>>>
>>>> [6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
>>>> [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
>>>> [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <[hidden email]> wrote:
>>>>> >
>>>>> > I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>>>>> >
>>>>> > I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>>>>> >
>>>>> > I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>>>>> >
>>>>> > Food for thought.
>>>>> >
>>>>> > Juan.
>>>>> >
>>>>> > On 13 Aug 2020, at 9:24 am, Ilhan Polat <[hidden email]> wrote:
>>>>> >
>>>>> > For what is worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only and only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>>>>> >
>>>>> > Let me clarify,
>>>>> >
>>>>> > - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>>>>> >
>>>>> > - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>>>>> >
>>>>> > I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>>>>> >
>>>>> > Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>>>>