defining a NumPy API standard?

defining a NumPy API standard?

ralfgommers
Hi all,

I have an idea that I've discussed with a few people in person, and the feedback has generally been positive. So I'd like to bring it up here, to get a sense of if this is going to fly. Note that this is NOT a proposal at this point.

Idea in five words: define a NumPy API standard

Observations
------------
- Many libraries, both in Python and other languages, have APIs copied from or inspired by NumPy.
- All of those APIs are incomplete, and many deviate from NumPy either by accident or on purpose.
- The NumPy API is very large and ill-defined.

Libraries with a NumPy-like API
-------------------------------
In Python:
- GPU: Tensorflow, PyTorch, CuPy, MXNet
- distributed: Dask
- sparse: pydata/sparse
- other: tensorly, uarray/unumpy, ...

In other languages:
- JavaScript: numjs
- Go: Gonum
- Rust: rust-ndarray, rust-numpy
- C++: xtensor
- C: XND
- Java: ND4J
- C#: NumSharp, numpy.net
- Ruby: Narray, xnd-ruby
- R: Rray

This is an incomplete list. Xtensor and XND aim for multi-language support. These libraries are of varying completeness, size and quality - everything from one-person efforts that have just started, to large code bases that go beyond NumPy in features or performance.

Idea
----
Define a standard for "the NumPy API" (or "NumPy core API", or .... - it's just a name for now), that
other libraries can use as a guide on what to implement and when to say they are NumPy compatible.

In scope:
- Define a NumPy API standard, containing an N-dimensional array object and a set of functions.
- List of functions and ndarray methods to include.
- Recommendations about where to deviate from NumPy (e.g. leave out array scalars)

Out of scope, or to be treated separately:
- dtypes and casting
- (g)ufuncs
- function behavior (e.g. returning views vs. copies, which keyword arguments to include)
- indexing behavior
- submodules (fft, random, linalg)

Who cares and why?
- Library authors: this saves them work and helps them make decisions.
- End users: consistency between libraries/languages, helps transfer knowledge and understand code
- NumPy developers: gives them a vocabulary for "the NumPy API", "compatible with NumPy", etc.

Risks:
- If not done well, we just add to the confusion rather than make things better.
- Opportunity for an endless amount of bikeshedding
- ?

Some more rationale:
We (NumPy devs) mostly have a shared understanding of what is "core NumPy functionality", what we'd like to remove but are stuck with, what's not used a whole lot, etc. Examples: financial functions don't belong, array creation methods with weird names like np.r_ were a mistake. We are not communicating this in any way though. Doing so would be helpful. Perhaps this API standard could even have layers, to indicate what's really core, what are secondary sets of functionality to include in other libraries, etc.
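
One way to picture such layers is as plain data that a library can be checked against. The sketch below is only an illustration under stated assumptions: the layer names, the function lists, and the `toy_lib` stand-in are all made up here and are not a proposed standard.

```python
from types import SimpleNamespace

# Hypothetical layers; the names listed are illustrative only.
API_LAYERS = {
    "core": ["zeros", "ones", "reshape", "sum"],
    "secondary": ["cumsum", "tile"],
}


def missing_by_layer(namespace, layers=API_LAYERS):
    """Map each layer to the listed names the namespace lacks as callables."""
    report = {}
    for layer, names in layers.items():
        report[layer] = [
            name for name in names
            if not callable(getattr(namespace, name, None))
        ]
    return report


def supported_layers(namespace, layers=API_LAYERS):
    """Return the layers for which every listed name is present."""
    report = missing_by_layer(namespace, layers)
    return [layer for layer, missing in report.items() if not missing]


# Stand-in for a NumPy-like library that implements only part of the list.
toy_lib = SimpleNamespace(
    zeros=lambda n: [0.0] * n,
    ones=lambda n: [1.0] * n,
    reshape=lambda a, shape: a,
    sum=sum,
    cumsum=lambda a: a,
)

report = missing_by_layer(toy_lib)
print(report)                     # which names are missing, per layer
print(supported_layers(toy_lib))  # layers that are fully covered
```

A check like this only looks at names. It says nothing about signatures or behavior, which the list above puts out of scope anyway.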

Discussion and next steps
-------------------------
What I'd like to get a sense of is:
- Is this a good idea to begin with?
- What should the scope be?
- What should the format be (a NEP, some other doc, defining in code)?

If this idea is well-received, I can try to draft a proposal during the next month (help/volunteers welcome!). It can then be discussed at SciPy'19 - high-bandwidth communication may help to get a set of people on the same page and hash out a lot of details.

Thoughts?

Ralf


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: defining a NumPy API standard?

Nathaniel Smith
It's possible I'm not getting what you're thinking, but from what you
describe in your email I think it's a bad idea.

Standards take a tremendous amount of work (no really, an absurdly
massively huge amount of work, more than you can imagine if you
haven't done it). And they don't do what people usually hope they do.
Many many standards are written all the time that have zero effect on
reality, and the effort is wasted. They're really only useful when you
have to solve a coordination problem: lots of people want to do the
same thing as each other, whatever that is, but no-one knows what the
thing should be. That's not a problem at all for us, because numpy
already exists.

If you want to improve compatibility between Python libraries, then I
don't think it will be relevant. Users aren't writing code against
"the numpy standard", they're not testing their libraries against "the
numpy standard", they're using/testing against numpy. If library
authors want to be compatible with numpy, they need to match what
numpy does, not what some document says. OTOH if they think they have
a better idea and it's worth breaking compatibility, they're going to
do it regardless of what some document somewhere says.

If you want to share the lessons learned from numpy in the hopes of
improving future libraries that don't care about numpy compatibility
per se, in python or other languages, then that seems like a great
idea! But that's not a standard, that's a journal article called
something like "NumPy: A retrospective". Other languages aren't going
to match numpy one-to-one anyway, because they'll be adapting things
to their language's idioms; they certainly don't care about whether
you decided 'newaxis MUST be defined to None' or merely 'SHOULD' be
defined to None.

IMO, if you try, the most likely outcome will be that it will suck up a
lot of energy writing it, and then the only effect is that everyone
will keep doing what they would have done anyway but now with extra
self-righteousness and yelling depending on whether that turns out to
match the standard or not.

-n

--
Nathaniel J. Smith -- https://vorpus.org

defining a NumPy API standard?

Hameer Abbasi
I think this hits the crux of the issue... There is a huge coordination problem. Users want to move their code from NumPy to Sparse or Dask all the time, but it’s not trivial to do. And libraries like Sparse and Dask want to follow a standard, or at least wish one had existed before they were written.

Maybe I think the issue is bigger than it really is, but there’s definitely a coordination problem.

See the section in the original email on “who cares and why”...

Best Regards,
Hameer Abbasi

On Saturday, Jun 01, 2019 at 11:32 AM, Nathaniel Smith <[hidden email]> wrote:
[snip]

> That's not a problem at all for us, because numpy
> already exists.

[snip]


Re: defining a NumPy API standard?

William Ray Wing
In reply to this post by ralfgommers


On Jun 1, 2019, at 4:17 AM, Ralf Gommers <[hidden email]> wrote:

> Hi all,
>
> I have an idea that I've discussed with a few people in person, and the feedback has generally been positive. So I'd like to bring it up here, to get a sense of if this is going to fly. Note that this is NOT a proposal at this point.
>
> Idea in five words: define a NumPy API standard


As an amateur user of NumPy (hobby programming), and at the opposite end of the spectrum from the NumPy development team, I’d like to raise my hand and applaud this idea. I think it would make my use of NumPy significantly easier if an API standard not only specified the basic API structure, but also regularized it to the extent possible.

Thanks,
Bill Wing


Re: defining a NumPy API standard?

Marten van Kerkwijk
Hi Ralf,

Despite sharing Nathaniel's doubts about the ease of defining the numpy API and the likelihood of people actually sticking to a limited subset of what numpy exposes, I quite like the actual things you propose to do!

But my liking it is for reasons that are different from your stated ones: I think the proposed actions are likely to greatly benefit both users (like Bill above) and current and prospective developers. To me, it seems almost a side benefit (if a very nice one) that it might help other projects to share an API; a larger benefit may come from tapping into the experience of other projects in thinking about what the truly basic functions/methods are that one should have.

More concretely, to address Nathaniel's (very reasonable) worry about ending up wasting a lot of time, I think it may be good to identify smaller parts, each of which is useful on its own.

In this respect, I think an excellent place to start might be something you are planning already anyway: update the user documentation. Doing this will necessarily require thinking about, e.g., what `ndarray` methods and properties are actually fundamental, as you only want to focus on a few. With that in place, one could then, as you suggest, reorganize the reference documentation to put those most important properties up front, and ones that we really think are mistakes at the bottom, with explanations of why we think so and what the alternative is. Also for the reference documentation, it would help to group functions more logically.

The above could lead to three next steps, all of which I think would be useful. First, for (prospective) developers as well as for future maintenance, I think it would be quite a large benefit if we (slowly but surely) rewrote code that implements the less basic functionality in terms of more basic functions (e.g., replace use of `array.fill(...)` or `np.copyto(array, ...)` with `array[...] =`).
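
As a small illustration of that first step, the sketch below fills an array in the three ways mentioned and checks that they agree for a simple case. The helper names are made up for this example. The three spellings are not guaranteed to match in every corner: `np.copyto`, for instance, defaults to `casting='same_kind'`.

```python
import numpy as np


def fill_with_method(arr, value):
    """Use the ndarray method."""
    arr.fill(value)
    return arr


def fill_with_copyto(arr, value):
    """Use np.copyto; the value is broadcast to arr's shape."""
    np.copyto(arr, value)
    return arr


def fill_with_indexing(arr, value):
    """Use plain indexing assignment, the 'more basic' spelling."""
    arr[...] = value
    return arr


a = fill_with_method(np.empty((2, 3)), 7.0)
b = fill_with_copyto(np.empty((2, 3)), 7.0)
c = fill_with_indexing(np.empty((2, 3)), 7.0)

# For this simple float case all three give the same array.
assert np.array_equal(a, b) and np.array_equal(b, c)
```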

Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.
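
To make "properties duck arrays might want to mimic" a bit more concrete, here is a minimal sketch using `typing.Protocol`. The choice of `shape`, `ndim` and `dtype` as the required set is an assumption made for this example (which properties belong in such a set is exactly what would need deciding), and `ArrayLike`, `ToyArray` and `NotAnArray` are hypothetical names.

```python
from typing import Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrayLike(Protocol):
    """Hypothetical minimal set of duck-array properties."""

    @property
    def shape(self) -> Tuple[int, ...]: ...

    @property
    def ndim(self) -> int: ...

    @property
    def dtype(self) -> object: ...


class ToyArray:
    """A toy container that provides all three properties."""

    def __init__(self, data, shape):
        self._data = list(data)
        self.shape = tuple(shape)
        self.dtype = float

    @property
    def ndim(self):
        return len(self.shape)


class NotAnArray:
    """Has a shape attribute but neither ndim nor dtype."""

    shape = (2, 2)


toy = ToyArray(range(4), (2, 2))
print(isinstance(toy, ArrayLike))           # all three attributes present
print(isinstance(NotAnArray(), ArrayLike))  # ndim and dtype missing
```

The runtime check only tests that the attributes exist, not that they behave like NumPy's.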

Third, we could actually implement the logical groupings identified in the code base (and describe them!). Currently, it is a mess: for the C files, I typically have to grep to even find where things are done, and while for the functions defined in Python files that is not necessary, many have historical rather than logical groupings (looking at you, `fromnumeric`!), and even more descriptive ones like `shape_base` are split over `lib` and `core`. I think it would help everybody if we went to a Python-like layout, with a true core and libraries such as polynomial, fft, ma, etc.

Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant...

All the best,

Marten




Re: defining a NumPy API standard?

ralfgommers
In reply to this post by Nathaniel Smith


On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <[hidden email]> wrote:
> It's possible I'm not getting what you're thinking, but from what you
> describe in your email I think it's a bad idea.

Hi Nathaniel, I think you are indeed not getting what I meant and are just responding to the word "standard".

I'll give a concrete example. Here is the xtensor to numpy comparison: https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors clearly have made sane choices, but they did have to spend a lot of effort making those choices - what to include and what not.

Now, the XND team is just starting to build out their Python API. Hameer is building out unumpy. There are all the other array libraries I mentioned. We can say "sort it out yourself, make your own choices", or we can provide some guidance. So far the authors of those libraries whom I have asked say they would appreciate the guidance.

Cheers,
Ralf




Re: defining a NumPy API standard?

ralfgommers
In reply to this post by Marten van Kerkwijk


On Sat, Jun 1, 2019 at 6:12 PM Marten van Kerkwijk <[hidden email]> wrote:
> Hi Ralf,
>
> Despite sharing Nathaniel's doubts about the ease of defining the numpy API and the likelihood of people actually sticking to a limited subset of what numpy exposes, I quite like the actual things you propose to do!
>
> But my liking it is for reasons that are different from your stated ones: I think the proposed actions are likely to greatly benefit both users (like Bill above) and current and prospective developers. To me, it seems almost a side benefit (if a very nice one) that it might help other projects to share an API; a larger benefit may come from tapping into the experience of other projects in thinking about what the truly basic functions/methods are that one should have.

Agreed, there is some reverse learning there as well. Projects like Dask and Xtensor already went through making these choices, which can teach us as NumPy developers some lessons.

> More concretely, to address Nathaniel's (very reasonable) worry about ending up wasting a lot of time, I think it may be good to identify smaller parts, each of which is useful on its own.
>
> In this respect, I think an excellent place to start might be something you are planning already anyway: update the user documentation. Doing this will necessarily require thinking about, e.g., what `ndarray` methods and properties are actually fundamental, as you only want to focus on a few. With that in place, one could then, as you suggest, reorganize the reference documentation to put those most important properties up front, and ones that we really think are mistakes at the bottom, with explanations of why we think so and what the alternative is. Also for the reference documentation, it would help to group functions more logically.

That's perhaps another rationale for doing this. The docs are likely to get a fairly major overhaul this year. If we don't write down a coherent plan, we're going to make much the same decisions as we would when writing up a "standard", just ad hoc and with much less review.

> The above could lead to three next steps, all of which I think would be useful. First, for (prospective) developers as well as for future maintenance, I think it would be quite a large benefit if we (slowly but surely) rewrote code that implements the less basic functionality in terms of more basic functions (e.g., replace use of `array.fill(...)` or `np.copyto(array, ...)` with `array[...] =`).

That could indeed be nice. I think Travis referred to this as defining an "RNumPy" (similar to RPython as a subset of Python).

> Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.

I wasn't thinking about that indeed, but agreed that it could be helpful.

> Third, we could actually implement the logical groupings identified in the code base (and describe them!). Currently, it is a mess: for the C files, I typically have to grep to even find where things are done, and while for the functions defined in Python files that is not necessary, many have historical rather than logical groupings (looking at you, `fromnumeric`!), and even more descriptive ones like `shape_base` are split over `lib` and `core`. I think it would help everybody if we went to a Python-like layout, with a true core and libraries such as polynomial, fft, ma, etc.

I'd really like this. Also to have a sane namespace in numpy, and a basis for putting something in numpy.lib vs the main namespace vs some other namespace (there are a couple of semi-public ones).

> Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant...

Not irrelevant, I think you're making some good points.

Cheers,
Ralf



Re: defining a NumPy API standard?

Charles R Harris
In reply to this post by Marten van Kerkwijk


On Sat, Jun 1, 2019 at 10:12 AM Marten van Kerkwijk <[hidden email]> wrote:
Hi Ralf,

Despite sharing Nathaniel's doubts about the ease of defining the numpy API and the likelihood of people actually sticking to a limited subset of what numpy exposes, I quite like the actual things you propose to do!

But my liking it is for reasons that are different from your stated ones: I think the proposed actions are likely to benefit greatly  both for users (like Bill above) and current and prospective developers.  To me, it seems almost as a side benefit (if a very nice one) that it might help other projects to share an API; a larger benefit may come from tapping into the experience of other projects in thinking about what are the true  basic functions/method that one should have.

I generally agree with this. The most useful aspect of this exercise is likely to be clarifying NumPy for its own developers, and maybe offering a guide to future simplification. Trying to put something together that everyone agrees to as an official standard would be a big project and, as Nathaniel points out, would involve an enormous amount of work, much time, and doubtless many arguments.  What might be a less ambitious exercise would be identifying commonalities in the current numpy-like languages. That would have the advantage of feedback from actual user experience, and would be more like a lessons learned document that would be helpful to others.


More concretely, to address Nathaniel's (very reasonable) worry about ending up wasting a lot of time, I think it may be good to identify smaller parts, each of which are useful on their own.

In this respect, I think an excellent place to start might be something you are planning already anyway: update the user documentation. Doing this will necessarily require thinking about, e.g., what `ndarray` methods and properties are actually fundamental, as you only want to focus on a few. With that in place, one could then, as you suggest, reorganize the reference documentation to put those most important properties up front, and ones that we really think are mistakes at the bottom, with explanations of why we think so and what the alternative is. Also for the reference documentation, it would help to group functions more logically.

I keep thinking duck type. Or in this case, duck type lite. 


The above could lead to three next steps, all of which I think would be useful. First, for (prospective) developers as well as for future maintenance, I think it would be quite a large benefit if we (slowly but surely) rewrote code that implements the less basic functionality in terms of more basic functions (e.g., replace use of `array.fill(...)` or `np.copyto(array, ...)` with `array[...] =`).


I've had similar thoughts.
 
Second, we could update Nathaniel's NEP about distinct properties duck arrays might want to mimic/implement.


Yes.
 
Third, we could actually implement the logical groupings identified in the code base (and describe them!). Currently, it is a mess: for the C files, I typically have to grep to even find where things are done, and while for the functions defined in python files that is not necessary, many have historical rather than logical groupings (looking at you, `fromnumeric`!), and even more descriptive ones like `shape_base` are split over `lib` and `core`. I think it would help everybody if we went to a python-like layout, with a true core and libraries such as polynomial, fft, ma, etc.

Anyway, re-reading your message, I realize the above is not really what you wrote about, so perhaps this is irrelevant...


Chuck 

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: defining a NumPy API standard?

mattip
On 1/6/19 7:31 pm, Charles R Harris wrote:

> I generally agree with this. The most useful aspect of this exercise
> is likely to be clarifying NumPy for its own developers, and maybe
> offering a guide to future simplification. Trying to put something
> together that everyone agrees to as an official standard would be a
> big project and, as Nathaniel points out, would involve an enormous
> amount of work, much time, and doubtless many arguments.  What might
> be a less ambitious exercise would be identifying commonalities in the
> current numpy-like languages. That would have the advantage of
> feedback from actual user experience, and would be more like a lessons
> learned document that would be helpful to others.
>
>
>     More concretely, to address Nathaniel's (very reasonable) worry
>     about ending up wasting a lot of time, I think it may be good to
>     identify smaller parts, each of which are useful on their own.
>
>     In this respect, I think an excellent place to start might be
>     something you are planning already anyway: update the user
>     documentation
>

I would include tests as well. Rather than hammer out a full standard
based on extensive discussions and negotiations, I would suggest NumPy
might be able to set a de-facto "standard" based on pieces of the
current numpy user documentation and test suite. Then other projects
could use "passing the tests" as an indication that they implement the
NumPy API, and could refer to the documentation where appropriate. Once
we have a base repo under numpy with tests and documentation for the
generally accepted baseline interfaces, we can discuss on a case-by-case
basis via pull requests and issues whether other interfaces should be
included. If we find general classes of similarity that can be concisely
described but that not all duckarray packages support (structured arrays,
for instance), these could become test-specifiers such as
`@pytest.mark.skipif(not HAVE_STRUCTURED_ARRAYS)`; the tests and
documentation would then only apply if that specifier exists.
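A rough sketch of what such a capability-gated test could look like, assuming pytest is the runner. The names `xp`, `_supports_structured_arrays` and the two test functions are hypothetical and only illustrate the idea; no such shared suite is described in this thread.

```python
import numpy as np
import pytest

# Hypothetical: the array namespace under test. A duck-array project would
# point this at its own module instead of NumPy.
xp = np


def _supports_structured_arrays(module):
    """Probe whether `module` can create an array with a structured dtype."""
    try:
        module.zeros(2, dtype=[("x", "f8"), ("y", "i4")])
    except Exception:
        return False
    return True


HAVE_STRUCTURED_ARRAYS = _supports_structured_arrays(xp)


def test_reshape_preserves_size():
    # Baseline behavior every implementation would be expected to pass.
    a = xp.arange(6).reshape(2, 3)
    assert a.shape == (2, 3)
    assert a.size == 6


@pytest.mark.skipif(not HAVE_STRUCTURED_ARRAYS,
                    reason="structured dtypes not supported")
def test_structured_field_access():
    # Only runs for implementations that declare structured-array support.
    a = xp.zeros(2, dtype=[("x", "f8"), ("y", "i4")])
    a["y"] = 7
    assert a["y"].tolist() == [7, 7]
```

A library without structured arrays would see the second test reported as skipped rather than failed, which keeps "passing the tests" meaningful for partial implementations.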


Matti


Re: defining a NumPy API standard?

Nathaniel Smith
In reply to this post by Hameer Abbasi
On Sat, Jun 1, 2019, 05:23 Hameer Abbasi <[hidden email]> wrote:
I think this hits the crux of the issue... There is a huge coordination problem. Users want to move their code from NumPy to Sparse or Dask all the time, but it’s not trivial to do. And libraries like sparse and Dask want to follow a standard (or at least hoped there was one) before they existed.

Those are big problems, but they aren't coordination problems :-)


If you and I both each have our own unrelated code that we want to move to Sparse, then we don't have to talk to each other and agree on how to do it.

-n


Re: defining a NumPy API standard?

ralfgommers
In reply to this post by mattip


On Sat, Jun 1, 2019 at 8:46 PM Matti Picus <[hidden email]> wrote:
On 1/6/19 7:31 pm, Charles R Harris wrote:
> I generally agree with this. The most useful aspect of this exercise
> is likely to be clarifying NumPy for its own developers, and maybe
> offering a guide to future simplification. Trying to put something
> together that everyone agrees to as an official standard would be a
> big project and, as Nathaniel points out, would involve an enormous
> amount of work, much time, and doubtless many arguments.  What might
> be a less ambitious exercise would be identifying commonalities in the
> current numpy-like languages. That would have the advantage of
> feedback from actual user experience, and would be more like a lessons
> learned document that would be helpful to others.
>
>
>     More concretely, to address Nathaniel's (very reasonable) worry
>     about ending up wasting a lot of time, I think it may be good to
>     identify smaller parts, each of which are useful on their own.
>
>     In this respect, I think an excellent place to start might be
>     something you are planning already anyway: update the user
>     documentation
>

I would include tests as well. Rather than hammer out a full standard
based on extensive discussions and negotiations, I would suggest NumPy
might be able to set a de-facto "standard" based on pieces of the
current numpy user documentation and test suite.

I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought out but not clearly private submodules)

Just the main namespace plus ndarray methods is close to 700 objects. If you want to build a NumPy-like thing, that's 700 decisions to make. I'm suggesting something as simple as a list of functions that constitute a sensible "core of NumPy". That list would not include anything in fft/linalg/random, since those can easily be separated out (indeed, if we could disappear fft and linalg and just rely on scipy, pyfftw etc., that would be great). It would not include financial functions. And so on. I guess we'd end up with most ndarray methods plus <150 functions.
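For readers who want to reproduce a count like the one above, here is a rough sketch. It assumes that "public" simply means "name does not start with an underscore", which over-counts (submodules, constants and classes are included), and the numbers depend on the installed NumPy version, so no expected output is shown.

```python
import numpy as np


def public_names(obj):
    """Return the sorted names on `obj` that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


# Namespaces taken from the count above; results vary between NumPy versions.
targets = {
    "main namespace": np,
    "fft": np.fft,
    "linalg": np.linalg,
    "random": np.random,
    "ndarray": np.ndarray,
}

counts = {label: len(public_names(obj)) for label, obj in targets.items()}
for label, n in counts.items():
    print(f"{label}: {n}")
```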

That list could be used for many purposes: improve the docs, serve as the set of functions to implement in xnd.array, unumpy & co, Marten's suggestion of implementing other functions in terms of basic functions, etc.
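One way such a list might be used by a library author, sketched under heavy assumptions: `CORE_FUNCTIONS` below is a tiny made-up placeholder (the thread does not define the actual list), and `TinyArrayLib` is a hypothetical stand-in for a partial NumPy-like library.

```python
import numpy as np

# Hypothetical placeholder; the real "core of NumPy" list is not defined here.
CORE_FUNCTIONS = {"zeros", "ones", "reshape", "transpose", "concatenate", "sum"}


def core_coverage(namespace, core=CORE_FUNCTIONS):
    """Split `core` into names that `namespace` provides as callables and the rest."""
    present = {name for name in core if callable(getattr(namespace, name, None))}
    return sorted(present), sorted(core - present)


class TinyArrayLib:
    """Hypothetical partial implementation that borrows three NumPy functions."""
    zeros = staticmethod(np.zeros)
    reshape = staticmethod(np.reshape)
    sum = staticmethod(np.sum)


present, missing = core_coverage(TinyArrayLib)
print("implemented:", present)
print("still to do:", missing)
```

This is purely a yes/no check on names; in line with the "out of scope" items, it says nothing about whether the behavior of each function matches NumPy.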

Two other thoughts:
1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.
2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.

Cheers,
Ralf



Re: defining a NumPy API standard?

Nathaniel Smith
In reply to this post by ralfgommers
On Sat, Jun 1, 2019, 09:13 Ralf Gommers <[hidden email]> wrote:


On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <[hidden email]> wrote:
It's possible I'm not getting what you're thinking, but from what you
describe in your email I think it's a bad idea.

Hi Nathaniel, I think you are indeed not getting what I meant and are just responding to the word "standard".

Well, that's the word you chose :-)

I think it's very possible that what you're thinking is a good idea, but it's actually something else, like better high-level documentation, or a NEP documenting things we wish we did differently but are stuck with, or a generic duck array test suite to improve compatibility and make it easier to bootstrap new libraries, etc.

The word "standard" is tricky:

- it has a pretty precise technical meaning that is different from all of those things, so if those are what you want then it's a bad word to use.

- it's a somewhat arcane niche of engineering practice that a lot of people don't have direct experience with, so there are a ton of people with vague and magical ideas about how standards work, and if you use the word then they'll start expecting all kinds of things. (See the response up thread where someone thinks that you just proposed to make a bunch of incompatible changes to numpy.) This makes it difficult to have a productive discussion, because everyone is misinterpreting each other.

I bet if we can articulate more precisely what exactly you're hoping to accomplish, then we'll also be able to figure out specific concrete actions that will help, and they won't involve the word "standard". For example:


I'll give a concrete example. Here is the xtensor to numpy comparison: https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors clearly have made sane choices, but they did have to spend a lot of effort making those choices - what to include and what not.

Now, the XND team is just starting to build out their Python API. Hameer is building out unumpy. There's all the other array libraries I mentioned. We can say "sort it out yourself, make your own choices", or we can provide some guidance. So far the authors of those libraries I have asked say they would appreciate the guidance.

That sounds great. Maybe you want... a mailing list or a forum for array library implementors to compare notes? ("So we ran into this unexpected problem implementing einsum, how did you handle it? And btw numpy devs, why is it like that in the first place?") Maybe you want someone to write up a review of existing APIs like xtensor, dask, xarray, sparse, ... to see where they deviated from numpy and if there are any commonalities? Or someone could do an analysis of existing code and publish tables of how often different features are used, so array implementors can make better choices about what to implement first? Or maybe just encouraging Hameer to be really proactive about sharing drafts and gathering feedback here? 
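As an illustration of the "tables of how often different features are used" idea, here is a small sketch using only the standard library. It assumes the analyzed code imports NumPy under the alias `np` and counts only direct `np.<name>` attribute accesses; `count_numpy_usage` and the sample source are hypothetical.

```python
import ast
from collections import Counter


def count_numpy_usage(source, alias="np"):
    """Count attribute accesses of the form `<alias>.<name>` in Python source."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id == alias):
            counts[node.attr] += 1
    return counts


SAMPLE = """
import numpy as np
a = np.arange(6).reshape(2, 3)
b = np.sum(a, axis=0) + np.sum(a)
c = np.linalg.norm(b)
"""

usage = count_numpy_usage(SAMPLE)
for name, n in usage.most_common():
    print(name, n)
```

Method calls on arrays (such as `.reshape` above) and nested names (such as `norm` inside `np.linalg`) are not attributed by this simple walk; a real survey would need type information or heuristics for those.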

-n


Re: defining a NumPy API standard?

ralfgommers


On Sat, Jun 1, 2019 at 10:32 PM Nathaniel Smith <[hidden email]> wrote:
On Sat, Jun 1, 2019, 09:13 Ralf Gommers <[hidden email]> wrote:


On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <[hidden email]> wrote:
It's possible I'm not getting what you're thinking, but from what you
describe in your email I think it's a bad idea.

Hi Nathaniel, I think you are indeed not getting what I meant and are just responding to the word "standard".

Well, that's the word you chose :-)

It's just one word out of a 100-line email. I'm happy to retract it. Please pretend it wasn't there and re-read the rest. Replace it with the list of functions that I propose in my previous email.


I think it's very possible that what you're thinking is a good idea, but it's actually something else, like better high-level documentation, or a NEP documenting things we wish we did differently but are stuck with, or a generic duck array test suite to improve compatibility and make it easier to bootstrap new libraries, etc.

The word "standard" is tricky:

- it has a pretty precise technical meaning that is different from all of those things, so if those are what you want then it's a bad word to use.

- it's a somewhat arcane niche of engineering practice that a lot of people don't have direct experience with, so there are a ton of people with vague and magical ideas about how standards work, and if you use the word then they'll start expecting all kinds of things. (See the response up thread where someone thinks that you just proposed to make a bunch of incompatible changes to numpy.) This makes it difficult to have a productive discussion, because everyone is misinterpreting each other.

I bet if we can articulate more precisely what exactly you're hoping to accomplish,

Please see my email of 1 hour ago.


I'll give a concrete example. Here is the xtensor to numpy comparison: https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors clearly have made sane choices, but they did have to spend a lot of effort making those choices - what to include and what not.

Now, the XND team is just starting to build out their Python API. Hameer is building out unumpy. There's all the other array libraries I mentioned. We can say "sort it out yourself, make your own choices", or we can provide some guidance. So far the authors of those libraries I have asked say they would appreciate the guidance.

That sounds great. Maybe you want... a mailing list or a forum for array library implementors to compare notes?

No.

("So we ran into this unexpected problem implementing einsum, how did you handle it? And btw numpy devs, why is it like that in the first place?")

can be done on this list.

Maybe you want someone to write up a review of existing APIs like xtensor, dask, xarray, sparse, ... to see where they deviated from numpy and if there are any commonalities?

That will be useful in verifying that the list of functions for "core of NumPy" I proposed is sensible. We're not going to make up things out of thin air.
 
Or someone could do an analysis of existing code and publish tables of how often different features are used, so array implementors can make better choices about what to implement first?

That's done:)

And yes, it's another useful data point in verifying our choices.

Or maybe just encouraging Hameer to be really proactive about sharing drafts and gathering feedback here? 

No. (Well, it's always good to be proactive, but that's beside the point here.)

Cheers,
Ralf



Re: defining a NumPy API standard?

Dashamir Hoxha
In reply to this post by ralfgommers
On Sat, Jun 1, 2019 at 10:05 PM Ralf Gommers <[hidden email]> wrote:

I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought out but not clearly private submodules)

Just the main namespace plus ndarray methods is close to 700 objects. If you want to build a NumPy-like thing, that's 700 decisions to make. I'm suggesting something as simple as a list of functions that constitute a sensible "core of NumPy". That list would not include anything in fft/linalg/random, since those can easily be separated out (indeed, if we could disappear fft and linalg and just rely on scipy, pyfftw etc., that would be great). It would not include financial functions. And so on. I guess we'd end up with most ndarray methods plus <150 functions.

That list could be used for many purposes: improve the docs, serve as the set of functions to implement in xnd.array, unumpy & co, Marten's suggestion of implementing other functions in terms of basic functions, etc.

Two other thoughts:
1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.
2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.

This sounds like a restructuring or factorization of the API, in order to make it smaller, and thus easier to learn and use.
It may start with the docs, by paying more attention to the "core" or important functions and methods, and noting the deprecated, or not frequently used, or not important functions. This could also help the satellite projects, which use NumPy API as an example, and may also be influenced by them and their decisions.

Regards,
Dashamir


Re: defining a NumPy API standard?

Nathaniel Smith
In reply to this post by ralfgommers
On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <[hidden email]> wrote:
> I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

So yes/no are the answers. But what's the question?

"If we were redesigning numpy in a fantasy world without external
constraints or compatibility issues, would we include this function?"
"Is this function well designed?"
"Do we think that supporting this function is necessary to achieve
practical duck-array compatibility?"
"If someone implements this function, should we give them a 'numpy
core compliant!' logo to put on their website?"
"Do we recommend that people use this function in new code?"
"If we were trying to design a minimal set of primitives and implement
the rest of numpy in terms of them, then is this function a good
candidate for a primitive?"

These are all really different things, and useful for solving
different problems... I feel like you might be lumping them together
some?

Also, I'm guessing there are a bunch of functions where you think part
of the interface is fine and part of the interface is broken. (E.g.
dot's behavior on high-dimensional arrays.) Do you think this "one
bool per function" structure will be fine-grained enough for what you
want to do?
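For context on the `dot` example, a short sketch of how `np.dot` and `np.matmul` differ once the inputs have more than two dimensions. The shapes are arbitrary choices for illustration.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul treats the leading axis as a batch: two (3, 4) @ (4, 5) products.
stacked = np.matmul(a, b)
print(stacked.shape)   # (2, 3, 5)

# dot contracts the last axis of `a` with the second-to-last axis of `b`
# and keeps every remaining axis, pairing each matrix in `a` with each in `b`.
paired = np.dot(a, b)
print(paired.shape)    # (2, 3, 2, 5)
```

A single yes/no for `dot` cannot distinguish the uncontroversial 1-D and 2-D behavior from this N-D case, which is the granularity question being raised.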

> Two other thoughts:
> 1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.

I think we do have some rough consensus principles on what's in scope
and what isn't in scope for numpy, but yeah, articulating them more
clearly could be useful. Stuff like "output types and shape should be
predictable from input types and shape", "numpy's core
responsibilities are the array/dtype/ufunc interfaces, and providing a
lingua franca for python numerical libraries to interoperate" (and
therefore: "if it can live outside numpy it probably should"), etc.
I'm seeing this as a living document (a NEP?) that tries to capture
some rules of thumb and that we update as we go. That seems pretty
different to me than a long list of yes/no checkboxes though?
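To make the "output types and shape should be predictable from input types and shape" rule of thumb concrete, a small sketch follows. The choice of `+` as the conforming case and `np.unique` as the contrasting case is an illustration added here, not something named in the thread.

```python
import numpy as np

x = np.zeros((3, 1), dtype=np.int32)
y = np.zeros((1, 4), dtype=np.float64)

# For addition, the result's shape and dtype follow from the inputs' shapes
# and dtypes alone, without looking at any values.
predicted_shape = np.broadcast_shapes(x.shape, y.shape)
predicted_dtype = np.result_type(x.dtype, y.dtype)
result = x + y
print(result.shape == predicted_shape, result.dtype == predicted_dtype)

# By contrast, the output length of np.unique depends on the values:
# same input shape and dtype, different output shapes.
u1 = np.unique(np.array([1, 1, 1]))
u2 = np.unique(np.array([1, 2, 3]))
print(u1.shape, u2.shape)
```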

> 2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.

The idea has come up a few times of having a "soft deprecation" level,
where we put a warning in the docs but not in the code. It seems like
a reasonable idea to me. It's inherently a kind of case-by-case thing
that can be done incrementally. But, if someone wants to
systematically work through all the docs and do the case-by-case
analysis, that also seems like a reasonable idea to me. I'm not sure
if that's the same as your proposal or not.
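A minimal sketch of what a "soft deprecation" could look like mechanically, assuming it means a note in the docstring and no runtime warning. The `soft_deprecated` decorator and the `fill` function are hypothetical; NumPy does not provide such a helper as far as this thread indicates.

```python
import warnings


def soft_deprecated(alternative):
    """Hypothetical decorator: document a preferred alternative, emit no warning."""
    def decorator(func):
        note = (
            "\n\n.. note::\n"
            f"   Soft-deprecated: prefer ``{alternative}`` in new code.\n"
        )
        func.__doc__ = (func.__doc__ or "").rstrip() + note
        return func  # the function itself is returned unchanged
    return decorator


@soft_deprecated("values[:] = [value] * len(values)")
def fill(values, value):
    """Set every element of the list `values` to `value` in place."""
    for i in range(len(values)):
        values[i] = value
    return values


# Existing callers see no warning; only the documentation changes.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    filled = fill([0, 0, 0], 5)

print(filled, len(caught))
print(fill.__doc__)
```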

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: defining a NumPy API standard?

Marten van Kerkwijk
In reply to this post by ralfgommers

>     In this respect, I think an excellent place to start might be
>     something you are planning already anyway: update the user
>     documentation
>

I would include tests as well. Rather than hammer out a full standard
based on extensive discussions and negotiations, I would suggest NumPy
might be able to set a de-facto "standard" based on pieces of the
current numpy user documentation and test suite.

I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

I quite like the idea of trying to be better at defining the API through tests - the substitution principle in action! Systematically applying tests to both ndarray and MaskedArray might be a start internally (just a pytest fixture away...). But definitely start with more of an overview.
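A sketch of the "just a pytest fixture away" idea, assuming pytest as the runner. The fixture name `make_array` and the specific checks are hypothetical; they only show how one test body can be applied to both `ndarray` and `MaskedArray`.

```python
import numpy as np
import pytest

# Constructors for the array types that should be substitutable for each other.
ARRAY_CONSTRUCTORS = [np.array, np.ma.array]


@pytest.fixture(params=ARRAY_CONSTRUCTORS, ids=["ndarray", "MaskedArray"])
def make_array(request):
    """Hypothetical fixture: yields each constructor in turn."""
    return request.param


def test_basic_shape_and_sum(make_array):
    # pytest runs this once per constructor listed above.
    a = make_array([[1, 2, 3], [4, 5, 6]])
    assert a.shape == (2, 3)
    assert a.T.shape == (3, 2)
    assert a.sum() == 21
    # Reshaping should hand back the same array type it was given.
    assert type(a.reshape(3, 2)) is type(a)
```

Adding another array type to the list is then the only change needed to run the same expectations against it.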

-- Marten


Re: defining a NumPy API standard?

Marten van Kerkwijk
In reply to this post by ralfgommers

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought out but not clearly private submodules)


I would perhaps start with ndarray itself. Quite a lot seems superfluous

Shapes:
- need: shape, strides, reshape, transpose;
- probably: ndim, size, T
- less so: nbytes, ravel, flatten, squeeze, and swapaxes.

Getting/setting:
- need __getitem__, __setitem__;
- less so: fill, put, take, item, itemset, repeat, compress, diagonal;

Datatype/Copies/views/conversion
- need: dtype, copy, view, astype, flags
- less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap, newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring,

Iteration
- need __iter__
- less so: flat

Numerics
- need: conj, real, imag
- maybe also: min, max, mean, sum, std, var, prod, partition, sort, trace;
- less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any, nonzero, ptp, searchsorted,
choose.
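To illustrate why some of the "less so" shape entries can be considered derivable, here is a sketch expressing a few of them through the "need" group (`shape`, `reshape`, `transpose`). The example array is arbitrary, and the equivalences concern values only, not view-versus-copy behavior.

```python
import numpy as np

a = np.arange(24).reshape(2, 1, 12)

# ravel: a reshape to one dimension.
ravel_ok = np.array_equal(a.ravel(), a.reshape(-1))

# flatten: the same values, but always a fresh copy.
flatten_ok = np.array_equal(a.flatten(), a.reshape(-1).copy())

# swapaxes(0, 2): a transpose with the two axes exchanged.
swap_ok = np.array_equal(a.swapaxes(0, 2), a.transpose(2, 1, 0))

# squeeze: a reshape that drops the length-1 axes.
squeezed_shape = tuple(n for n in a.shape if n != 1)
squeeze_ok = np.array_equal(a.squeeze(), a.reshape(squeezed_shape))

# nbytes, ndim: simple arithmetic on shape/size and dtype.
nbytes_ok = a.nbytes == a.size * a.dtype.itemsize
ndim_ok = a.ndim == len(a.shape)

print(ravel_ok, flatten_ok, swap_ok, squeeze_ok, nbytes_ok, ndim_ok)
```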

All the best,

Marten


Re: defining a NumPy API standard?

ralfgommers


On Sun, Jun 2, 2019 at 3:18 AM Marten van Kerkwijk <[hidden email]> wrote:

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought out but not clearly private submodules)


I would perhaps start with ndarray itself. Quite a lot seems superfluous

Shapes:
- need: shape, strides, reshape, transpose;
- probably: ndim, size, T
- less so: nbytes, ravel, flatten, squeeze, and swapaxes.

Getting/setting:
- need __getitem__, __setitem__;
- less so: fill, put, take, item, itemset, repeat, compress, diagonal;

Datatype/Copies/views/conversion
- need: dtype, copy, view, astype, flags
- less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap, newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring,

Iteration
- need __iter__
- less so: flat

Numerics
- need: conj, real, imag
- maybe also: min, max, mean, sum, std, var, prod, partition, sort, trace;
- less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any, nonzero, ptp, searchsorted,
choose.

Exactly. This is great, thanks Marten. I agree with pretty much everything in this list.

Ralf



Re: defining a NumPy API standard?

ralfgommers
In reply to this post by Dashamir Hoxha


On Sun, Jun 2, 2019 at 12:33 AM Dashamir Hoxha <[hidden email]> wrote:
On Sat, Jun 1, 2019 at 10:05 PM Ralf Gommers <[hidden email]> wrote:

I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought out but not clearly private submodules)

Just the main namespace plus ndarray methods is close to 700 objects. If you want to build a NumPy-like thing, that's 700 decisions to make. I'm suggesting something as simple as a list of functions that constitute a sensible "core of NumPy". That list would not include anything in fft/linalg/random, since those can easily be separated out (indeed, if we could disappear fft and linalg and just rely on scipy, pyfftw etc., that would be great). It would not include financial functions. And so on. I guess we'd end up with most ndarray methods plus <150 functions.

That list could be used for many purposes: improve the docs, serve as the set of functions to implement in xnd.array, unumpy & co, Marten's suggestion of implementing other functions in terms of basic functions, etc.

Two other thoughts:
1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.
2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.

This sounds like a restructuring or factorization of the API, in order to make it smaller, and thus easier to learn and use.
It may start with the docs, by paying more attention to the "core" or important functions and methods, and noting the deprecated, or not frequently used, or not important functions. This could also help the satellite projects, which use NumPy API as an example, and may also be influenced by them and their decisions.

 Indeed. It will help restructure our docs. Perhaps not the reference guide (not sure yet), but definitely the user guide and other high-level docs we (or third parties) may want to create.

Ralf


Re: defining a NumPy API standard?

ralfgommers
In reply to this post by Nathaniel Smith


On Sun, Jun 2, 2019 at 12:35 AM Nathaniel Smith <[hidden email]> wrote:
On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <[hidden email]> wrote:
> I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

So yes/no are the answers. But what's the question?

"If we were redesigning numpy in a fantasy world without external
constraints or compatibility issues, would we include this function?"
"Is this function well designed?"
"Do we think that supporting this function is necessary to achieve
practical duck-array compatibility?"
"If someone implements this function, should we give them a 'numpy
core compliant!' logo to put on their website?"
"Do we recommend that people use this function in new code?"
"If we were trying to design a minimal set of primitives and implement
the rest of numpy in terms of them, then is this function a good
candidate for a primitive?"

These are all really different things, and useful for solving
different problems... I feel like you might be lumping them together
some?

No, I feel like you just want to see a real proposal. At this point I've gotten some really useful feedback, in particular from Marten (thanks!), and I have a better idea of what to do. So I'll answer a few of your questions, and propose to leave the rest till I actually have something more solid to discuss. That will likely answer many of your questions.


Also, I'm guessing there are a bunch of functions where you think part
of the interface is fine and part of the interface is broken. (E.g.
dot's behavior on high-dimensional arrays.)

Indeed, but that's a much harder problem to tackle. Again, there's a reason I put function behavior explicitly out of scope.

Do you think this "one
bool per function" structure will be fine-grained enough for what you
want to do?

yes
 

> Two other thoughts:
> 1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.

I think we do have some rough consensus principles on what's in scope
and what isn't in scope for numpy,

Very rough perhaps. I don't think we are on the same wavelength at all about the cost of adding new functions, the cost of deprecations, the use of submodules and even what's public or private right now.

That can't be solved all at once, but I think my idea will help with some of these.

but yeah, articulating them more
clearly could be useful. Stuff like "output types and shape should be
predictable from input types and shape", "numpy's core
responsibilities are the array/dtype/ufunc interfaces, and providing a
lingua franca for python numerical libraries to interoperate" (and
therefore: "if it can live outside numpy it probably should"), etc.

All of these are valid questions. Most of that probably needs to be in the scope document (https://www.numpy.org/neps/scope.html), which also needs to be improved.

I'm seeing this as a living document (a NEP?)

NEP would work. Although I'd prefer a way to be able to reference some fixed version of it rather than it being always in flux.

that tries to capture
some rules of thumb and that we update as we go. That seems pretty
different to me than a long list of yes/no checkboxes though?

> 2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.

The idea has come up a few times of having a "soft deprecation" level,
where we put a warning in the docs but not in the code. It seems like
a reasonable idea to me. It's inherently a kind of case-by-case thing
that can be done incrementally. But, if someone wants to
systematically work through all the docs and do the case-by-case
analysis, that also seems like a reasonable idea to me. I'm not sure
if that's the same as your proposal or not.

not the same, but related.

Ralf

