start of an array (tensor) and dataframe API standardization initiative

start of an array (tensor) and dataframe API standardization initiative

ralfgommers
Hi all,

I'd like to share this announcement blog post about the creation of a consortium for array and dataframe API standardization here: https://data-apis.org/blog/announcing_the_consortium/. It's still in the beginning stages, but starting to take shape. We have participation from one or more maintainers of most array and tensor libraries - NumPy, TensorFlow, PyTorch, MXNet, Dask, JAX, Xarray. Stephan Hoyer, Travis Oliphant and myself have been providing input from a NumPy perspective.

The effort is very much related to some of the interoperability work we've been doing in NumPy (e.g. it could provide an answer to what's described in https://numpy.org/neps/nep-0037-array-module.html#requesting-restricted-subsets-of-numpy-s-api).
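
To make the "restricted subset" idea concrete, here is a minimal sketch of namespace-based dispatch, written in plain Python so it runs without any array library. It assumes the `__array_namespace__` method name used in the published array API standard; `ToyArray`, `toy_namespace` and `mean` are hypothetical names invented for illustration and are not part of NumPy or NEP 37.

```python
import types


class ToyArray:
    """Hypothetical stand-in for an array object from some library."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    @property
    def size(self):
        # Number of elements held by this toy array.
        return len(self.values)

    def __array_namespace__(self, api_version=None):
        # A real library would return its standard-compliant module here.
        return toy_namespace


def _toy_sum(x):
    # Toy reduction: returns a Python float rather than a 0-d array.
    return sum(x.values)


# The "restricted subset": a namespace exposing only a few functions.
toy_namespace = types.SimpleNamespace(sum=_toy_sum, asarray=ToyArray)


def mean(x):
    """Library-agnostic mean that only uses functions from x's own namespace."""
    xp = x.__array_namespace__()
    return xp.sum(x) / x.size


print(mean(ToyArray([1, 2, 3, 6])))  # 3.0
```

The consumer function `mean` never imports a specific library; it only relies on whatever the array's own namespace provides, which is the kind of restricted, well-defined API surface a standard could pin down.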

At this point we're looking for feedback from maintainers at a high level (see the blog post for details).

Also important: the python-record-api tooling and data in its repo has very granular API usage data, of the kind we could really use when making decisions that impact backwards compatibility.

Cheers,
Ralf


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: start of an array (tensor) and dataframe API standardization initiative

ralfgommers
Hi all,

I'd like to share an update on this topic. The draft array API standard is now ready for wider review:

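- Blog post: https://data-apis.org/blog/array_api_standard_release/
- Array API standard document: https://data-apis.github.io/array-api/latest/
- Repo: https://github.com/data-apis/array-api/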

It would be great if people - and in particular, NumPy maintainers - could have a look at it and see if that looks sensible from a NumPy perspective and whether the goals and benefits of adopting it are described clearly enough and are compelling.

I'm sure a NEP will be needed to propose adoption of the standard once it is closer to completion, and to work out what that means for interaction with the array protocol NEPs and/or NEP 37 and how an implementation would look. It's a bit early for that now; I'm thinking maybe by the end of the year. Some initial discussion now would be useful though, since it's easier to make changes now than when the API standard is further along.

Cheers,
Ralf


On Mon, Aug 17, 2020 at 9:34 PM Ralf Gommers <[hidden email]> wrote:
> Hi all,
>
> I'd like to share this announcement blog post about the creation of a consortium for array and dataframe API standardization here: https://data-apis.org/blog/announcing_the_consortium/. It's still in the beginning stages, but starting to take shape. We have participation from one or more maintainers of most array and tensor libraries - NumPy, TensorFlow, PyTorch, MXNet, Dask, JAX, Xarray. Stephan Hoyer, Travis Oliphant and myself have been providing input from a NumPy perspective.
>
> The effort is very much related to some of the interoperability work we've been doing in NumPy (e.g. it could provide an answer to what's described in https://numpy.org/neps/nep-0037-array-module.html#requesting-restricted-subsets-of-numpy-s-api).
>
> At this point we're looking for feedback from maintainers at a high level (see the blog post for details).
>
> Also important: the python-record-api tooling and data in its repo has very granular API usage data, of the kind we could really use when making decisions that impact backwards compatibility.
>
> Cheers,
> Ralf



Re: start of an array (tensor) and dataframe API standardization initiative

Ilhan Polat
This is great work. Thanks to everyone who contributed. Very clean user-interface too.

One question: Can we propose feature requests already or is that discussion closed?


Re: start of an array (tensor) and dataframe API standardization initiative

YueCompl
In reply to this post by ralfgommers
This is great!

I'm working on a Haskell-based mmap shared-array library with a Python-like surface-language API. I would very willingly adhere to such a standard.

From a quick skim I can't find any dataframe-related info; is that scheduled for the future? Will it take Pandas as the primary reference?

Thanks with best regards,
Compl



Re: start of an array (tensor) and dataframe API standardization initiative

ralfgommers
In reply to this post by Ilhan Polat


On Wed, Nov 11, 2020 at 10:56 AM Ilhan Polat <[hidden email]> wrote:
> This is great work. Thanks to everyone who contributed. Very clean user-interface too.
>
> One question: Can we propose feature requests already or is that discussion closed?

It's not closed. This is the start of community review, so if things are missing or need changing, now is a good time to bring them up. Please have a look at CONTRIBUTING.md in the array-api repo.

What I would personally expect is that most discussion will be about the bigger-picture topics and about the clarity of the document. There may be some individual functions that are important to add; if that's what you have in mind, I would recommend looking at some merged PRs to see how the analysis is done (e.g. usage data, comparison between existing libraries). https://github.com/data-apis/array-api/pull/42 is a good example.

Cheers,
Ralf



Re: start of an array (tensor) and dataframe API standardization initiative

ralfgommers
In reply to this post by YueCompl


On Wed, Nov 11, 2020 at 12:15 PM YueCompl <[hidden email]> wrote:
> This is great!
>
> I'm working on a Haskell-based mmap shared-array library with a Python-like surface-language API. I would very willingly adhere to such a standard.

Awesome. Library authors from other languages are definitely another audience we had in mind, so glad to hear it's helpful.

> From a quick skim I can't find any dataframe-related info; is that scheduled for the future? Will it take Pandas as the primary reference?

Yes, that is planned but will take a while longer. Dataframes are less mature, and Pandas itself is still very much in flux (the first proposal after the 1.0 release was "let's deprecate <stuff> for 2.0"), so it's a more complex puzzle. Pandas is an important reference, but I'd expect the end result to deviate more from Pandas than the array API differs from NumPy.

Cheers,
Ralf




Re: start of an array (tensor) and dataframe API standardization initiative

mattip
Administrator
In reply to this post by ralfgommers

On 11/10/20 8:19 PM, Ralf Gommers wrote:

> Hi all,
>
> I'd like to share an update on this topic. The draft array API
> standard is now ready for wider review:
>
> - Blog post: https://data-apis.org/blog/array_api_standard_release/
> - Array API standard document:
> https://data-apis.github.io/array-api/latest/
> - Repo: https://github.com/data-apis/array-api/
>
> It would be great if people - and in particular, NumPy maintainers -
> could have a look at it and see if that looks sensible from a NumPy
> perspective and whether the goals and benefits of adopting it are
> described clearly enough and are compelling.
>

I think it is compelling for a first version. The test suite and
benchmark suite will be valuable tools. I hope future versions
standardize complex numbers as a dtype. I realize there is a limit to
the breadth of the scope of functions to be covered. Is there a page
that lists them in one place? For instance I tried to look up what the
standard has to say on issue https://github.com/numpy/numpy/issues/17760 
about using bincount on uint64 arrays. It took me a while to figure out
that bincount was not in the API (although unique(..., return_counts) is).
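
For readers unfamiliar with the two functions, the sketch below mimics in plain Python what `bincount` and `unique(..., return_counts=True)` compute for non-negative integers. The function names `dense_bincount` and `unique_with_counts` are hypothetical stand-ins, not NumPy or array API functions, and the sketch says nothing about how either library actually handles uint64 input.

```python
from collections import Counter


def dense_bincount(values):
    """Count occurrences by index, like a bincount: result[i] == count of i."""
    if not values:
        return []
    result = [0] * (max(values) + 1)  # memory grows with the largest value
    for v in values:
        result[v] += 1
    return result


def unique_with_counts(values):
    """Return (sorted unique values, matching counts)."""
    tally = Counter(values)
    uniques = sorted(tally)
    return uniques, [tally[u] for u in uniques]


small = [3, 1, 3, 2, 3]
print(dense_bincount(small))      # [0, 1, 1, 3]
print(unique_with_counts(small))  # ([1, 2, 3], [1, 1, 3])

# A value near the top of the uint64 range: fine for the sparse form,
# while the dense form would need a list with about 2**63 entries.
big = [2**63 + 5, 7, 2**63 + 5]
print(unique_with_counts(big))    # ([7, 9223372036854775813], [1, 2])
```

The dense form indexes by value, so its output size depends on the largest input value; the sparse form only depends on how many distinct values occur, which is why it stays usable for very large integers.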


Matti


Re: start of an array (tensor) and dataframe API standardization initiative

ralfgommers


On Thu, Nov 12, 2020 at 1:54 PM Matti Picus <[hidden email]> wrote:

> I think it is compelling for a first version. The test suite and
> benchmark suite will be valuable tools. I hope future versions
> standardize complex numbers as a dtype.

Yes, that's definitely a desire - when implementations are there/ready. At the moment most libraries have very incomplete support for complex dtypes, largely because they're not very important for deep learning. Also NumPy's implementations/choices are shaky in places, and that's being turned up by the PyTorch effort that's ongoing now to implement complex dtype support in a NumPy-compatible way.

> I realize there is a limit to
> the breadth of the scope of functions to be covered. Is there a page
> that lists them in one place? For instance I tried to look up what the
> standard has to say on issue https://github.com/numpy/numpy/issues/17760
> about using bincount on uint64 arrays. It took me a while to figure out
> that bincount was not in the API (although unique(..., return_counts) is).

That's a good idea and still missing, thanks for asking. The test suite that's in development has a complete list [1]. In the document itself Sphinx search works, but it should be easier to get a complete overview perhaps (although it requires some thought - the NumPy docs don't have everything on one page either).


Cheers,
Ralf



