Future of ufuncs

Future of ufuncs

Charles R Harris

Hi All,

This post is to open a discussion of the future of ufuncs. There are two contradictory ideas that have floated about regarding the evolution of ufuncs. One is to generalize ufuncs to operate on buffers, essentially separating them from their current entanglement with ndarrays. The other is to accept that they are fundamentally part of the ndarray universe and move them into the multiarray module, thus avoiding the odd overloading of functions in the multiarray module. The first has been a long-standing proposal that I once thought sounded good, but I've come to prefer the second. That change of mind was driven by the resulting code simplification and by the removal of a dependence on a Python feature, buffers, that we cannot easily modify to adapt to changing needs and new dtypes. Because I'd like to move the ufuncs, if we decide to move them, sometime after NumPy 1.14 is released, now seems a good time to decide the issue.

Thoughts?

Chuck


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Future of ufuncs

Sebastian Berg
On Sun, 2017-05-28 at 14:53 -0600, Charles R Harris wrote:

> Hi All,
> This post is to open a discussion of the future of ufuncs. There are
> two contradictory ideas that have floated about regarding ufuncs
> evolution. One is to generalize ufuncs to operate on buffers,
> essentially separating them from their current entanglement with
> ndarrays. The other is to accept that they are fundamentally part of
> the ndarray universe and move them into the multiarray module, thus
> avoiding the odd overloading of functions in the multiarray module.
> The first has been a long time proposal that I once thought sounded
> good, but I've come to prefer the second. That change of mind was
> driven by the resulting code simplification and the removal of a
> dependence on a Python feature, buffers, that we cannot easily modify
> to adapt to changing needs and new dtypes. Because I'd like to move
> the ufuncs, if we decide to move them, sometime after NumPy 1.14 is
> released, now seems a good time to decide the issue.
> Thoughts?

I have not thought about it much. But I agree that the dtypes are
probably the biggest issue; also, I am no longer sure there is much to
gain from having ufuncs work on buffers in any case.

The dtype issue goes back a bit to ideas like datashape and to trying
to make the dtypes somewhat separate from NumPy, though I doubt I
would want to make that an explicit goal.

I wonder how much of the C loops and type resolution we could/should
expose. What I mean is that ufuncs consist of:

 * type resolution (somewhat ufunc-specific)
 * outer loops (normal, reduce, etc.) using nditer (buffering)
 * inner 1d loops

It is a bit more complicated, but I am just wondering whether it might
make sense to try to expose the individual ufunc pieces (type
resolution and the 1d loops) but not the outer-loop nditer setup,
which is ndarray-specific in any case (honestly, I am not sure; it is
entirely possible that it is already exposed).
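
The three layers listed above can be illustrated with a pure-Python
sketch. It is not how NumPy implements them (the real type resolution
and inner loops live in C, and the default resolver applies promotion
rules this ignores); `resolve_loop`, `apply_binary` and `add_inner` are
hypothetical names. It reads the registered loop signatures from the
public `ufunc.types` attribute, uses `np.nditer` for the buffered outer
loop, and hands 1d chunks to a stand-in inner loop.

```python
import numpy as np

def resolve_loop(ufunc, *arrays):
    """Hypothetical type resolver: pick the first registered loop (from
    ufunc.types, e.g. 'dd->d') that every input can be safely cast to."""
    for sig in ufunc.types:
        in_codes, out_codes = sig.split("->")
        if all(np.can_cast(a.dtype, np.dtype(c), casting="safe")
               for a, c in zip(arrays, in_codes)):
            return ([np.dtype(c) for c in in_codes],
                    [np.dtype(c) for c in out_codes])
    raise TypeError(f"no {ufunc.__name__} loop matches the input dtypes")

def add_inner(a, b, out):
    """Stand-in for a C inner loop: handles one 1d chunk at a time."""
    for i in range(a.shape[0]):
        out[i] = a[i] + b[i]

def apply_binary(ufunc, inner, x, y):
    """Hypothetical driver wiring the three layers together."""
    x, y = np.asarray(x), np.asarray(y)
    in_dt, out_dt = resolve_loop(ufunc, x, y)        # 1. type resolution
    it = np.nditer(                                   # 2. outer loop (nditer)
        [x, y, None],
        flags=["external_loop", "buffered"],
        op_flags=[["readonly"], ["readonly"], ["writeonly", "allocate"]],
        op_dtypes=[in_dt[0], in_dt[1], out_dt[0]],
        casting="safe",
    )
    with it:
        for a, b, out in it:
            inner(a, b, out)                          # 3. inner 1d loop
        return it.operands[2]

result = apply_binary(np.add, add_inner, np.arange(3), np.array([0.5, 1.5, 2.5]))
print(result, result.dtype)  # same values and dtype as np.add on these inputs
```

Only the middle step needs ndarray-specific machinery; the resolver
and the inner loop just see dtypes and 1d chunks.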

- Sebastian


> Chuck

Re: Future of ufuncs

Marten van Kerkwijk
Hi Chuck,

Like Sebastian, I wonder a little about what level you are talking
about. Presumably, it is the actual implementation of the ufunc? I.e.,
this is not about the upper logic that decides which `__array_ufunc__`
to call, etc.

If so, I agree with you that it would seem to make most sense to move
the implementation to `multiarray`; the current structure certainly is
a major hurdle to understanding how things work!

Indeed, I guess in terms of my earlier suggestion to make much of a
ufunc happen in `ndarray.__array_ufunc__`, one could see the type
resolution and iteration happening there. If one were to expose the
inner loops, anyone working with buffers could then use the ufuncs by
defining their own `__array_ufunc__`.
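
The buffer case can already be approximated with the existing
`__array_ufunc__` protocol (NumPy 1.13+). A minimal sketch, assuming a
hypothetical `BufferWrapper` whose buffer holds native-endian float64
values: it forwards every ufunc call to a zero-copy ndarray view, so it
reuses the full ufunc machinery rather than an exposed inner loop (no
such public Python API is assumed here).

```python
import numpy as np

class BufferWrapper:
    """Hypothetical container around a raw buffer of native float64 values."""

    def __init__(self, buf):
        self.buf = buf  # any object exposing the buffer protocol

    def view(self):
        # Zero-copy ndarray view; writable if the underlying buffer is.
        return np.frombuffer(self.buf, dtype=np.float64)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Swap each wrapper for an ndarray view and let the normal ufunc
        # machinery (type resolution, outer and inner loops) do the work.
        inputs = tuple(_unwrap(x) for x in inputs)
        if "out" in kwargs:  # NumPy always hands `out` over as a tuple
            kwargs["out"] = tuple(_unwrap(o) for o in kwargs["out"])
        return getattr(ufunc, method)(*inputs, **kwargs)


def _unwrap(x):
    """Replace a BufferWrapper by its ndarray view; leave anything else alone."""
    return x.view() if isinstance(x, BufferWrapper) else x


buf = bytearray(np.array([1.0, 4.0, 9.0]).tobytes())
w = BufferWrapper(buf)
print(np.sqrt(w))              # computed on a view of `buf`
np.multiply(w, 2.0, out=(w,))  # writes straight back into `buf`
print(np.frombuffer(buf))
```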

All the best,

Marten

Re: Future of ufuncs

Charles R Harris


On Mon, May 29, 2017 at 12:32 PM, Marten van Kerkwijk <[hidden email]> wrote:
> Hi Chuck,
>
> Like Sebastian, I wonder a little about what level you are talking
> about. Presumably, it is the actual implementation of the ufunc? I.e.,
> this is not about the upper logic that decides which `__array_ufunc__`
> to call, etc.
>
> If so, I agree with you that it would seem to make most sense to move
> the implementation to `multiarray`; the current structure certainly is
> a major hurdle to understanding how things work!
>
> Indeed, I guess in terms of my earlier suggestion to make much of a
> ufunc happen in `ndarray.__array_ufunc__`, one could see the type
> resolution and iteration happening there. If one were to expose the
> inner loops, anyone working with buffers could then use the ufuncs by
> defining their own __array_ufunc__.

The idea of separating ufuncs from ndarray was put forward many years ago, maybe five or six. What I seek here is a record that we have given up on that ambition, so that we do not need to take it into consideration in the future. In particular, we can then feel free to couple ufuncs even more tightly with ndarray.

Chuck

Re: Future of ufuncs

Nathaniel Smith
On Mon, May 29, 2017 at 1:51 PM, Charles R Harris
<[hidden email]> wrote:

>
>
> On Mon, May 29, 2017 at 12:32 PM, Marten van Kerkwijk
> <[hidden email]> wrote:
>>
>> Hi Chuck,
>>
>> Like Sebastian, I wonder a little about what level you are talking
>> about. Presumably, it is the actual implementation of the ufunc? I.e.,
>> this is not about the upper logic that decides which `__array_ufunc__`
>> to call, etc.
>>
>> If so, I agree with you that it would seem to make most sense to move
>> the implementation to `multiarray`; the current structure certainly is
>> a major hurdle to understanding how things work!
>>
>> Indeed, I guess in terms of my earlier suggestion to make much of a
>> ufunc happen in `ndarray.__array_ufunc__`, one could see the type
>> resolution and iteration happening there. If one were to expose the
>> inner loops, anyone working with buffers could then use the ufuncs by
>> defining their own __array_ufunc__.
>
>
> The idea of separating ufuncs from ndarray was put forward many years ago,
> maybe five or six. What I seek here is a record that we have given up on
> that ambition, so do not need to take it into consideration in the future.
> In particular, we can feel free to couple ufuncs even more tightly with
> ndarray.

I think we do want to separate ufuncs from ndarray semantically: it
should be possible to use ufuncs on sparse arrays, dask arrays, etc.
etc.
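
The semantic separation described above is what the `__array_ufunc__`
override protocol provides: a non-ndarray container intercepts the
ufunc call and decides how to apply it to its own storage. A toy
sketch, assuming a hypothetical 1d `TinySparse` type that stores only
its nonzero entries; real sparse or dask libraries handle far more
(binary ufuncs, reductions, `out=`).

```python
import numpy as np

class TinySparse:
    """Hypothetical 1d sparse vector that stores only its nonzero entries."""

    def __init__(self, size, data):
        self.size = size
        self.data = dict(data)  # index -> nonzero value

    def todense(self):
        dense = np.zeros(self.size)
        for i, v in self.data.items():
            dense[i] = v
        return dense

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain unary calls such as np.sqrt(x) are handled; for anything
        # else return NotImplemented, which makes NumPy raise a TypeError.
        if method != "__call__" or len(inputs) != 1 or kwargs:
            return NotImplemented
        if ufunc(0.0) != 0.0:
            # f(0) != 0 would turn every implicit zero into a stored value,
            # so fall back to the dense ndarray path.
            return ufunc(self.todense())
        return TinySparse(self.size, {i: ufunc(v) for i, v in self.data.items()})


s = TinySparse(5, {1: 4.0, 3: 9.0})
print(np.sqrt(s).data)  # sqrt(0) == 0, so the result stays sparse
print(np.cos(s))        # cos(0) == 1, so the result comes back dense
```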

But I don't think that altering ufuncs to work directly on
buffer/memoryview objects, or shipping them as a separate package from
the rest of numpy, is a useful step towards this goal.

Right now, handling buffers/memoryviews is easy: one can trivially
convert between them and ndarray without making any copies. I don't
know of any interesting problems that are blocked because ufuncs work
on ndarrays instead of buffer/memoryview objects. The interesting
problems are where there's a fundamentally different storage strategy
involved, like sparse/dask/... arrays.
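
The zero-copy round trip between buffers and ndarrays can be seen with
standard calls (`np.frombuffer`, `memoryview`, `np.asarray`,
`np.shares_memory`); the variable names are only illustrative.

```python
import numpy as np

raw = bytearray(4 * 8)                      # plain Python buffer: room for 4 float64s
arr = np.frombuffer(raw, dtype=np.float64)  # ndarray view of the same bytes, no copy
np.add(arr, 1.0, out=arr)                   # the ufunc writes directly into `raw`

mv = memoryview(arr)                        # back to a memoryview, still no copy
back = np.asarray(mv)                       # ...and to an ndarray again
print(np.shares_memory(arr, back))          # both arrays share one block of memory
print(np.frombuffer(raw).tolist())          # `raw` itself now holds 1.0 in every slot
```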

And similarly, I don't see what problems are solved by splitting them
out for building or distribution.

OTOH, trying to accomplish either of these things definitely has a
cost in terms of churn, complexity, double the workload for
release-management, etc. Even the current split between the multiarray
and umath modules causes problems all the time. It's mostly boring
problems like having little utility functions that are needed in both
places but awkward to share, or problems caused by the complicated
machinery needed to let them interact properly (set_numeric_ops and
all that) – this doesn't seem like stuff that's adding any value.

Plus, there's a major problem that buffers/memoryviews don't have any
way to represent all the dtypes we currently support (e.g. datetime64)
and don't have any way to add new ones, and the only way to fix this
would be to write a PEP, shepherd patches through python-dev, wait
for the next major Python release, and then drop support for all
older Python releases. None of this is going to happen soon;
probably we should plan on the assumption that it will never happen.
So I don't see how this could work at all.
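
The datetime64 limitation is easy to observe: NumPy declines to export
a datetime64 array through the buffer protocol, since the PEP 3118
format codes have no entry for it. A sketch; `can_export` is a
hypothetical helper, and it catches both `ValueError` and
`BufferError` because the exact exception type is an implementation
detail.

```python
import numpy as np

def can_export(arr):
    """Return True if `arr` can be exposed through the Python buffer protocol."""
    try:
        memoryview(arr)
    except (ValueError, BufferError):  # dtype has no buffer format code
        return False
    return True

dates = np.array(["2017-05-28", "2017-05-29"], dtype="datetime64[D]")
print(can_export(np.arange(3.0)))  # float64 maps onto a buffer format code
print(can_export(dates))           # datetime64 does not

# The only buffer-friendly form is the raw int64 storage, which drops the unit:
days = memoryview(dates.view(np.int64))
print(days.tolist())               # bare day counts since 1970-01-01
```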

So my vote is for merging the multiarray and umath code bases
together, and then taking advantage of the resulting flexibility to
refactor the internals to provide cleanly separated interfaces at the
API level.

-n

--
Nathaniel J. Smith -- https://vorpus.org