Extending ufunc signature syntax for matmul, frozen dimensions

Extending ufunc signature syntax for matmul, frozen dimensions

mattip
In looking to solve issue #9028 "no way to override matmul/@ if
__array_ufunc__ is set", it seems there is consensus around the idea of
making matmul a true gufunc, but matmul can behave differently for
different combinations of array and vector:

(n,k),(k,m)->(n,m)
(n,k),(k) -> (n)
(k),(k,m)->(m)

Currently there is no way to express that in the ufunc signature. The
proposed solution to issue #9029 is to extend the meaning of a signature
so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m are
optional dimensions; if missing in the input, they're treated as 1, and
then dropped from the output". Additionally, there is an open pull
request #5015 "Add frozen dimensions to gufunc signatures" to allow
signatures like '(3),(3)->(3)'.
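
To make the proposed semantics concrete, here is a minimal pure-Python sketch that works on shape tuples only. The helper name `matmul_core_shape` is hypothetical, and this is not how NumPy implements anything; it only illustrates "treat a missing optional dimension as 1, then drop it from the output" for the core dimensions of (n?,k),(k,m?)->(n?,m?). Broadcasting over leading loop dimensions is ignored.

```python
def matmul_core_shape(a_shape, b_shape):
    """Return the core output shape for (n?,k),(k,m?)->(n?,m?).

    Hypothetical helper: only core dimensions are handled, there is no
    broadcasting over leading (loop) dimensions.
    """
    if not 1 <= len(a_shape) <= 2 or not 1 <= len(b_shape) <= 2:
        raise ValueError("each operand needs 1 or 2 core dimensions")

    # A missing optional dimension is treated as having length 1.
    n_missing = len(a_shape) == 1
    m_missing = len(b_shape) == 1
    n, k_a = (1, a_shape[0]) if n_missing else a_shape
    k_b, m = (b_shape[0], 1) if m_missing else b_shape

    if k_a != k_b:
        raise ValueError(f"core dimension k mismatch: {k_a} != {k_b}")

    # Dimensions that were missing on input are dropped from the output.
    out = []
    if not n_missing:
        out.append(n)
    if not m_missing:
        out.append(m)
    return tuple(out)


print(matmul_core_shape((2, 3), (3, 4)))  # (2, 4)
print(matmul_core_shape((2, 3), (3,)))    # (2,)
print(matmul_core_shape((3,), (3, 4)))    # (4,)
print(matmul_core_shape((3,), (3,)))      # ()
```

The last call shows that the same signature would also cover the vector-vector case, where both optional dimensions are absent.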

I would like to extend ufunc signature handling to implement both these
ideas, in a way that is backward-compatible with the publicly
exposed PyUFuncObject. PyUFunc_FromFuncAndDataAndSignature is used to
allocate and initialize a PyUFuncObject. Are there downstream projects
that allocate their own PyUFuncObject without going through
PyUFunc_FromFuncAndDataAndSignature? If so, we could use one of the
"reserved" fields, or extend the meaning of the "identity" field to
allow version detection. Any thoughts?

Any other thoughts about extending the signature syntax?

Thanks,
Matti
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Marten van Kerkwijk
Hi Matti,

This sounds great. For completeness, you omitted the vector-vector
case for matmul '(k),(k)->()' - but the suggested new signature for
`matmul` would cover that case as well, so not a problem.

All the best,

Marten

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Marten van Kerkwijk
In reply to this post by mattip
I thought a bit further about this proposal: a disadvantage for matmul
specifically is that it does not solve the need for `matvec`,
`vecmat`, and `vecvec` gufuncs. That said, it might make sense to
implement those as "pseudo-ufuncs" that just add a 1 in the right
place and call `matmul`...
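
A rough sketch of what such wrappers might look like, assuming nested Python lists stand in for arrays and a plain `matmul` stands in for the real one. All four function bodies here are illustrative assumptions, not NumPy code; the point is only that each variant adds a length-1 axis, calls the matrix-matrix product, and strips the axis again.

```python
def matmul(a, b):
    """Plain (n,k),(k,m)->(n,m) product on nested lists (stand-in only)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matvec(a, v):
    # Add a trailing length-1 axis to v, multiply, then drop it again.
    column = [[x] for x in v]
    return [row[0] for row in matmul(a, column)]


def vecmat(v, b):
    # Add a leading length-1 axis to v, multiply, then drop it again.
    return matmul([v], b)[0]


def vecvec(v, w):
    # Both operands get a length-1 axis; both are dropped from the result.
    return matmul([v], [[x] for x in w])[0][0]


a = [[1, 2], [3, 4]]
print(matvec(a, [1, 1]))       # [3, 7]
print(vecmat([1, 1], a))       # [4, 6]
print(vecvec([1, 2], [3, 4]))  # 11
```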
-- Marten

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Stephan Hoyer-2
In reply to this post by mattip
On Sun, Apr 29, 2018 at 2:48 AM Matti Picus <[hidden email]> wrote:

> The proposed solution to issue #9029 is to extend the meaning of a
> signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m
> are optional dimensions; if missing in the input, they're treated as 1,
> and then dropped from the output"

I agree that this is an elegant fix for matmul, but are there other use-cases for "optional dimensions" in gufuncs?

It feels a little wrong to add gufunc features if we can only think of one function that can use them.


Re: Extending ufunc signature syntax for matmul, frozen dimensions

Eric Wieser

I think I’m -1 on this - this just makes things harder on the implementers of __array_ufunc__ who now might have to work out which signature matches. I’d prefer the solution where np.matmul is a wrapper around one of three gufuncs (or maybe just around one with axis insertion) - this is similar to how np.linalg already works.

Eric


On Mon, 30 Apr 2018 at 14:34 Stephan Hoyer <[hidden email]> wrote:

> On Sun, Apr 29, 2018 at 2:48 AM Matti Picus <[hidden email]> wrote:
>
>> The proposed solution to issue #9029 is to extend the meaning of a
>> signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m
>> are optional dimensions; if missing in the input, they're treated as 1,
>> and then dropped from the output"
>
> I agree that this is an elegant fix for matmul, but are there other
> use-cases for "optional dimensions" in gufuncs?
>
> It feels a little wrong to add gufunc features if we can only think of
> one function that can use them.

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Allan Haldane
In reply to this post by mattip
On 04/29/2018 05:46 AM, Matti Picus wrote:

> In looking to solve issue #9028 "no way to override matmul/@ if
> __array_ufunc__ is set", it seems there is consensus around the idea of
> making matmul a true gufunc, but matmul can behave differently for
> different combinations of array and vector:
>
> (n,k),(k,m)->(n,m)
> (n,k),(k) -> (n)
> (k),(k,m)->(m)
>
> Currently there is no way to express that in the ufunc signature. The
> proposed solution to issue #9029 is to extend the meaning of a signature
> so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m are
> optional dimensions; if missing in the input, they're treated as 1, and
> then dropped from the output" Additionally, there is an open pull
> request #5015 "Add frozen dimensions to gufunc signatures" to allow
> signatures like '(3),(3)->(3)'.

How much harder would it be to implement multiple-dispatch for gufunc
signatures, instead of modifying the signature to include `?` ?

There was some discussion of this last year:

http://numpy-discussion.10968.n7.nabble.com/Changes-to-generalized-ufunc-core-dimension-checking-tp42618p42638.html

That sounded like a clean solution to me, although I'm a bit ignorant of
the gufunc internals and the compatibility constraints.

I assume gufuncs already have code to match the signature to the array
dims, so it sounds fairly straightforward (I say without looking at any
code) to do this in a loop over alternate signatures until one works.
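
As a rough illustration of that loop, the sketch below tries a list of alternate signatures against the core shapes of the operands and returns the first one that fits. The helpers `parse_signature`, `match_signature`, and `dispatch` are hypothetical names invented for this example and do not reflect the gufunc internals; loop (broadcast) dimensions are ignored. An all-digit dimension name is treated as a frozen dimension in the sense of PR #5015.

```python
import re


def parse_signature(signature):
    """Split '(n,k),(k,m)->(n,m)' into per-operand tuples of dimension names."""
    inputs, outputs = signature.replace(" ", "").split("->")

    def operands(side):
        return [tuple(name for name in group.split(",") if name)
                for group in re.findall(r"\(([^()]*)\)", side)]

    return operands(inputs), operands(outputs)


def match_signature(signature, shapes):
    """Return the core output shapes if `shapes` fit `signature`, else None."""
    in_dims, out_dims = parse_signature(signature)
    if len(in_dims) != len(shapes):
        return None
    sizes = {}
    for names, shape in zip(in_dims, shapes):
        if len(names) != len(shape):
            return None
        for name, size in zip(names, shape):
            if name.isdigit():
                # Frozen dimension: the size must match the literal exactly.
                if int(name) != size:
                    return None
            elif sizes.setdefault(name, size) != size:
                # The same name was already bound to a different size.
                return None
    return [tuple(int(n) if n.isdigit() else sizes[n] for n in names)
            for names in out_dims]


def dispatch(signatures, shapes):
    """Try each signature in order and return the first match."""
    for signature in signatures:
        out = match_signature(signature, shapes)
        if out is not None:
            return signature, out
    raise TypeError(f"no signature matches core shapes {shapes}")


MATMUL_SIGNATURES = [
    "(n,k),(k,m)->(n,m)",
    "(n,k),(k)->(n)",
    "(k),(k,m)->(m)",
    "(k),(k)->()",
]

print(dispatch(MATMUL_SIGNATURES, [(2, 3), (3,)]))
print(match_signature("(3),(3)->(3)", [(3,), (3,)]))
print(match_signature("(3),(3)->(3)", [(4,), (4,)]))
```

The first call selects the matrix-vector signature; the last two show a frozen-dimension signature accepting length-3 vectors and rejecting length-4 ones.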

Allan

Re: Extending ufunc signature syntax for matmul, frozen dimensions

mattip


On 01/05/18 01:45, Allan Haldane wrote:

> On 04/29/2018 05:46 AM, Matti Picus wrote:
>> In looking to solve issue #9028 "no way to override matmul/@ if
>> __array_ufunc__ is set", it seems there is consensus around the idea of
>> making matmul a true gufunc, but matmul can behave differently for
>> different combinations of array and vector:
>>
>> (n,k),(k,m)->(n,m)
>> (n,k),(k) -> (n)
>> (k),(k,m)->(m)
>>
>> Currently there is no way to express that in the ufunc signature. The
>> proposed solution to issue #9029 is to extend the meaning of a signature
>> so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m are
>> optional dimensions; if missing in the input, they're treated as 1, and
>> then dropped from the output" Additionally, there is an open pull
>> request #5015 "Add frozen dimensions to gufunc signatures" to allow
>> signatures like '(3),(3)->(3)'.
> How much harder would it be to implement multiple-dispatch for gufunc
> signatures, instead of modifying the signature to include `?` ?
>
> There was some discussion of this last year:
>
> http://numpy-discussion.10968.n7.nabble.com/Changes-to-generalized-ufunc-core-dimension-checking-tp42618p42638.html
>
> That sounded like a clean solution to me, although I'm a bit ignorant of
> the gufunc internals and the compatibility constraints.
>
> I assume gufuncs already have code to match the signature to the array
> dims, so it sounds fairly straightforward (I say without looking at any
> code) to do this in a loop over alternate signatures until one works.
>
> Allan
I will take a look at multiple-dispatch for gufuncs.
The discussion also suggests using an axis kwarg when calling a gufunc,
for which there is PR #11018 (https://github.com/numpy/numpy/pull/11018).

Matti

Re: Extending ufunc signature syntax for matmul, frozen dimensions

einstein.edison
In reply to this post by Eric Wieser
I agree with Eric here. As one of the users of __array_ufunc__, I'd
much rather have three separate gufuncs or a single one with axis
insertion and removal.

On 30/04/2018 at 23:38, Eric wrote:

> I think I’m -1 on this - this just makes things harder on the
> implementers of __array_ufunc__ who now might have to work out which
> signature matches. I’d prefer the solution where np.matmul is a
> wrapper around one of three gufuncs (or maybe just around one with
> axis insertion) - this is similar to how np.linalg already works.
>
> Eric
>
> On Mon, 30 Apr 2018 at 14:34 Stephan Hoyer <[hidden email]> wrote:
>
>> On Sun, Apr 29, 2018 at 2:48 AM Matti Picus <[hidden email]> wrote:
>>
>>> The proposed solution to issue #9029 is to extend the meaning of a
>>> signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n
>>> and m are optional dimensions; if missing in the input, they're
>>> treated as 1, and then dropped from the output"
>>
>> I agree that this is an elegant fix for matmul, but are there other
>> use-cases for "optional dimensions" in gufuncs?
>>
>> It feels a little wrong to add gufunc features if we can only think
>> of one function that can use them.

Re: Extending ufunc signature syntax for matmul, frozen dimensions

mattip
In reply to this post by Eric Wieser
On 01/05/18 00:38, Eric Wieser wrote:

> I think I’m -1 on this - this just makes things harder on the
> implementers of __array_ufunc__ who now might have to work out which
> signature matches. I’d prefer the solution where np.matmul is a
> wrapper around one of three gufuncs (or maybe just around one with
> axis insertion) - this is similar to how np.linalg already works.
>
> Eric
>
> On Mon, 30 Apr 2018 at 14:34 Stephan Hoyer <[hidden email]> wrote:
>
>     On Sun, Apr 29, 2018 at 2:48 AM Matti Picus <[hidden email]> wrote:
>
>         The proposed solution to issue #9029 is to extend the meaning
>         of a signature so "syntax like (n?,k),(k,m?)->(n?,m?) could
>         mean that n and m are optional dimensions; if missing in the
>         input, they're treated as 1, and then dropped from the output"
>
>     I agree that this is an elegant fix for matmul, but are there
>     other use-cases for "optional dimensions" in gufuncs?
>
>     It feels a little wrong to add gufunc features if we can only
>     think of one function that can use them.

I will try to prototype this solution and put it up for comment,
alongside the multi-signature one.
Matti

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Marten van Kerkwijk
Just for completeness: there are *four* gufuncs (matmat, matvec,
vecmat, and vecvec).

I remain torn about the best way forward. The main argument against
using them inside matmul is that in order to decide which of the four
to use, matmul has to have access to the `shape` of the arguments.
This means that `__array_ufunc__` cannot be used to
override `matmul` (or `@`) for any object which does not have a shape.
From that perspective, multiple signatures is definitely a more
elegant solution.

An advantage of the separate solution is that they are useful
independently of whether they are used internally in `matmul`; though,
then again, with a multi-signature matmul, these would be trivially
created as convenience functions.

-- Marten

Re: Extending ufunc signature syntax for matmul, frozen dimensions

einstein.edison
There is always the option of any downstream object overriding matmul, and I fail to see which objects won't have a shape. - Hameer


On 01/05/2018 at 21:08, Marten wrote:

> Just for completeness: there are *four* gufuncs (matmat, matvec,
> vecmat, and vecvec).
>
> I remain torn about the best way forward. The main argument against
> using them inside matmul is that in order to decide which of the four
> to use, matmul has to have access to the `shape` of the arguments.
> This means that `__array_ufunc__` cannot be used to
> override `matmul` (or `@`) for any object which does not have a shape.
> From that perspective, multiple signatures is definitely a more
> elegant solution.
>
> An advantage of the separate solution is that they are useful
> independently of whether they are used internally in `matmul`; though,
> then again, with a multi-signature matmul, these would be trivially
> created as convenience functions.
>
> -- Marten

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Marten van Kerkwijk
On Wed, May 2, 2018 at 6:24 AM, Hameer Abbasi <[hidden email]> wrote:
> There is always the option of any downstream object overriding matmul, and I
> fail to see which objects won't have a shape. - Hameer

I think we should not decide too readily on what is "reasonable" to
expect for a ufunc input. For instance, I'm currently writing a
chained-ufunc class which uses __array_ufunc__ to help make a chain
(something like `chained_ufunc = np.sin(np.multiply(Input(),
Input()))`). Here, my `Input` class defines `__array_ufunc__` but
definitely does not have a shape, and I would like to be able to
override `np.matmul` just like every other ufunc.
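
A toy version of the problem, under the assumption that `matmul` were a Python-level wrapper choosing one of the four gufuncs by looking at `shape` first. The `Input` class below is only a stand-in for the chained-ufunc placeholder described above, and `matmul_wrapper` and `Array` are hypothetical names for this sketch; the wrapper also only handles 1-d and 2-d operands.

```python
class Input:
    """Placeholder operand: overrides ufuncs but deliberately has no shape."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Record the call instead of computing anything.
        return ("deferred", ufunc, method, inputs)


class Array:
    """Minimal array stand-in that only carries a shape."""

    def __init__(self, shape):
        self.shape = shape


def matmul_wrapper(a, b):
    """Hypothetical wrapper that picks one of four gufuncs by shape."""
    names = {(2, 2): "matmat", (2, 1): "matvec",
             (1, 2): "vecmat", (1, 1): "vecvec"}
    # The shape lookup happens before any override could be consulted.
    key = (len(a.shape), len(b.shape))
    return names[key]


print(matmul_wrapper(Array((2, 3)), Array((3,))))  # matvec

try:
    matmul_wrapper(Array((2, 3)), Input())
except AttributeError as exc:
    # Input never gets the chance to run its __array_ufunc__.
    print("cannot dispatch:", exc)
```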

-- Marten

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Stephan Hoyer-2
On Wed, May 2, 2018 at 8:39 AM Marten van Kerkwijk <[hidden email]> wrote:

> I think we should not decide too readily on what is "reasonable" to
> expect for a ufunc input.

I agree strongly with this.

I can think of a couple of other use-cases off hand:
- xarray.Dataset is a dict-like container of multiple arrays. Matrix-multiplication with a numpy array could make sense (just map over all the contained arrays), but xarray.Dataset itself is not an array and thus does not define shape.
- tensorflow.Tensor can have a dynamic shape that is only known when computation is explicitly run, not when computation is defined in Python.

The problem is even bigger for np.matmul because NumPy also wants to use the same logic for overriding @, and Python's built-in operators definitely should not have such restrictions.


Re: Extending ufunc signature syntax for matmul, frozen dimensions

mattip
In reply to this post by Marten van Kerkwijk
On 01/05/18 21:08, Marten van Kerkwijk wrote:

> Just for completeness: there are *four* gufuncs (matmat, matvec,
> vecmat, and vecvec).
>
> I remain torn about the best way forward. The main argument against
> using them inside matmul is that in order to decide which of the four
> to use, matmul has to have access to the `shape` of the arguments.
> This means that `__array_ufunc__` cannot be used to
> override `matmul` (or `@`) for any object which does not have a shape.
> From that perspective, multiple signatures is definitely a more
> elegant solution.
>
> An advantage of the separate solution is that they are useful
> independently of whether they are used internally in `matmul`; though,
> then again, with a multi-signature matmul, these would be trivially
> created as convenience functions.
>
> -- Marten
My goal is to solve issue #9028, "no way to override matmul/@ if
__array_ufunc__ is set on other". Maybe I am too focused on that, but it
seems shape does not come into play here.

Given a call to matmul(self, other) it appears to me that the decision
to commit to self.matmul or to call other.__array_ufunc__("__call__",
self.matmul, ...) is independent of the shapes and needs only nin and nout.
In other words, the implementation of matmul becomes (simplified):

(matmul(self, other) called) ->

     (use __array_ufunc__, nin, and nout to decide whether to defer to
      other's __array_ufunc__ via PyUFunc_CheckOverride, which implements
      NEP 13) ->

         (yes: call other.__array_ufunc__ as for any other ufunc),

         (no: call matmul as we currently do; no more __array_ufunc__
          testing needed)
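
The flow above can be sketched in pure Python. This is a simplified stand-in for the NEP 13 check, not the logic of PyUFunc_CheckOverride itself: `call_with_override` is a hypothetical helper, a string name replaces the real ufunc object, and subclass ordering, `out` arguments, and the `__array_ufunc__ = None` opt-out are all left out. What it does show is that the decision to defer never touches `shape`.

```python
def call_with_override(func, name, *inputs):
    """Defer to an operand's __array_ufunc__ if present, else call func."""
    tried = False
    for operand in inputs:
        override = getattr(type(operand), "__array_ufunc__", None)
        if override is None:
            continue
        tried = True
        result = override(operand, name, "__call__", *inputs)
        if result is not NotImplemented:
            return result
    if tried:
        # Overrides existed, but every one of them declined.
        raise TypeError(f"no operand could handle {name}")
    # No operand took over: fall through to the ordinary implementation.
    return func(*inputs)


class Shapeless:
    """Operand with an override and no shape attribute."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return f"{type(self).__name__} handled {ufunc}.{method}"


def plain_matmul(a, b):
    # Vector-vector product on plain lists, standing in for matmul.
    return sum(x * y for x, y in zip(a, b))


print(call_with_override(plain_matmul, "matmul", [1, 2], [3, 4]))
# 11
print(call_with_override(plain_matmul, "matmul", [1, 2], Shapeless()))
# Shapeless handled matmul.__call__
```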

So the two avenues I am trying are

1) make matmul a gufunc and then it will automatically use the
__array_ufunc__ machinery without any added changes, but this requires
expanding the meaning of a signature to allow dispatch

2) generalize the __array_ufunc__ machinery to handle some kind of
wrapped function, the wrapper knows about nin and nout and calls
PyUFunc_CheckOverride, which would allow matmul to work unchanged and
might support other functions as well.

The issue of whether matmat, vecmat, matvec, and vecvec are functions,
gufuncs accessible from userland, or not defined at all is secondary to
the current issue of overriding matmul; we can decide that in the future.
If we do create ufuncs for these variants, calling a.vecmat(other) for
instance will still resolve to other's __array_ufunc__ without needing
to explore other's shape.

I probably misunderstood what you were driving at because I am so
focused on this particular issue.

Matti

Re: Extending ufunc signature syntax for matmul, frozen dimensions

Marten van Kerkwijk
Hi Matti,

In the original implementation of what was then __numpy_ufunc__, we
had overrides for both `np.dot` and `np.matmul` that worked exactly as
your option (2), but we decided in the end that those really are not
true ufuncs and we should not include ufunc mimics in the mix as
someone using `__array_ufunc__` should be able to count on being
passed a ufunc, including all its properties. Perhaps this needs
revisiting, and we should have some UFuncABC... But my own feeling
remains that matmul is close enough to a (set of) gufunc that making
it fit the gufunc mold is the way to go...

All the best,

Marten