Forcing gufunc to error with size zero input

Warren Weckesser-2
I'm experimenting with gufuncs, and I just created a simple one with
signature '(i)->()'.  Is there a way to configure the gufunc itself so
that an empty array results in an error?  Or would I have to create a
Python wrapper around the gufunc that does the error checking?
Currently, when passed an empty array, the ufunc loop is called with
the core dimension associated with i set to 0.  It would be nice if
the code didn't get that far, and the ufunc machinery "knew" that this
gufunc didn't accept a core dimension that is 0.  I'd like to
automatically get an error, something like the error produced by
`np.max([])`.

Warren
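
For reference, the Python-wrapper option mentioned above could look roughly like the sketch below. It is only an illustration: `_peaktopeak` is a stand-in built with `np.vectorize` so that the example runs without a compiled extension, and the wrapper name and error message are assumptions, not code from the thread.

```python
import numpy as np

# Stand-in for a compiled gufunc with signature '(i)->()'.
# np.vectorize is used only so this sketch runs without a C extension.
_peaktopeak = np.vectorize(lambda v: v.max() - v.min(), signature="(i)->()")


def peaktopeak(x):
    """Peak-to-peak over the last axis; raise if that axis has length 0."""
    x = np.asarray(x)
    # The core dimension 'i' is the last axis of the input.
    if x.ndim >= 1 and x.shape[-1] == 0:
        raise ValueError("peaktopeak requires a core dimension of length >= 1")
    return _peaktopeak(x)
```

The check runs once in Python before the loop is entered, so the inner loop never sees a core dimension of 0. The cost is that the object users call is a plain function rather than the ufunc itself.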
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Re: Forcing gufunc to error with size zero input

Eric Wieser
Can you just raise an exception in the gufunc's inner loop? Or is there no mechanism to do that today?

I don't think you were proposing that core dimensions should _never_ be allowed to be 0, but if you were I disagree. I spent a fair amount of work enabling that for linalg because it provided some convenient base cases.

We could go down the route of augmenting the gufunc signature syntax to support requiring non-empty dimensions, like we did for optional ones - although IMO we should consider switching from a string minilanguage to a structured object specification if we plan to go much further with extending it.

Re: Forcing gufunc to error with size zero input

Warren Weckesser-2
On 9/28/19, Eric Wieser <[hidden email]> wrote:
> Can you just raise an exception in the gufunc's inner loop? Or is there no
> mechanism to do that today?

Maybe?  I don't know what the idiomatic way is to handle errors
detected in an inner loop.  And pushing this particular error
detection into the inner loop doesn't feel right.


>
> I don't think you were proposing that core dimensions should _never_ be
> allowed to be 0,


No, I'm not suggesting that.  There are many cases where a length 0
core dimension is fine.

I'm interested in the case where there is not a meaningful definition
of the operation on the empty set.  The mean is an example.  Currently
`np.mean([])` generates two warnings (one useful, the other cryptic
and apparently incidental), and returns nan.  Returning nan is one way
to handle such a case; another is to raise an error like `np.amax([])`
does.  I'd like to raise an error in the example that I'm working on
('peaktopeak' at https://github.com/WarrenWeckesser/npuff).  The
function is a gufunc, not a reduction of a binary operation, so the
'identity' argument of PyUFunc_FromFuncAndDataAndSignature has no
effect.
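
A small sketch of the two behaviours being contrasted here, for readers who want to reproduce them. The exact number and wording of the warnings can differ between NumPy versions, so the code only records whatever is emitted.

```python
import warnings

import numpy as np

# np.mean of an empty input: warns and returns nan.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean_of_empty = np.mean([])

print("np.mean([]) ->", mean_of_empty)
for w in caught:
    print("  warning:", w.category.__name__, "-", w.message)

# np.amax of an empty input: raises, because max has no identity.
amax_error = None
try:
    np.amax([])
except ValueError as exc:
    amax_error = exc
    print("np.amax([]) raised ValueError:", exc)
```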

> but if you were I disagree. I spent a fair amount of work
> enabling that for linalg because it provided some convenient base cases.
>
> We could go down the route of augmenting the gufunc signature syntax to
> support requiring non-empty dimensions, like we did for optional ones -
> although IMO we should consider switching from a string minilanguage to a
> structured object specification if we plan to go much further with
> extending it.

After only a quick glance at that code: one option is to add a '+'
after the input names in the signature that must have a length that is
at least 1.  So the signature for functions like `mean` (if you were
to reimplement it as a gufunc, and wanted an error instead of nan),
`amax`, `ptp`, etc, would be '(i+)->()'.
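
To make the idea concrete, here is a purely hypothetical sketch of how a trailing '+' could be interpreted. NumPy's signature parser does not accept this syntax; the helper name `split_nonempty_markers` and its return format are assumptions made only for illustration.

```python
import re


def split_nonempty_markers(signature):
    """Split a hypothetical '+'-annotated signature into two parts.

    Returns the plain signature (with every '+' removed) and the set of
    core dimension names that were marked as requiring length >= 1.
    """
    marked = re.compile(r"(\w+)\+")
    nonempty = set(marked.findall(signature))
    plain = marked.sub(r"\1", signature)
    return plain, nonempty


print(split_nonempty_markers("(i+)->()"))
```

A wrapper or the ufunc machinery could then compare each marked core dimension against 0 before the inner loop runs.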

However, the only meaningful use cases of this enhancement that I've
come up with are these simple reductions.  So I don't know if making
such a change to the signature is worthwhile.  On the other hand,
there are many examples of useful 1-d reductions that aren't the
reduction of an associative binary operation.  It might be worthwhile
to have a new convenience function just for the case '(i)->()', maybe
something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's
ugly, but I think you get the idea), and that function can have an
argument to specify that the length must be at least 1.

I'll see if that is feasible, but I won't be surprised to learn that
there are good reasons for *not* doing that.

Warren



Re: Forcing gufunc to error with size zero input

Warren Weckesser-2
On 9/29/19, Warren Weckesser <[hidden email]> wrote:

> After only a quick glance at that code: one option is to add a '+'
> after the input names in the signature that must have a length that is
> at least 1.  So the signature for functions like `mean` (if you were
> to reimplement it as a gufunc, and wanted an error instead of nan),
> `amax`, `ptp`, etc, would be '(i+)->()'.
>
> However, the only meaningful use cases of this enhancement that I've
> come up with are these simple reductions.


Of course, just minutes after sending the email, I realized I *do*
know of other signatures that could benefit from a check on the core
dimension size.  An implementation of Pearson's correlation
coefficient as a gufunc would have signature (i),(i)->(), and the core
dimension i must be at least *2* for the calculation to be well
defined.  Other correlations would also likely require a nonzero core
dimension.

Warren
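
As an illustration of the Pearson case, the sketch below checks a minimum core length of 2 in a Python wrapper. `_pearson` is a stand-in built with `np.vectorize` and `np.corrcoef` rather than a compiled gufunc, and the `min_length` parameter is a hypothetical name chosen for this example.

```python
import numpy as np

# Stand-in for a compiled gufunc with signature '(i),(i)->()'.
_pearson = np.vectorize(
    lambda x, y: np.corrcoef(x, y)[0, 1],
    signature="(i),(i)->()",
)


def pearson(x, y, min_length=2):
    """Pearson correlation over the last axis, requiring length >= min_length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    for arr in (x, y):
        # The shared core dimension 'i' is the last axis of each input.
        if arr.ndim >= 1 and arr.shape[-1] < min_length:
            raise ValueError(
                f"core dimension must have length >= {min_length}, "
                f"got {arr.shape[-1]}"
            )
    return _pearson(x, y)
```

The same pattern covers the '(i)->()' reductions with `min_length=1`, which suggests that a minimum length per core dimension is more general than a simple non-empty flag.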



Re: Forcing gufunc to error with size zero input

Sebastian Berg
In reply to this post by Warren Weckesser-2
On Sun, 2019-09-29 at 00:20 -0400, Warren Weckesser wrote:
> On 9/28/19, Eric Wieser <[hidden email]> wrote:
> > Can you just raise an exception in the gufunc's inner loop? Or is
> > there no
> > mechanism to do that today?
>
> Maybe?  I don't know what is the idiomatic way to handle errors
> detected in an inner loop.  And pushing this particular error
> detection into the inner loop doesn't feel right.
>

Basically, since you want to release the GIL, you can grab it again and
set an error right now. That will work, although grabbing the GIL from
the inner loop is not ideal, at least in the sense that it does not work
with subinterpreters (but numpy does not currently work with those in
any case). We do use this internally, I believe.

Well, even without dtypes, I think we probably want a few extra APIs
around UFuncs: setup/teardown (not necessarily as such functions), as
well as a return value for the inner loop to signal that iteration
should stop.

There was a long discussion about that, for example here:
https://github.com/numpy/numpy/issues/12518

There is another use case: we probably want to allow optimized
loop selection (necessary/used in casting).

Note that I believe all of this type of logic should be moved into a
UFuncImpl [0] object, so that it can be DType (and especially user
DType) specific without bloating up the current UFunc object too much.
That puts a lot of power out there, though, so it may be good to limit
it a lot initially.


Best,

Sebastian


[0] It was Eric's suggestion/name; I do not know if it came up earlier.


Re: Forcing gufunc to error with size zero input

Warren Weckesser-2
In reply to this post by Warren Weckesser-2
On 9/29/19, Warren Weckesser <[hidden email]> wrote:

> does.  I'd like to raise an error in the example that I'm working on
> ('peaktopeak' at https://github.com/WarrenWeckesser/npuff).  The


FYI: I renamed that repository to 'ufunclab':
    https://github.com/WarrenWeckesser/ufunclab

Warren

