Why does fancy indexing work like this?


Aaron Meurer
Why does fancy indexing have this behavior?

>>> a = np.empty((0, 1, 2))
>>> b = np.empty((1, 1, 2))
>>> a[np.array([10, 10])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 10 is out of bounds for axis 0 with size 0
>>> a[:, np.array([10, 10])]
array([], shape=(0, 2, 2), dtype=float64)
>>> a[:, :, np.array([10, 10])]
array([], shape=(0, 1, 2), dtype=float64)
>>> b[np.array([10, 10])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 10 is out of bounds for axis 0 with size 1
>>> b[:, np.array([10, 10])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 10 is out of bounds for axis 1 with size 1
>>> b[:, :, np.array([10, 10])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 10 is out of bounds for axis 2 with size 2

As far as I can tell, the behavior is that if an array has a 0
dimension and an integer array index indexes an axis that isn't 0,
there are no bounds checks. Why does it do this? It seems to be
inconsistent with the behavior of shape () fancy indices (integer
indices). I couldn't find any reference to this behavior in
https://numpy.org/doc/stable/reference/arrays.indexing.html.

Aaron Meurer
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Why does fancy indexing work like this?

Sebastian Berg
On Wed, 2020-07-22 at 16:23 -0600, Aaron Meurer wrote:

> Why does fancy indexing have this behavior?
>
> As far as I can tell, the behavior is that if an array has a 0
> dimension and an integer array index indexes an axis that isn't 0,
> there are no bounds checks. Why does it do this? It seems to be
> inconsistent with the behavior of shape () fancy indices (integer
> indices). I couldn't find any reference to this behavior in
> https://numpy.org/doc/stable/reference/arrays.indexing.html.
>
The reason is that we used to not do the bounds check when there are
*two* advanced indices:

   arr = np.ones((5, 6))
   arr[[], [10, 10]]

giving an empty result.  If you check on master (and maybe on 1.19.x, I
am not sure), you should see that all of your examples give a
deprecation warning, to be turned into an error later (except the
example I gave above, which can be argued to be correct).

- Sebastian



Re: Why does fancy indexing work like this?

Aaron Meurer
Ah, so I guess I caught this issue right as it got fixed. There are no
warnings in 1.19.0, but I can confirm I get the warnings in numpy
master. 1.19.1 isn't on conda yet, but I tried building it and didn't
get the warning there. So I guess I need to wait for 1.19.2.

How long do deprecation cycles like this tend to last (I'm also
curious when the warnings for things like a[[[0, 1], [0, 1]]] will go
away)?

Aaron Meurer

On Wed, Jul 22, 2020 at 4:32 PM Sebastian Berg
<[hidden email]> wrote:

> The reason is because we used to not do this when there are *two*
> advanced indices:
>
>    arr = np.ones((5, 6))
>    arr[[], [10, 10]]
>
> giving an empty result.  If you check on master (and maybe on 1.19.x, I
> am not sure). You should see that all of your examples give a
> deprecation warning to be turned into an error (except the example I
> gave above, which can be argued to be correct).
>
> - Sebastian

Re: Why does fancy indexing work like this?

Aaron Meurer
On Wed, Jul 22, 2020 at 4:55 PM Aaron Meurer <[hidden email]> wrote:
>
> Ah, so I guess I caught this issue right as it got fixed. There are no
> warnings in 1.19.0, but I can confirm I get the warnings in numpy
> master. 1.19.1 isn't on conda yet, but I tried building it and didn't
> get the warning there. So I guess I need to wait for 1.19.2.

Or rather 1.20, I guess: https://github.com/numpy/numpy/pull/15900

By the way, it would be useful if deprecation warnings like this
offered a way to enable the actual post-deprecation behavior. Right
now the warning says to run warnings.simplefilter('error'), but this
causes the above indexing to raise DeprecationWarning, not IndexError.
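
The mismatch described above can be reproduced with the standard `warnings` module alone. This is a minimal sketch: `lookup` is a hypothetical stand-in for the deprecated indexing path, not a NumPy function, and it returns `None` where the real code would return an empty array.

```python
import warnings


def lookup(values, index):
    """Hypothetical stand-in: warns on out-of-bounds instead of raising."""
    if not 0 <= index < len(values):
        warnings.warn(
            "out-of-bounds index will raise IndexError in the future",
            DeprecationWarning,
            stacklevel=2,
        )
        return None
    return values[index]


def classify(func, *args):
    """Return the name of the exception type raised, or 'no error'."""
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return "no error"


with warnings.catch_warnings():
    warnings.simplefilter("error")  # what the warning message suggests
    outcome = classify(lookup, [1, 2, 3], 10)

print(outcome)  # DeprecationWarning, so "except IndexError" would not match
```

Since `DeprecationWarning` is not a subclass of `IndexError`, code written against the future behavior cannot be exercised this way.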

Aaron Meurer

Re: Why does fancy indexing work like this?

Sebastian Berg
On Wed, 2020-07-22 at 16:55 -0600, Aaron Meurer wrote:
> Ah, so I guess I caught this issue right as it got fixed. There are
> no

Yes. On a general note, advanced indexing grew over time into a maze of
code paths, and things like empty arrays were for a long time not well
supported in many parts of NumPy.  That this went through

> warnings in 1.19.0, but I can confirm I get the warnings in numpy
> master. 1.19.1 isn't on conda yet, but I tried building it and didn't
> get the warning there. So I guess I need to wait for 1.19.2.

We don't add warnings in bug-fix releases, so 1.19.2 will definitely
never get it.  I did not remember whether it was in there, because it
was merged around the same time 1.19.x was branched.

About your warnings, do you have a nice way to do that?  The mechanism
for warnings does not really give a good way to catch that a warning
was raised and then turn it into an error.  Unless someone contributes
a slick way to do it, I am not sure the complexity pays off.

IIRC, I added the note about raising the warning, because in this
particular case the deprecation warning (turned into an error) happens
to be chained due to implementation details.  (so you do see the
"original" error printed out).
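
One way such chaining can come about is sketched below in pure Python. This is only a model under stated assumptions: the real indexing code is in C, and `checked_get` is a hypothetical name. The point is that a warning escalated to an error inside an `except` block carries the original error along as its context.

```python
import warnings


def checked_get(values, index):
    """Hypothetical model: catch the real error, then only warn about it."""
    try:
        return values[index]
    except IndexError:
        # Under an "error" filter, warn() raises here, inside the except
        # block, so Python records the IndexError as __context__.
        warnings.warn("out of bounds index", DeprecationWarning, stacklevel=2)
        return None


with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        checked_get([1, 2, 3], 10)
    except DeprecationWarning as exc:
        raised = exc

print(type(raised.__context__).__name__)  # IndexError
```

The traceback for `raised` would therefore show the `IndexError` first, followed by the `DeprecationWarning` that was raised while handling it.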

>
> How long do deprecation cycles like this tend to last (I'm also
> curious when the warnings for things like a[[[0, 1], [0, 1]]] will go
> away)?

Not sure, this is a corner case, and is bugging pandas a bit, so it may
be a bit quicker, but likely still 2 releases?

We are not always good about phasing out deprecations as soon as
it is plausible. The one you mention strikes me as a bigger one though,
so I think we should wait about 2 years.  It is plausible that we have
already been there for a while.

Cheers,

Sebastian



Re: Why does fancy indexing work like this?

Aaron Meurer
> About your warnings, do you have a nice way to do that?  The mechanism
> for warnings does not really give a good way to catch that a warning
> was raised and then turn it into an error.  Unless someone contributes
> a slick way to do it, I am not sure the complexity pays off.

I don't really know how flags and options and such work in NumPy, but
I would imagine something like

# Either a single flag for all deprecations or a per-deprecation flag
if flags['post-deprecation']:
    raise IndexError(...)
else:
    warnings.warn(...)

I don't know if the fact that the code that does this is in C
complicates things.

In other words, something that works kind of like __future__ flags for
upgrading the behavior to post-deprecation.
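
A runnable version of that idea might look like the following. Everything here is hypothetical: NumPy has no `FLAGS` registry, and `get_item` is a toy lookup rather than real indexing code.

```python
import warnings

# Hypothetical per-deprecation flags; NumPy has no such registry.
FLAGS = {"post-deprecation": False}


def get_item(values, index):
    """Toy lookup whose out-of-bounds behaviour depends on the flag."""
    if 0 <= index < len(values):
        return values[index]
    if FLAGS["post-deprecation"]:
        # Opted in: behave the way the future version will.
        raise IndexError(f"index {index} is out of bounds")
    warnings.warn(
        "out-of-bounds index will raise IndexError in the future",
        DeprecationWarning,
        stacklevel=2,
    )
    return None


FLAGS["post-deprecation"] = True
try:
    get_item([1, 2, 3], 10)
except IndexError as exc:
    print("IndexError:", exc)
```

With the flag left at `False`, the same call only warns and returns `None`, which is the current (pre-deprecation) behaviour in this toy model.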

>
> IIRC, I added the note about raising the warning, because in this
> particular case the deprecation warning (turned into an error) happens
> to be chained due to implementation details.  (so you do see the
> "original" error printed out).

Yes, it's nice that you can see it. But for my use case, I want to be
able to "except IndexError". Basically, for ndindex, I test against
NumPy to make sure the semantics are identical, and that includes
making sure identical exceptions are raised. I also want to make it so
that the ndindex semantics always follow post-deprecation behavior for
any NumPy deprecations, since that leads to a cleaner API. But that
means that my test code has to do fancy shenanigans to catch these
deprecation warnings and treat them like the right errors.
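
A test helper along these lines could be sketched as a context manager. The names `deprecation_as` and `deprecated_lookup` are made up for illustration, and this is not the actual ndindex test code.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def deprecation_as(exc_type):
    """Re-raise any DeprecationWarning in the block as exc_type."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            yield
        except DeprecationWarning as warning:
            raise exc_type(str(warning)) from warning


def deprecated_lookup(values, index):
    """Hypothetical stand-in for an operation that currently only warns."""
    if not 0 <= index < len(values):
        warnings.warn("index out of bounds", DeprecationWarning, stacklevel=2)
        return None
    return values[index]


try:
    with deprecation_as(IndexError):
        deprecated_lookup([1, 2, 3], 10)
except IndexError as exc:
    print("IndexError:", exc)
```

The drawback is that every deprecation in the block maps to the same exception type, so the test has to know in advance which error the deprecation will eventually become.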

But even as a general principle, I think for any deprecation warning,
users should be able to update their code in such a way that the
current version doesn't give the warning and also it will continue to
work and be idiomatic for future versions. For simple deprecations
where you remove a function x(), this is often as simple as telling
people to replace x() with y(). But these deprecations aren't so
simple, because the indexing itself is valid and will stay valid, it's
just the behavior that will change. If there's no way to do this, then
a deprecation warning serves little purpose because users who see the
warning won't be able to do anything about it until things actually
change. There would be little difference from just changing things
outright. For the list as tuple indexing thing, you can already kind
of do this by making sure your fancy indices are always arrays. For
the out of bounds one, it's a little harder. I guess for most
use-cases, you aren't actually checking for IndexErrors, and the thing
that will become an error usually indicates a bug in user code, so
maybe it isn't a huge deal (I admit my use-cases aren't typical).

Aaron Meurer

Re: Why does fancy indexing work like this?

Sebastian Berg
On Wed, 2020-07-22 at 17:35 -0600, Aaron Meurer wrote:

> > About your warnings, do you have a nice way to do that?  The
> > mechanism
> > for warnings does not really give a good way to catch that a
> > warning
> > was raised and then turn it into an error.  Unless someone
> > contributes
> > a slick way to do it, I am not sure the complexity pays off.
>
> I don't really know how flags and options and such work in NumPy, but
> I would imagine something like
>
> # Either a single flag for all deprecations or a per-deprecation flag
> if flags['post-deprecation']:
>     raise IndexError(...)
> else:
>     warnings.warn(...)
>
We have never really used global flags for these things in NumPy. I
don't know of any precedent in other packages, possibly aside from
__future__ imports, but I am not even sure they have been used in this
way.

> I don't know if the fact that the code that does this is in C
> complicates things.
>
> In other words, something that works kind of like __future__ flags
> for
> upgrading the behavior to post-deprecation.
>
> > IIRC, I added the note about raising the warning, because in this
> > particular case the deprecation warning (turned into an error)
> > happens
> > to be chained due to implementation details.  (so you do see the
> > "original" error printed out).
>
> Yes, it's nice that you can see it. But for my use case, I want to be
> able to "except IndexError". Basically, for ndindex, I test against
> NumPy to make sure the semantics are identical, and that includes
> making sure identical exceptions are raised. I also want to make it
> so
> that the ndindex semantics always follow post-deprecation behavior
> for
> any NumPy deprecations, since that leads to a cleaner API. But that
> means that my test code has to do fancy shenanigans to catch these
> deprecation warnings and treat them like the right errors.
>
> But even as a general principle, I think for any deprecation warning,
> users should be able to update their code in such a way that the
> current version doesn't give the warning and also it will continue to
> work and be idiomatic for future versions.

For FutureWarnings, I will always try very hard to give an option to
opt in to new behaviour or old behaviour, ideally with code compatible
also with earlier NumPy versions.

Here, for a DeprecationWarning that obviously has no "alternative", I
cannot think of any precedent in any other package or Python itself
doing such a dance.  And it is extremely fringe (you only need it
because you are testing another package against numpy behaviour!).

So I am happy to merge it if it's proposed (maybe it's easier for you to
add this to NumPy than to work around it in your tests), but I am
honestly concerned that proposing this as a general principle is far
more churn than it is worth.  At least unless there is some consensus
(and probably precedent in the scientific python ecosystem or python
itself).

Cheers,

Sebastian

Re: Why does fancy indexing work like this?

Sebastian Berg
On Thu, 2020-07-23 at 10:18 -0500, Sebastian Berg wrote:

> Here, for a DeprecationWarning that obviously has no "alternative", I
> cannot think of any precedent in any other package or Python itself
> doing such a dance.  And it is extremely fringe (you only need it
> because you are testing another package against numpy behaviour!).
>
> So I am happy to merge it if it's proposed (maybe it's easier for you
> to add this to NumPy than to work around it in your tests), but I am
> honestly concerned that proposing this as a general principle is far
> more churn than it is worth.  At least unless there is some consensus
> (and probably precedent in the scientific python ecosystem or python
> itself).
>
After writing this, I realized that I actually remember the *opposite*
discussion occurring before.  I think in some of the equality
deprecations, we actually raise the new error due to an internal
try/except clause.  And there was a complaint that it's confusing that a
non-deprecation-warning is raised when the error will only happen with
DeprecationWarnings being set to error.
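
A loose Python model of that pattern is sketched below. It is an assumption about the general shape only: the real comparison code is in C, and `compare` and its messages are invented for illustration.

```python
import warnings


def compare(a, b):
    """Hypothetical comparison whose failure mode is being deprecated."""
    if type(a) is type(b):
        return a == b
    try:
        warnings.warn(
            "comparing mismatched types will raise TypeError in the future",
            DeprecationWarning,
            stacklevel=2,
        )
    except DeprecationWarning as warning:
        # Only reached when the warning filter is set to "error".
        raise TypeError("cannot compare mismatched types") from warning
    return False  # legacy behaviour while the deprecation is pending


with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        compare(1, "1")
    except TypeError as exc:
        print("TypeError:", exc)
```

In this model, turning warnings into errors surfaces the future `TypeError` directly, which is convenient for `except TypeError` but is also the behaviour the complaint was about.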

- Sebastian



Re: Why does fancy indexing work like this?

Aaron Meurer
> After writing this, I realized that I actually remember the *opposite*
> discussion occurring before.  I think in some of the equality
> deprecations, we actually raise the new error due to an internal
> try/except clause.  And there was a complaint that it's confusing that a
> non-deprecation-warning is raised when the error will only happen with
> DeprecationWarnings being set to error.
>
> - Sebastian

I noticed that warnings.catch_warnings does the right thing with
warnings that are raised alongside an exception (although it is a bit
clunky to use).
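
A minimal sketch of that usage follows, assuming the situation is a call that both emits a warning and raises. `warn_then_fail` is a hypothetical stand-in for such a call.

```python
import warnings


def warn_then_fail():
    """Hypothetical operation that emits a warning and then raises."""
    warnings.warn("this behaviour is deprecated", DeprecationWarning, stacklevel=2)
    raise IndexError("index out of bounds")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is suppressed
    try:
        warn_then_fail()
    except IndexError as exc:
        error = exc

# Both the recorded warning and the exception are available afterwards.
print([w.category.__name__ for w in caught], type(error).__name__)
```

The recorded list survives even though an exception passed through the block, so a test can inspect both the warning and the error.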

Aaron Meurer