Allow __getitem__ to support custom objects

Aaron Meurer
For ndindex (https://quansight.github.io/ndindex/), the biggest issue
with the API is that to use an ndindex object to actually index an
array, you have to use a[idx.raw] instead of a[idx]. This is because
NumPy arrays do not accept custom objects as indices. The exception is
objects that define __index__, but this only works for integer
indices. If __index__ returns anything other than an integer, you get
an IndexError. This is annoying because it's easy to forget the .raw
when working with the ndindex API, and the error message from NumPy
isn't informative about what went wrong unless you know to expect it.
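
The integer-only limitation of __index__ can be seen without NumPy. The sketch below uses a plain Python list; the class names Three and NotAnInt are made up for illustration. Python itself rejects a non-integer return value with a TypeError, which is a different exception from the IndexError that the post describes for NumPy arrays.

```python
import operator

class Three:
    """Integer-like object: __index__ returns a real int."""
    def __index__(self):
        return 3

class NotAnInt:
    """Tries to return a slice through __index__ (not allowed)."""
    def __index__(self):
        return slice(0, 2)

data = [10, 20, 30, 40, 50]

# Integer-like objects work both as indices and as slice elements.
item = data[Three()]     # same as data[3]
head = data[:Three()]    # same as data[:3]

# A non-integer return value is rejected by Python itself.
try:
    operator.index(NotAnInt())
    rejected = False
except TypeError:
    rejected = True
```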

I'd like to propose an API that would allow custom objects to define
how they should be converted to a standard NumPy index, similar to
__index__ but that supports all index types. I think there are two
options here:

- Allow __index__ to return any index type, not just integers. This is
the simplest because it reuses an existing API, and __index__ is the
best possible name for this API. However, I'm not certain, but this may
conflict with the text of PEP 357
(https://www.python.org/dev/peps/pep-0357/). Also, some other APIs use
__index__ to check whether something is an integer usable as an index,
and those wouldn't accept a generic index. For example, the elements of
a slice can be any object that defines __index__.

- Add a new __numpy_index__ API that works like

def __numpy_index__(self):
    return <tuple, integer, slice, newaxis, ellipsis, or integer or
boolean array>

In NumPy, __getitem__ and __setitem__ on ndarray would first check
whether the index is one of the known types, as they currently do, then
try __index__, and if both of those fail, call
index.__numpy_index__() and use the result.
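
A minimal pure-Python sketch of that lookup order follows. It is not NumPy code: resolve_index, KNOWN_TYPES and EveryOther are hypothetical names, index arrays are left out so the example runs without NumPy, and elements of a tuple index are not resolved recursively.

```python
import operator

# Built-in index types that are passed through unchanged.
# (Index arrays are omitted because this sketch avoids importing NumPy.)
KNOWN_TYPES = (int, slice, tuple, type(None), type(Ellipsis))

def resolve_index(index):
    """Hypothetical helper mirroring the lookup order described above."""
    # 1. Known index types are used as-is, so the common path is unchanged.
    if isinstance(index, KNOWN_TYPES):
        return index
    # 2. Integer-like objects go through __index__.
    try:
        return operator.index(index)
    except TypeError:
        pass
    # 3. Only now is the proposed __numpy_index__ hook consulted.
    hook = getattr(type(index), "__numpy_index__", None)
    if hook is None:
        raise IndexError(f"invalid index of type {type(index).__name__}")
    return hook(index)

class EveryOther:
    """Toy index object standing in for something like an ndindex slice."""
    def __numpy_index__(self):
        return slice(None, None, 2)

raw = resolve_index(EveryOther())
values = list(range(6))[raw]
```

Because the hook is only looked up after the existing checks fail, indices that are already valid never reach step 3.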

Note: there is a more general way that NumPy arrays could allow
__getitem__ to be defined on custom objects, which I am NOT proposing.
Instead of an API that returns one of the current predefined index
types (tuple, integer, slice, newaxis, ellipsis, or integer or boolean
array), there could instead be an API that takes the array as input
and returns another array (or view) as an output. This would allow an
object to define itself as an index in arbitrary ways, even if such an
index would not actually be possible via traditional indexing. There
are definitely some interesting ideas that could be done with this,
but this idea would be much more complicated, and isn't something that
I need. Unless the community feels that a more general API like this
would be preferred, I would suggest deferring something like it to a
later discussion.

What would be the best way to go about getting something like this
implemented? Is it simple enough that we can just work out the details
here and on a pull request, or should I write a NEP?

Aaron Meurer
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Allow __getitem__ to support custom objects

Sebastian Berg
On Tue, 2020-10-27 at 17:15 -0600, Aaron Meurer wrote:

> For ndindex (https://quansight.github.io/ndindex/), the biggest issue
> with the API is that to use an ndindex object to actually index an
> array, you have to use a[idx.raw] instead of a[idx]. This is because
> for NumPy arrays, you cannot allow custom objects to be indices. The
> exception is objects that define __index__, but this only works for
> integer indices. If __index__ returns anything other than an integer,
> you get an IndexError. This is annoying because it's easy to forget
> to
> do this when working with the ndindex API, and the error message from
> NumPy isn't informative about what went wrong unless you know to
> expect it.
>
> I'd like to propose an API that would allow custom objects to define
> how they should be converted to a standard NumPy index, similar to
> __index__ but that supports all index types. I think there are two
> options here:
>
> - Allow __index__ to return any index type, not just integers. This
> is
> the simplest because it reuses an existing API, and __index__ is the
> best possible name for this API. However, I'm not sure, but this may
> actually conflict with the text of PEP 357
> (https://www.python.org/dev/peps/pep-0357/). Also, some other APIs
> use
> __index__ to check if something is an indexable integer, which
> wouldn't accept generic index. For example, elements of a slice can
> be
> any object that defines __index__.
>
__index__ converts to an integer (safely). There is an assumption that
the integer is good for indexing, but I think the name shouldn't be
taken to mean it is specific to indexing (even if that was the main
motivation).


> - Add a new __numpy_index__ API that works like
>
> def __numpy_index__(self):
>     return <tuple, integer, slice, newaxis, ellipsis, or integer or
> boolean array>
>
> In NumPy, __getitem__ and __setitem__ on ndarray would first check if
> the input index type is one of the known types as it currently does,
> then it would try __index__, and if neither of those fails, it would
> call __numpy_index__(index) and use that.
Do you anticipate just:

    arr[index]

or also:

    arr[index1, index2]

Would you expect pandas or array-like objects to support this as well?

If we only do `arr[index]` might subclassing tuple be sufficient?  Do
you have any thought on how this might play out with a potential
`arr.oindex[...]`?

Adding either to NumPy is probably fairly straightforward, although I
would prefer either not to slow down every single indexing operation
for an extremely niche use case (which is likely possible) or to have
timings showing that the slowdown is insignificant.

What might help me is understanding `ndindex` itself better, since
this seems like asking to add a protocol that may very well be used by
only this one project?

>
> Note: there is a more general way that NumPy arrays could allow
> __getitem__ to be defined on custom objects, which I am NOT
> proposing.
> Instead of an API that returns one of the current predefined index
> types (tuple, integer, slice, newaxis, ellipsis, or integer or
> boolean
> array), there could instead be an API that takes the array as input
> and returns another array (or view) as an output. This would allow an
> object to define itself as an index in arbitrary ways, even if such
> an
> index would not actually be possible via traditional indexing. There
> are definitely some interesting ideas that could be done with this,
> but this idea would be much more complicated, and isn't something
> that
> I need. Unless the community feels that a more general API like this
> would be preferred, I would suggest deferring something like it to a
> later discussion.
>
> What would be the best way to go about getting something like this
> implemented? Is it simple enough that we can just work out the
> details
> here and on a pull request, or should I write a NEP?
A short NEP may make sense, at least if this is supposed to be a
generic protocol for general array-likes, which I guess it would have
to be ready for.

Cheers,

Sebastian


Re: Allow __getitem__ to support custom objects

Aaron Meurer
On Thu, Oct 29, 2020 at 6:09 PM Sebastian Berg
<[hidden email]> wrote:

> Do you anticipate just:
>
>     arr[index]
>
> or also:
>
>     arr[index1, index2]

I think both should work. If the second one doesn't work it would be
surprising.

>
> Would you expect pandas or array-like objects to support this as well?

Yes, it would probably be best for array-likes to also work with the
same API.

I don't know much about pandas. It seems like it already allows a lot
of indexing stuff. Do Series/DataFrame already have such an API?

>
> If we only do `arr[index]` might subclassing tuple be sufficient?

I guess that technically works, except now your objects have to act
like a tuple, even if they represent something like a slice (Python
does not allow subclassing slice). For ndindex I've tried to make a
distinction between the objects that represent indices and the built-in
objects that happen to be used to represent those indices by default.
So an ndindex.Tuple explicitly doesn't work like a tuple, an
ndindex.Integer doesn't work like an int, and so on. That way there is
a clear distinction between ndindex operations and operations on the
built-in types.
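
The point about slice can be checked directly. In the sketch below, SliceIndex is a hypothetical wrapper written only for illustration (it is not the ndindex class); it represents a slice without being one, so it has to be unwrapped with .raw before a plain list will accept it.

```python
# Python refuses to create a subclass of the built-in slice type.
try:
    type("MySlice", (slice,), {})
    can_subclass_slice = True
except TypeError:
    can_subclass_slice = False

class SliceIndex:
    """Hypothetical wrapper: represents a slice without being one."""
    def __init__(self, start=None, stop=None, step=None):
        self.raw = slice(start, stop, step)

idx = SliceIndex(1, 4)
picked = list(range(10))[idx.raw]   # explicit .raw is needed here
is_a_slice = isinstance(idx, slice)
```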

> Do
> you have any thought on how this might play out with a potential
> `arr.oindex[...]`?

I think oindex[idx] would call the same API on idx. I'm not sure if it
matters that it's oindex, since that's at a higher level.

>
> Adding either to NumPy is probably fairly straight forward, although I
> prefer either not slow down every single indexing operation for an
> extremely niche use-case (which is likely possible) or timing that it
> is insignificant.

I'm not sure it would. The current cases would all be tried first. The
only time the new protocol would be used is when the index type isn't
one of the currently allowed types, which currently raises IndexError.

>
> What might help me is understanding that `ndindex` itself better. Since
> it seems like asking to add a protocol that may very well be used by
> only this one project?

That's fair. Maybe the more general API would make more sense then? I
think it would need more thinking out, but it would allow a lot more
use-cases.

Aaron Meurer

Re: Allow __getitem__ to support custom objects

Sebastian Berg
On Thu, 2020-10-29 at 23:58 -0600, Aaron Meurer wrote:

> On Thu, Oct 29, 2020 at 6:09 PM Sebastian Berg
> <[hidden email]> wrote:
> > Would you expect pandas or array-like objects to support this as
> > well?
>
> Yes, it would probably be best for array-like to also work with the
> same API.
>
> I don't know much about Pandas. It seems like it already allows a lot
> of indexing stuff. Do Series/Dataframe already have such an API?
I do not think so, but indexing in pandas often works differently. So I
was curious whether y

>
> > If we only do `arr[index]` might subclassing tuple be sufficient?
>
> I guess that technically works, except now your objects have to act
> like a tuple, even if they represent something like a slice (Python
> does not allow subclassing slice). For ndindex I've tried to make a
> distinction between objects as representing indices and the built-in
> objects that happen to be used to represent those indices by default.
> So an ndindex.Tuple explicitly doesn't work like a Tuple, an
> ndindex.Integer doesn't work like an int, and so on. That way there
> is
> a clear distinction between ndindex operations and operations on the
> built-in types.
>
> > Do
> > you have any thought on how this might play out with a potential
> > `arr.oindex[...]`?
>
> I think oindex[idx] would call the same API on idx. I'm not sure if
> it
> matters that it's oindex, since that's at a higher level.
It is at a higher level, but it seemed to me that `ndindex` largely
plays at that level.  For example, you have a method to implement index
chaining:

    arr[idx1][idx2] == arr[idx1.as_subindex(idx2)]

(or similar). But this will not work:

    arr.oindex[idx1].oindex[idx2] != arr.idx[idx1.as_subindex(idx2)]

Also the "result" shape, or even questions like `.isempty()`, will give
different answers when used as an `.oindex[...]`.

This is why I thought that `arr[idx1, idx2]` is possibly a very
different case from `arr[idx]`, at least for current NumPy indexing
logic (it would be better with `arr.oindex[]`).
The difference doesn't matter in your proposal, but I had the
impression that the `arr[idx1, idx2]` form might be rare/unused, and
that form would not be able to carry information such as whether this
is supposed to be an "oindex".

Maybe it helps to look back at `.oindex` to explain this. A possible
solution to subclass handling if we add `arr.oindex` is to make it so
that:

    myarr.oindex[indx]

could call:

    myarr.__getitem__(indx_object)

where `indx_object` knows that this was an oindex.  The main reason
is the expectation that many subclasses may implement `__getitem__`,
but probably just do:

    def __getitem__(self, indx):
        new_data = self.data[indx]
        # Do something with new_data.

Now for `ndindex` it would seem to make a lot of sense to have an
OIndex object, etc. for the same reason.

Of course how we implement `.oindex` can be pretty separate from this.
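
As a rough illustration of the routing described above, here is a pure-Python sketch in which `.oindex[...]` wraps the index in a marker object and passes it to the array's own `__getitem__`, so a subclass override still runs. OIndex, _OIndexAccessor, Base and Logged are all hypothetical names, and the container is a nested list rather than a NumPy array.

```python
class OIndex:
    """Hypothetical marker: wraps an index and records outer-indexing intent."""
    def __init__(self, raw):
        self.raw = raw

class _OIndexAccessor:
    def __init__(self, array):
        self._array = array

    def __getitem__(self, indx):
        # Route through the array's own __getitem__ so subclass overrides run.
        return self._array[OIndex(indx)]

class Base:
    """Toy 2-D container backed by nested lists (not a NumPy array)."""
    def __init__(self, rows):
        self.rows = rows

    @property
    def oindex(self):
        return _OIndexAccessor(self)

    def __getitem__(self, indx):
        if isinstance(indx, OIndex):
            # Outer indexing: every selected row paired with every selected column.
            row_ids, col_ids = indx.raw
            return [[self.rows[r][c] for c in col_ids] for r in row_ids]
        return self.rows[indx]

class Logged(Base):
    """Subclass that overrides __getitem__ without knowing about oindex."""
    def __init__(self, rows):
        super().__init__(rows)
        self.calls = 0

    def __getitem__(self, indx):
        self.calls += 1
        return super().__getitem__(indx)

arr = Logged([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
block = arr.oindex[[0, 2], [0, 2]]
```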

>
> > Adding either to NumPy is probably fairly straight forward,
> > although I
> > prefer either not slow down every single indexing operation for an
> > extremely niche use-case (which is likely possible) or timing that
> > it
> > is insignificant.
>
> I'm not sure it would. The current cases would all be tried first.
> The
> only time the new protocol would be used is when the index type isn't
> one of the currently allowed types, which currently raises
> IndexError.
>
> > What might help me is understanding that `ndindex` itself better.
> > Since
> > it seems like asking to add a protocol that may very well be used
> > by
> > only this one project?
>
> That's fair. Maybe the more general API would make more sense then? I
> think it would need more thinking out, but it would allow a lot more
> use-cases.
>
A general API might make sense, but I am edgy about reversing the roles
of who performs the indexing. For one thing that probably would break
subclassing and overriding of `__getitem__`?


Cheers,

Sebastian



Re: Allow __getitem__ to support custom objects

Aaron Meurer
On Fri, Oct 30, 2020 at 9:18 AM Sebastian Berg
<[hidden email]> wrote:

> > > Do
> > > you have any thought on how this might play out with a potential
> > > `arr.oindex[...]`?
> >
> > I think oindex[idx] would call the same API on idx. I'm not sure if
> > it
> > matters that it's oindex, since that's at a higher level.
>
> It is at a higher level, but it seemed to me that `ndindex` largely
> plays at that level.  For example, you have a method to implement index
> chaining:
>
>     arr[idx1][idx2] == arr[idx1.as_subindex(idx2)]
>
> (or similar). But this will not work:
>
>     arr.oindex[idx1].oindex[idx2] != arr.idx[idx1.as_subindex(idx2)]

Just to be clear, this isn't how as_subindex works. as_subindex is
actually the inverse of composition (composition isn't implemented
yet). But I get the point anyway. Most of the ndindex API won't be
valid for oindex.  I haven't thought too much yet about how outer
indexing fits into ndindex, but it is something a lot of people seem
to be interested in.

>
> Also the "result" shape, or even questions like `.isempty()` will give
> different answers when  used as an `.oindex[...]`.
>
> This is why I though that `arr[idx1, idx2]` is possibly very different
> case from `arr[idx]` at least for current NumPy indexing logic (it
> would be better with `arr.oindex[]`).
> The difference doesn't matter in your proposal, but I had the
> impression that the `arr[idx1, idx2]` form might be rare/unused and
> that form would not be able to carry information such as whether this
> is supposed to be an "oindex".
>
> Maybe it helps to look back at `.oindex` to explain this. A possible
> solution to subclass handling if we add `arr.oindex` is to make it so
> that:
>
>     myarr.oindex[indx]
>
> could call:
>
>     myarr.__getitem__(indx_object)
>
> Where `index_object` knows that this is was an oindex.  The main reason
> is the expectation that many subclasses may implement `__getitem__`,
> but probably just do:
>
>     def __getitem__(self, indx):
>          new_data = self.data[indx]
>          # Do something with new_data.
>
> Now for `ndindex` it would seem to make a lot of sense to have an
> OIndex object, etc. for the same reason.
>
> Of course how we implement `.oindex` can be pretty separate from this.

Yeah, one of the ideas for the more general API is that you could have
a[oindex(idx)] where oindex() returns some object that does outer
indexing. If outer indices always map to normal indices, this could be
done with the simpler API (I haven't looked at it enough to say
whether that's the case or not yet).
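
To make the a[oindex(idx)] idea concrete, here is a sketch of the more general API in which the index object receives the array and performs the indexing itself. The hook name __index_array__, the Grid class and this oindex class are all invented for illustration; nothing like them exists in NumPy, and the container is a nested list.

```python
class Grid:
    """Toy 2-D container backed by nested lists (not a NumPy array)."""
    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, index):
        # Hypothetical general hook: the index object does the indexing.
        hook = getattr(type(index), "__index_array__", None)
        if hook is not None:
            return hook(index, self)
        return self.rows[index]

class oindex:
    """Hypothetical outer-index object for a 2-D Grid."""
    def __init__(self, row_ids, col_ids):
        self.row_ids = row_ids
        self.col_ids = col_ids

    def __index_array__(self, array):
        # Pair every selected row with every selected column.
        return [[array.rows[r][c] for c in self.col_ids]
                for r in self.row_ids]

a = Grid([[1, 2, 3], [4, 5, 6]])
picked = a[oindex([1, 0], [0, 2])]
```

Here the container no longer controls how the result is built, which is the role reversal raised as a concern in the quoted reply.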

> A general API might make sense, but I am edgy about reversing the roles
> of who performs the indexing. For one thing that probably would break
> subclassing and overriding of `__getitem__`?

Yeah it might. In Python, we have __getattr__, which lets A define how
A.x works, and __get__ (i.e., descriptors), which allows x to define
how A.x works. The whole thing is tied together by the higher level
__getattribute__, which defines the logic for both (as well as
__dict__ and all that other stuff). You should almost never override
__getattribute__ (unless you *really* know what you're doing).
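
For readers less familiar with these hooks, a small sketch of the two directions (the class names Answer and A are made up for the example):

```python
class Answer:
    """Descriptor: the attribute itself decides how A.x behaves via __get__."""
    def __get__(self, instance, owner):
        return 42

class A:
    x = Answer()

    def __getattr__(self, name):
        # Called only when the normal lookup done by __getattribute__ fails.
        return f"fallback:{name}"

obj = A()
from_descriptor = obj.x        # handled by Answer.__get__
from_getattr = obj.missing     # handled by A.__getattr__
```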

I'm not sure what that says here. Maybe that __getitem__ shouldn't be
overridden by subclasses, but rather some other NumPy-specific method
that __getitem__ calls?

Aaron Meurer
