ndrange, like range but multidimensional

hmaarrfk
Hi All,

I've been using numpy array objects to store collections of 2D (and soon ND) variables. When iterating through these collections, I often found it useful to use `ndindex`, which in `for` loops behaves much like `range` with only a `stop` parameter.
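For readers less familiar with `ndindex`, here is a small illustration of that analogy. It assumes NumPy is installed and imported as `np`; the variable names are only illustrative.

```python
import numpy as np

# np.ndindex takes only a shape (the per-axis "stop" values) and yields
# index tuples in C order, much like range(stop) yields integers.
indices = list(np.ndindex(2, 3))
print(indices)

# Along a single axis it visits the same values as range(stop),
# except that each index is wrapped in a 1-tuple.
one_axis = [i for (i,) in np.ndindex(3)]
print(one_axis == list(range(3)))
```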

That said, `ndindex` lacks a few features that are now present in `range`, most notably the ability to iterate over a subset of the indices.

I found myself often writing `itertools.product(range(1, data.shape[0]), range(3, data.shape[2]))` for custom iterations. While it does flatten out the for loop, it is arguably less readable than having 1 or 2 levels of nested for loops.

It is quite possible that `nditer` would solve my problems, but unfortunately I am still not able to make sense of the numerous options it has.

I propose an `ndrange` class that can be used to iterate over nd-collections mimicking the API of `range` as much as possible and adapting it to the ND case (i.e. returning tuples instead of singletons).

Since this is an enhancement proposal, I am bringing the idea to the mailing list for reactions.

The implementation in this PR https://github.com/numpy/numpy/pull/12094 is based on keeping track of a tuple of python `range` objects. The `__iter__` method returns the result of `itertools.product(*self._ranges)`.

By leveraging python's `range` implementation, operations like containment, `index`, `reversed`, equality and, most importantly, slicing of the ndrange object can be offered to the general numpy audience.
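To make the idea concrete, here is a minimal sketch of a class built on a tuple of `range` objects. It is not the code from the PR: the name `NDRangeSketch`, the helper `_from_ranges` and the decision to support only slices (no integer indexing) are assumptions made for illustration.

```python
import itertools


class NDRangeSketch:
    """Hypothetical stand-in for the proposed ndrange; not the PR's code."""

    def __init__(self, *shape):
        # Accept both NDRangeSketch(4, 4) and NDRangeSketch((4, 4)).
        if len(shape) == 1 and isinstance(shape[0], tuple):
            shape = shape[0]
        self._ranges = tuple(range(n) for n in shape)

    @classmethod
    def _from_ranges(cls, ranges):
        # Build an instance directly from already-sliced range objects.
        obj = cls.__new__(cls)
        obj._ranges = tuple(ranges)
        return obj

    def __iter__(self):
        # A fresh iterator on every call, so the object can be looped over twice.
        return itertools.product(*self._ranges)

    def __len__(self):
        n = 1
        for r in self._ranges:
            n *= len(r)
        return n

    def __contains__(self, index):
        # Delegate the per-axis membership test to range.
        return len(index) == len(self._ranges) and all(
            i in r for i, r in zip(index, self._ranges))

    def __reversed__(self):
        return itertools.product(*(r[::-1] for r in self._ranges))

    def __eq__(self, other):
        return isinstance(other, NDRangeSketch) and self._ranges == other._ranges

    def __getitem__(self, slices):
        # Only slices are handled in this sketch, one per leading axis.
        if not isinstance(slices, tuple):
            slices = (slices,)
        sliced = [r[s] for r, s in zip(self._ranges, slices)]
        sliced += self._ranges[len(slices):]  # trailing axes stay untouched
        return self._from_ranges(sliced)


inner = NDRangeSketch(4, 4)[:, 1:-1]
print(list(inner)[:3])
print(len(inner), (0, 0) in inner, (3, 2) in inner)
```

Every operation is delegated to `range`, so the sketch inherits its slicing and membership semantics axis by axis.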

For example, iterating through a 2D collection but avoiding indexing the first and last column used to look like this:

```
c = np.empty((4, 4), dtype=object)
# ... compute on c
for j in range(c.shape[0]):
    for i in range(1, c.shape[1]-1):
        c[j, i] # = compute on c[j, i] that depends on the index i, j
```

With `np.ndrange` it can look something like this:

```
c = np.empty((4, 4), dtype=object)
# ... compute on c
for i in np.ndrange(c.shape)[:, 1:-1]:
    c[i] # = some operation on c[i] that depends on the index i
```
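Until something like `np.ndrange` exists, roughly the same iteration can be written with plain `range` slicing and `itertools.product`. This is a minimal sketch: `shape` stands in for `c.shape`, and the names `rows`, `cols` and `visited` are only illustrative.

```python
import itertools

shape = (4, 4)  # stands in for c.shape

# range objects support slicing, so each axis can be trimmed independently.
rows = range(shape[0])        # every row, like ":"
cols = range(shape[1])[1:-1]  # skip the first and last column, like "1:-1"

visited = list(itertools.product(rows, cols))
print(visited[:4])
```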

very pythonic, very familiar to numpy users

Thank you for the feedback,

Mark

References:
An issue requesting expansion to the ndindex API on github: https://github.com/numpy/numpy/issues/6393

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: ndrange, like range but multidimensional

Allan Haldane
On 10/07/2018 10:32 AM, Mark Harfouche wrote:

> With `np.ndrange` it can look something like this:
>
> ```
> c = np.empty((4, 4), dtype=object)
> # ... compute on c
> for i in np.ndrange(c.shape)[:, 1:-1]:
>      c[i] # = some operation on c[i] that depends on the index i
> ```
>
> very pythonic, very familiar to numpy users

So if I understand, this does the same as `np.ndindex` but allows
numpy-like slicing of the returned generator object, as requested in #6393.

I don't like the duplication in functionality between ndindex and
ndrange here. Better rather to add the slicing functionality to ndindex,
  than create a whole new nearly-identical function. np.ndindex is
already a somewhat obscure and discouraged method since it is usually
better to find a vectorized numpy operation instead of a for loop, and I
don't like adding more obscure functions.

But as an improvement to np.ndindex, I think adding this functionality
seems good if it can be nicely implemented. Maybe there is a way to use
the same optimization tricks as in the current implementation of ndindex
but allow different stop/step? A simple wrapper of ndindex?

Cheers,
Allan

Re: ndrange, like range but multidimensional

hmaarrfk
Allan,

Sorry for the delay. I had my mailing list preferences set to digest. I changed them for now. (I hope this message continues that thread).

Thank you for your feedback. You are correct in identifying that the real feature is expanding the `ndindex` API to support slicing. See my comments on the separate points you raised below.

## Expanding the API of ndindex

> Better rather to add the slicing functionality to ndindex, than create a whole new nearly-identical function.

This is a very important point. I should have included a note about it. My [first attempt](https://github.com/hmaarrfk/numpy/pull/1/files#diff-1bd953557a98073031ce66d05dbde3c8R663) did try that approach.
I ran into 2 issues:
1. Getting around the catch-all positional argument is annoying, and logic to do that will likely be error prone. Peculiarities about how we implement it might cause some very strange behaviour for tuple-like inputs that we don't expect.
2. `ndindex` is an iterator itself. As proposed, `ndrange`, like `range`, is not an iterator. Changing this behaviour would likely lead to breaking code that uses that assumption. For example anybody using introspection or code like:

```
indx = np.ndindex(5, 5)
next(indx)  # Don't look at the (0, 0) coordinate
for i in indx:
    print(i)
```
would break if `ndindex` becomes "not an iterator".
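The distinction between an iterator and a re-iterable sequence can be shown with the standard library alone. In this sketch `itertools.product` is only a stand-in for a one-shot iterator such as `ndindex`; it is not how `ndindex` is implemented.

```python
import itertools

# A one-shot iterator: next() works, and the data is consumed as you go.
one_shot = itertools.product(range(2), range(2))
next(one_shot)            # skip (0, 0), as in the example above
rest = list(one_shot)
again = list(one_shot)    # already exhausted, so this is empty

# range is not an iterator: next() fails on it, but it can be looped repeatedly.
r = range(3)
try:
    next(r)
    range_is_iterator = True
except TypeError:
    range_is_iterator = False
first_pass, second_pass = list(r), list(r)

print(rest, again, range_is_iterator, first_pass == second_pass)
```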

For these two reasons, I thought it was easier to simply have a new class, that seems like a close sibling to `ndindex`.

I personally don't care about point 1 so much. In my mind, start, stop and step are confusing in ND, but maybe some might find them useful? Point 1 also makes it harder to make `ndrange` familiar to `range` users.

> I don't like adding more obscure functions

Hopefully the name `ndrange` makes it easier to find?

## Writing vectorized code 

> np.ndindex is  already a somewhat obscure and discouraged method since it is usually better to find a vectorized numpy operation instead of a for loop

I understand that this kind of function is not focused on numerical operations on the elements of the matrix itself. It really is there to help fill the void left by the lack of any useful multi-dimensional python container.

I think `ndrange`/`ndindex` is there to be used like `np.vectorize`. I've tried to use `np.vectorize` in my own code, but quickly found that making logic fit into vectorize's requirements was often more complicated than writing my own nested loops. In my opinion, nested `range` loops or `ndrange`/`ndindex` are a much more natural way to loop over collections compared to `np.vectorize`.

I'm glad to add warnings to the docs.

## Implementation detail: itertools.product + range vs nditer

> Maybe there is a way to use  the same optimization tricks as in the current implementation of ndindex  but allow different stop/step?

My primary goal here is to make `ndrange` behave much like `range`. By implementing it on top of `range`, it makes it obvious to me how to enforce that behaviour as the API of range gets expanded (though it seems to have settled since Python 3.3). Whatever we decide to call `ndrange`/`ndindex`, the tests I wrote can help ensure we have good range-API coverage (for now).

itertools.product + range seems to be much faster than the current implementation of ndindex:

(python 3.6)
```
%%timeit

for i in np.ndindex(100, 100):
    pass
3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
import itertools
for i in itertools.product(range(100), range(100)):
    pass
231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```


Re: ndrange, like range but multidimensional

Allan Haldane
On 10/8/18 12:21 PM, Mark Harfouche wrote:

> 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
> `range`, is not an iterator. Changing this behaviour would likely lead
> to breaking code that uses that assumption. For example anybody using
> introspection or code like:
>
> ```
> indx = np.ndindex(5, 5)
> next(indx)  # Don't look at the (0, 0) coordinate
> for i in indx:
>     print(i)
> ```
> would break if `ndindex` becomes "not an iterator"

OK, I see now. Just like python3 has separate range and range_iterator
types, where range is sliceable, we would have separate ndrange and
ndindex types, where ndrange is sliceable. You're just copying the
python3 api. That justifies it pretty well for me.
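The split between `range` and its iterator type is easy to check in an interpreter. This short sketch uses only built-ins, and the variable names are illustrative.

```python
r = range(10)
it = iter(r)

# The sequence and its iterator are two distinct types.
type_names = (type(r).__name__, type(it).__name__)

# Slicing a range returns another range; the iterator cannot be sliced.
sliced = r[2:8:2]
try:
    it[2:8:2]
    iterator_sliceable = True
except TypeError:
    iterator_sliceable = False

print(type_names, sliced, iterator_sliceable)
```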

I still think we shouldn't have two functions which do nearly the same
thing. We should only have one, and get rid of the other. I see two ways
forward:

 * replace ndindex by your ndrange code, so it is no longer an iter.
   This would require some deprecation cycles for the cases that break.
 * deprecate ndindex in favor of a new function ndrange. We would keep
   ndindex around for back-compatibility, with a dep warning to use
   ndrange instead.

Doing a code search on github, I can see that a lot of people's code
would break if ndindex no longer was an iter. I also like the name
ndrange for its allusion to python3's range behavior. That makes me lean
towards the second option of a separate ndrange, with possible
deprecation of ndindex.

> itertools.product + range seems to be much faster than the current
> implementation of ndindex
>
> (python 3.6)
> ```
> %%timeit
>
> for i in np.ndindex(100, 100):
>     pass
> 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> %%timeit
> import itertools
> for i in itertools.product(range(100), range(100)):
>     pass
> 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> ```

If the new code ends up faster than the old code, that's great, and
further justification for using ndrange instead of ndindex. I had
thought using nditer in the old code was fastest.

So as far as I am concerned, I say go ahead with the PR the way you are
doing it.

Allan

Re: ndrange, like range but multidimensional

Stephan Hoyer-2
I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e., discouraging its use in our docs, but not actually deprecating it). Certainly ndrange seems like a small but meaningful improvement in the interface.

That said, I'm not convinced this is really worth the trouble. I think the nested loop is still pretty readable/clear, and there are few times when I've actually found ndindex() to be useful.


Re: ndrange, like range but multidimensional

hmaarrfk
Since ndrange is a superset of the features of ndindex, we can implement ndindex with ndrange or keep it as is.
ndindex is now a glorified `nditer` object anyway. So it isn't so much of a maintenance burden.
As for how ndindex is implemented, I'm a little worried about python 2 performance seeing as range is a list.
I would wait on changing the way ndindex is implemented for now.

I agree with Stephan that ndindex should be kept in. Many want backward compatible code. It would be hard for me to justify why a dependency should be bumped up to bleeding edge numpy just for a convenience iterator.

Honestly, I was really surprised to see such a speed difference; I thought it would have been closer.

Allan, I decided to run a few more benchmarks. `nditer` just seems slow for single array access for some reason. Maybe a bug?

```
import numpy as np
import itertools
a = np.ones((1000, 1000))

b = {}
for i in np.ndindex(a.shape):
    b[i] = i

%%timeit
# op_flags=['readonly'] doesn't change performance
for a_value in np.nditer(a):
    pass
109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    a_value = a[i]
113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    c = b[i]
193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for a_value in a.flat:
    pass
25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for k, v in b.items():
    pass
19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    pass
28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


Re: ndrange, like range but multidimensional

Stephan Hoyer-2
The speed difference is interesting but really a different question than the public API.

I'm coming around to ndrange(). I can see how it could be useful for symbolic manipulation of arrays and indexing operations, similar to what we do in dask and xarray.


Re: ndrange, like range but multidimensional

Eric Wieser

One thing that worries me here - in python, range(...) in essence generates a lazy list - so I’d expect ndrange to generate a lazy ndarray. In practice, that means it would be a duck-type defining an __array__ method to evaluate it, and only implement methods already present in numpy.

It’s not clear to me what the datatype of such an array-like would be. Candidates I can think of are:

  1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a little awkward
  2. (intp, (N,)) - which collapses into a shape + (N,) array
  3. object_.
  4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the structured np.void but without field names. I’m not sure how vectorized element indexing would be spelt though.
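The first two candidates can be tried out with existing NumPy dtypes. This is a small sketch, assuming NumPy is imported as `np`; `ndim` plays the role of N, and the array names `a1` and `a2` are illustrative.

```python
import numpy as np

ndim = 2  # the "N" in candidate 2

# Candidate 1: a structured dtype with one integer field per axis.
structured = np.dtype([("i%d" % k, np.intp) for k in range(ndim)])
a1 = np.zeros((3, 3), dtype=structured)

# Candidate 2: a sub-array dtype; NumPy folds it into the array's shape
# when the array is created, so the "tuple" becomes a trailing axis.
a2 = np.zeros((3, 3), dtype=(np.intp, (ndim,)))

print(a1.shape, a1.dtype.names)
print(a2.shape, a2.dtype)
```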

Eric



Re: ndrange, like range but multidimensional

hmaarrfk

Eric,

Great point. The multi-dimensional slicing and sequence return type is definitely strange. I was thinking about that last night.
I’m a little new to the __array__ methods.
Are you saying that the sequence behaviour would stay the same (i.e. __iter__, __reversed__, __contains__), but
np.asarray(np.ndrange((3, 3)))
would return something like an array of tuples?
I’m not sure this is something that anybody can’t already do with meshgrid + stack.
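For reference, the meshgrid + stack approach could look roughly like the sketch below. It assumes NumPy is imported as `np`; the names `axes` and `grid` are illustrative.

```python
import numpy as np

# One coordinate array per axis, using matrix ("ij") indexing.
axes = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")

# Stack along a new last axis so that grid[i, j] holds the pair (i, j).
grid = np.stack(axes, axis=-1)

print(grid.shape)
print(grid[1, 2])
```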

> and only implement methods already present in numpy.

I’m not sure what this means.

I’ll note that in Python 3, range is its own thing. It is still a sequence type but it doesn’t support addition.
I’m kinda ok with ndrange/ndindex being a sequence type, supporting ND slicing, but not being an array ;)
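Both properties of `range` can be checked directly with the standard library. This is a quick sketch with illustrative variable names.

```python
from collections.abc import Sequence

r = range(3)

# range is registered as a Sequence...
is_sequence = isinstance(r, Sequence)

# ...but, unlike list or tuple, it does not define addition.
try:
    r + r
    supports_addition = True
except TypeError:
    supports_addition = False

print(is_sequence, supports_addition)
```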

I’m kinda warming up to the idea of expanding ndindex.

  1. The additional start and step can be omitted from ndindex for a while (indefinitely?). Slicing is way more convenient anyway.
  2. Warnings can help people move from np.ndindex(1, 2, 3) to np.ndindex((1, 2, 3))
  3. ndindex can return a separate iterator, but the ndindex object would hold a reference to it. Calls to ndindex.__next__ would simply return next(of_that_object)
    Note. This would break introspection since the iterator is no longer ndindex type. I’m kinda OK with this though, but breaking code is never nice :(
  4. Bench-marking can help motivate the choice of iterator used for step=(1,) * N start=(0,) * N
  5. Wait until 2019 because I don’t want to deal with performance regressions of potentially using range in Python2 and I don’t want this to motivate any implementation details.
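
Point 3 could look roughly like the sketch below. `NDIndexSketch` is a hypothetical name used only for illustration; it is not NumPy's `ndindex`, and it assumes `itertools.product` over plain `range` objects as the underlying iterator.

```python
import itertools


class NDIndexSketch:
    """Hypothetical illustration only; not NumPy's ndindex."""

    def __init__(self, *shape):
        # Accept both NDIndexSketch(2, 3) and NDIndexSketch((2, 3)).
        if len(shape) == 1 and isinstance(shape[0], tuple):
            shape = shape[0]
        self._ranges = tuple(range(n) for n in shape)
        # The separate iterator object that does the actual work.
        self._it = itertools.product(*self._ranges)

    def __iter__(self):
        # The object stays its own iterator, so existing next() calls keep working.
        return self

    def __next__(self):
        return next(self._it)


indx = NDIndexSketch(2, 2)
next(indx)                 # skip the (0, 0) coordinate
remaining = list(indx)
print(remaining)           # [(0, 1), (1, 0), (1, 1)]
```

Code that calls `next()` on the object directly keeps working under this arrangement, while the inner iterator can be swapped out based on benchmarks.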

Mark


On Wed, Oct 10, 2018 at 12:36 AM Eric Wieser <[hidden email]> wrote:

One thing that worries me here - in python, range(...) in essence generates a lazy list - so I’d expect ndrange to generate a lazy ndarray. In practice, that means it would be a duck-type defining an __array__ method to evaluate it, and only implement methods already present in numpy.

It’s not clear to me what the datatype of such an array-like would be. Candidates I can think of are:

  1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a little awkward
  2. (intp, (N,)) - which collapses into a shape + (3,) array
  3. object_.
  4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the structured np.void but without field names. I’m not sure how vectorized element indexing would be spelt though.
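
A small sketch of how the first three candidates behave with current NumPy, assuming a 2D shape chosen purely for illustration. The fourth candidate (`np.tuple_`) does not exist, so it is not shown.

```python
import numpy as np

shape = (2, 3)
N = len(shape)

# Candidate 1: structured dtype with one named field per axis.
structured = np.dtype([('i%d' % k, np.intp) for k in range(N)])
a1 = np.zeros(shape, structured)
print(a1.shape, a1.dtype.names)   # (2, 3) ('i0', 'i1')

# Candidate 2: subarray dtype; NumPy folds the subarray into the array shape.
subarray = np.dtype((np.intp, (N,)))
a2 = np.zeros(shape, subarray)
print(a2.shape)                   # (2, 3, 2)

# Candidate 3: object dtype holding plain Python tuples.
a3 = np.empty(shape, dtype=object)
for idx in np.ndindex(*shape):
    a3[idx] = idx
print(a3[1, 2])                   # (1, 2)
```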

Eric


On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer <[hidden email]> wrote:
The speed difference is interesting but really a different question than the public API.

I'm coming around to ndrange(). I can see how it could be useful for symbolic manipulation of arrays and indexing operations, similar to what we do in dask and xarray.

On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche <[hidden email]> wrote:
since ndrange is a superset of the features of ndindex, we can implement ndindex with ndrange or keep it as is.
ndindex is now a glorified `nditer` object anyway. So it isn't so much of a maintenance burden.
As for how ndindex is implemented, I'm a little worried about python 2 performance seeing as range is a list.
I would wait on changing the way ndindex is implemented for now.

I agree with Stephan that ndindex should be kept in. Many want backward compatible code. It would be hard for me to justify why a dependency should be bumped up to bleeding edge numpy just for a convenience iterator.

Honestly, I was really surprised to see such a speed difference, I thought it would have been closer.

Allan, I decided to run a few more benchmarks; nditer just seems slow for single array access for some reason. Maybe a bug?

```
import numpy as np
import itertools
a = np.ones((1000, 1000))

b = {}
for i in np.ndindex(a.shape):
    b[i] = i

%%timeit
# op_flags=['readonly'] doesn't change performance
for a_value in np.nditer(a):
    pass
109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    a_value = a[i]
113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    c = b[i]
193 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
for a_value in a.flat:
    pass
25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for k, v in b.items():
    pass
19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
for i in itertools.product(range(1000), range(1000)):
    pass
28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer <[hidden email]> wrote:
I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e., discouraging its use in our docs, but not actually deprecating it). Certainly ndrange seems like a small but meaningful improvement in the interface.

That said, I'm not convinced this is really worth the trouble. I think the nested loop is still pretty readable/clear, and there are few times when I've actually found ndindex() to be useful.

On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane <[hidden email]> wrote:
On 10/8/18 12:21 PM, Mark Harfouche wrote:
> 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like
> `range`, is not an iterator. Changing this behaviour would likely lead
> to breaking code that uses that assumption. For example anybody using
> introspection or code like:
>
> ```
> indx = np.ndindex(5, 5)
> next(indx)  # Don't look at the (0, 0) coordinate
> for i in indx:
>     print(i)
> ```
> would break if `ndindex` becomes "not an iterator"

OK, I see now. Just like python3 has separate range and range_iterator
types, where range is sliceable, we would have separate ndrange and
ndindex types, where ndrange is sliceable. You're just copying the
python3 api. That justifies it pretty well for me.
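
The Python 3 distinction referred to here can be checked with a few lines of standard-library code (no NumPy involved):

```python
r = range(10)

# A range is a sequence: it can be sliced, indexed, and iterated repeatedly.
sub = r[1:-1]
print(sub)                      # range(1, 9)

# It is not an iterator, so next() does not work on it directly.
try:
    next(r)
except TypeError as exc:
    print("TypeError:", exc)

# iter() hands out a separate, single-use iterator object.
it = iter(r)
print(type(it).__name__)        # range_iterator
next(it)                        # consume 0
first_three = [next(it) for _ in range(3)]
print(first_three)              # [1, 2, 3]
```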

I still think we shouldn't have two functions which do nearly the same
thing. We should only have one, and get rid of the other. I see two ways
forward:

 * replace ndindex by your ndrange code, so it is no longer an iter.
   This would require some deprecation cycles for the cases that break.
 * deprecate ndindex in favor of a new function ndrange. We would keep
   ndindex around for back-compatibility, with a dep warning to use
   ndrange instead.

Doing a code search on github, I can see that a lot of people's code
would break if ndindex no longer was an iter. I also like the name
ndrange for its allusion to python3's range behavior. That makes me lean
towards the second option of a separate ndrange, with possible
deprecation of ndindex.

> itertools.product + range seems to be much faster than the current
> implementation of ndindex
>
> (python 3.6)
> ```
> %%timeit
>
> for i in np.ndindex(100, 100):
>     pass
> 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> %%timeit
> import itertools
> for i in itertools.product(range(100), range(100)):
>     pass
> 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
> ```

If the new code ends up faster than the old code, that's great, and
further justification for using ndrange instead of ndindex. I had
thought using nditer in the old code was fastest.

So as far as I am concerned, I say go ahead with the PR the way you are
doing it.

Allan

Re: ndrange, like range but multidimensiontal

Stephan Hoyer-2
In reply to this post by Eric Wieser
On Tue, Oct 9, 2018 at 9:34 PM Eric Wieser <[hidden email]> wrote:

One thing that worries me here - in python, range(...) in essence generates a lazy list - so I’d expect ndrange to generate a lazy ndarray. In practice, that means it would be a duck-type defining an __array__ method to evaluate it, and only implement methods already present in numpy.

It’s not clear to me what the datatype of such an array-like would be. Candidates I can think of are:

  1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a little awkward
I think this would be the appropriate choice. What about it makes tuple coercion awkward? If you use this as the dtype, you both set and get element as tuples.

In particular, I would say that ndrange() should be a lazy equivalent to the following explicit constructor:

import numpy as np

def ndrange(shape):
    dtype = [('i' + str(i), np.intp) for i in range(len(shape))]
    array = np.empty(shape, dtype)
    for indices in np.ndindex(*shape):
        array[indices] = indices
    return array

>>> ndrange((2,))
array([(0,), (1,)], dtype=[('i0', '<i8')])

>>> ndrange((2, 3))
array([[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]], dtype=[('i0', '<i8'), ('i1', '<i8')])

The one deviation in behavior would be that ndrange() iterates over flattened elements rather than over the first axis.

It is indeed a little awkward to have field names, but given that NumPy creates those automatically when you supply a dtype like 'i8,i8' this is probably a reasonable choice.
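
The automatic field naming can be seen directly. This is only a short illustration of existing NumPy behaviour, with arbitrary example values:

```python
import numpy as np

dt = np.dtype('i8,i8')
print(dt.names)                 # ('f0', 'f1')

a = np.zeros(2, dt)
a[0] = (3, 4)                   # elements can be set from tuples
print(a[0]['f0'], a[0]['f1'])   # 3 4
```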

  2. (intp, (N,)) - which collapses into a shape + (3,) array
  3. object_.
  4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the structured np.void but without field names. I’m not sure how vectorized element indexing would be spelt though.

Eric



Re: ndrange, like range but multidimensiontal

Allan Haldane
In reply to this post by Eric Wieser
On 10/10/18 12:34 AM, Eric Wieser wrote:
> One thing that worries me here - in python, `range(...)` in essence
> generates a lazy `list` - so I’d expect `ndrange` to generate a lazy
> `ndarray`. In practice, that means it would be a duck-type defining an
> `__array__` method to evaluate it, and only implement methods already
> present in numpy.

Isn't that what arange is for?

It seems like there are two uses of python3's range: 1. creating a 1d
iterable of indices for use in for-loops, and 2. creating a sequence of
integers via list(range(...)).

Numpy can extend this in two directions:
 * ndrange returns an iterable of nd indices (for for-loops).
 * arange returns a 1d ndarray of integers instead of a list

The application of for-loops, which is more niche, doesn't need
ndarray's vectorized properties, so I'm not convinced it should return
an ndarray. It certainly seems simpler not to return an ndarray, due to
the dtype question.

arange on its own seems to cover the need for a vectorized version of range.

Allan

Re: ndrange, like range but multidimensiontal

hmaarrfk

I’m really open to these kinds of array extensions, but I (personally) just don’t know how to do this efficiently.
I feel like ogrid and mgrid are probably enough for people that want this kind of feature.

My implementation would just be based on python primitives, which would yield performance similar to:

In [2]: %timeit np.arange(1000)
1.25 µs ± 4.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit np.asarray(range(1000))
99.6 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Here is how mgrid can be used to return something similar to the indices from ndrange

In [10]: np.mgrid[1:10:3, 2:10:3][:, 1, 1]
Out[10]: array([4, 5])

In [13]: np.ndrange((10, 10))[1::3, 2::3][1, 1]
Out[13]: (4, 5)
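
For readers without the PR checked out, the sketch below reproduces the `Out[13]` result with a stripped-down stand-in. `NDRangeSketch` is a hypothetical name; it only assumes what the proposal describes (a tuple of Python `range` objects iterated with `itertools.product`) and is not the code from the PR.

```python
import itertools
import math


class NDRangeSketch:
    """Hypothetical stand-in for the proposed ndrange; not the PR's code."""

    def __init__(self, shape):
        self._ranges = tuple(range(n) for n in shape)

    @classmethod
    def _from_ranges(cls, ranges):
        obj = cls.__new__(cls)
        obj._ranges = tuple(ranges)
        return obj

    def __iter__(self):
        return itertools.product(*self._ranges)

    def __len__(self):
        return math.prod(len(r) for r in self._ranges)

    def __contains__(self, index):
        return (len(index) == len(self._ranges)
                and all(i in r for i, r in zip(index, self._ranges)))

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) != len(self._ranges):
            raise IndexError("one index or slice per dimension is required")
        # Delegate each axis to Python's range, which handles slicing,
        # negative indices, and bounds checks.
        parts = [r[k] for r, k in zip(self._ranges, key)]
        if all(isinstance(k, slice) for k in key):
            return self._from_ranges(parts)
        if any(isinstance(k, slice) for k in key):
            raise IndexError("mixing ints and slices is not handled in this sketch")
        return tuple(parts)


sliced = NDRangeSketch((10, 10))[1::3, 2::3]
print(sliced[1, 1])        # (4, 5)
print(len(sliced))         # 9
print((4, 5) in sliced)    # True
```

Because every operation is forwarded to `range`, slicing costs no memory regardless of the shape, which is the main difference from the `mgrid` route.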


Re: ndrange, like range but multidimensiontal

Eric Wieser
In reply to this post by Stephan Hoyer-2

> If you use this as the dtype, you both set and get element as tuples.

Elements are not returned as tuples, but they can be explicitly cast.

> What about it makes tuple coercion awkward?

This explicit cast:

>>> m = np.ones((2, 2))  # assumed here; m is not defined in the original snippet
>>> dt_ind2d = np.dtype([('i0', np.intp), ('i1', np.intp)])
>>> ind = np.zeros((), dt_ind2d)[()]
>>> ind, type(ind)
((0, 0), <class 'numpy.void'>)
>>> m[ind]
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    m[ind]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
>>> m[tuple(ind)]
1.0



Re: ndrange, like range but multidimensiontal

Eric Wieser
In reply to this post by Allan Haldane

> Isn’t that what arange is for?

Imagining ourselves in python2 land for now - I’m proposing arange is to range, as ndrange is to xrange

> I’m not convinced it should return an ndarray

I agree - I think it should return a range-like object that:

  • Is convertible via __array__ if needed
  • Looks like an ndarray, with:
    • a .dtype attribute
    • a __getitem__(Tuple[int]) which returns numpy scalars
    • .ravel() and .flat for choosing iteration order.
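
One way such a range-like object might be sketched, assuming the structured-dtype candidate from earlier in the thread. `LazyIndexGrid` is a hypothetical name, the dtype choice is only one of the options under discussion, and `.ravel()`/`.flat` are left out to keep it short.

```python
import numpy as np


class LazyIndexGrid:
    """Hypothetical duck-type; only materializes indices on request."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        # Assumed dtype: one intp field per axis, named i0, i1, ...
        self.dtype = np.dtype(
            [('i%d' % k, np.intp) for k in range(len(self.shape))])

    def __array__(self, dtype=None, copy=None):
        # Build the full array only when NumPy asks for it.
        out = np.empty(self.shape, self.dtype)
        for k, grid in enumerate(np.indices(self.shape, dtype=np.intp)):
            out['i%d' % k] = grid
        return out if dtype is None else out.astype(dtype)

    def __getitem__(self, index):
        index = tuple(index)
        if len(index) != len(self.shape):
            raise IndexError("one integer per dimension is required")
        # range(n)[i] normalizes negative indices and checks bounds.
        norm = tuple(range(n)[i] for n, i in zip(self.shape, index))
        # Return a NumPy scalar (np.void) rather than a Python tuple.
        return np.array(norm, dtype=self.dtype)[()]


grid = LazyIndexGrid((2, 3))
full = np.asarray(grid)
print(full.shape)              # (2, 3)
print(tuple(grid[1, 2]))       # coordinates (1, 2)
print(tuple(full[1, 2]))       # same coordinates from the materialized array
```

Element access never allocates the full array, while `np.asarray` gives the vectorized form when it is explicitly requested.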


Re: ndrange, like range but multidimensiontal

hmaarrfk
Eric, interesting ideas.

> __getitem__(Tuple[int]) which returns numpy scalars

I'm not sure what you mean. Even if you supply a numpy uint8 to range, it still returns a python int class.
Would you like ndrange to return a tuple of `uint8` in this case?

```
In [3]: a = iter(range(np.uint8(10)))                                          

In [4]: next(a).__class__                                                      
Out[4]: int

In [5]: np.uint8(10).__class__                                                 
Out[5]: numpy.uint8
```

Ravel seems like a cool way to choose iteration order. In the PR, I mentioned that the reasons I removed `'F'` order from the PR were:
1. My implementation was not competitive with the `C` order implementation in terms of speed (can be fixed)
2. I don't know if it is something that people really need to iterate over collections (annoying to maintain if unused)

Instead, I just showed an example how people could iterate in `F` order should they need to.
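
The example from the PR is not reproduced in this thread. As a rough idea of what F-order iteration with plain Python primitives can look like, here is a minimal sketch with the hypothetical helper name `iter_f_order`:

```python
import itertools


def iter_f_order(shape):
    """Yield index tuples with the first axis varying fastest (Fortran order)."""
    # product() varies its last argument fastest, so feed the axes reversed
    # and flip each resulting tuple back.
    for rev in itertools.product(*(range(n) for n in reversed(shape))):
        yield rev[::-1]


print(list(iter_f_order((2, 3))))
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```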

I'm not sure if we ever want the `ndrange` object to return a full matrix. It seems like we would be creating a custom tuple class just for this which seems pretty niche.



Re: ndrange, like range but multidimensiontal

Eric Wieser

> I’m not sure if we ever want the ndrange object to return a full matrix.

np.array(ndrange(...)) should definitely return a full array, because that’s what the user asked for.

> Even if you supply a numpy uint8 to range, it still returns a python int class.

If we want to design ndrange with the intent of indexing only, then it should probably always use np.intp, whatever the type of the provided arguments.

> Would you like ndrange to return a tuple of uint8 in this case?

Tuples are just one of the four options I listed in a previous message. The downside of tuples is there’s no easy way to say “take just the first axis of this range”.
Whatever we pick, the return value should be such that np.array(ndrange(...))[ind] == ndrange(...)[ind]



Re: ndrange, like range but multidimensiontal

hmaarrfk
> If we want to design ndrange with the intent of indexing only

This is the only use I had in mind. But I feel like you are able to envision different use cases.

> Whatever we pick, the return value should be such that np.array(ndrange(...))[ind] == ndrange(...)[ind]

I can see the appeal of this.
