Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
Hi all,

In https://github.com/numpy/numpy/pull/11897 I am looking into the
addition of a `copy=np.never_copy` argument to:
  * np.array
  * arr.reshape/np.reshape
  * arr.astype

This would cause an error to be raised when numpy cannot guarantee
that the returned array is a view of the input array.
The motivation is to more easily avoid accidental copies of large data, or
to ensure that in-place manipulation will be meaningful.

The copy flag API would be:
  * `copy=True` forces a copy
  * `copy=False` allows numpy to copy if necessary
  * `copy=np.never_copy` will error if a copy would be necessary
  * (almost) all other input will be deprecated.
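
To illustrate why `copy=False` is not the same as "never copy", here is a small sketch using only `ndarray.astype`, whose `copy` argument already exists. The array contents and dtypes are arbitrary choices for the example.

```python
import numpy as np

arr = np.arange(4, dtype=np.int64)

# Same dtype requested: copy=False lets numpy hand back the input itself.
same = arr.astype(np.int64, copy=False)
print(same is arr)

# Different dtype requested: a copy is unavoidable, and copy=False
# silently allows it instead of raising.
converted = arr.astype(np.float64, copy=False)
print(np.shares_memory(arr, converted))
```

The second case is the one a "never copy" option would turn into an error.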

Unfortunately, using `copy="never"` is tricky, because currently
`np.array(..., copy="never")` behaves exactly the same as
`np.array(..., copy=bool("never"))`, i.e. `copy=True`. The wrong result
would therefore be given on old numpy versions, and accepting the string
would be a behaviour change.
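
The truthiness problem can be seen in plain Python, without numpy. `legacy_wants_copy` is a made-up stand-in for any code that only evaluates `copy` in a boolean context.

```python
def legacy_wants_copy(copy):
    """Hypothetical stand-in for an API that treats `copy` as a plain bool."""
    return bool(copy)


# Any non-empty string is truthy, so "never" is read as "do copy".
print(legacy_wants_copy("never"))
print(legacy_wants_copy(False))
```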

Some things that are maybe not so nice:
 * adding/using `np.never_copy` is not very nice
 * Scalars need a copy and so will not be allowed
 * For rare array-likes numpy may not be able to guarantee no-copy,
   although it could happen (but should not).


The history is that a long while ago I considered adding a copy flag to
`reshape` so that it is possible to do `copy=np.never_copy` (or
similar) to ensure that no copy is made. In these cases, you may currently
want something like an assertion:

```
new_arr = arr.reshape(new_shape)
assert np.may_share_memory(arr, new_arr)

# Which is sometimes -- but should not be -- written as:
arr.shape = new_shape  # unnecessary container modification

# Or:
view = np.array(arr, order="F", copy=False)
assert np.may_share_memory(arr, view)
```

The `copy=np.never_copy` spelling would do the same, but is more readable
and will not cause an intermediate copy on error.
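
For reference, a self-contained version of the assertion pattern, using an arbitrary small array. It relies only on existing functions (`reshape`, `np.shares_memory`), and shows one reshape that numpy can do as a view and one where it has to copy.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Reshaping a contiguous array can be done as a view.
flat = arr.reshape(6)
print(np.shares_memory(arr, flat))

# Flattening the transpose in C order cannot be expressed with strides,
# so reshape silently returns a copy and the check fails.
flat_t = arr.T.reshape(6)
print(np.shares_memory(arr, flat_t))
```

Note that with this pattern the copy has already been made by the time the check runs.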


So what do you think? Other variants would be to not expose this for
`np.array` and probably limit `copy="never"` to the reshape method. Or
just to not do it at all. Or to also accept "never" for `reshape`,
although I think I would prefer to keep it in sync and wait for a few
years to consider that.

Best,

Sebastian


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Add guaranteed no-copy to array creation and reshape?

ralfgommers


On Wed, Dec 26, 2018 at 3:29 PM Sebastian Berg <[hidden email]> wrote:
> Unfortunately, using `copy="never"` is tricky, because currently
> `np.array(..., copy="never")` behaves exactly the same as
> `np.array(..., copy=bool("never"))`. The wrong result would therefore
> be given on old numpy versions, and it would be a behaviour change.

I think np.never_copy is really ugly. I'd much rather simply use 'never', and clearly document that if users start using this and they critically rely on it really being never, then they should ensure that their code is only used with numpy >= 1.17.0.

Note also that this would not be a backwards compatibility break, because `copy` is now clearly documented as bool, and not bool_like or some such thing. So we do not need to worry about the very improbable case that users now are using `copy='never'`.

If others think `copy='never'` isn't acceptable now, there are two other options:
1. add code first to catch `copy='never'` in 1.17.x and raise on it, then in a later numpy version introduce it.
2. just do nothing. I'd prefer that over `np.never_copy`.

Cheers,
Ralf


Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
On Wed, 2018-12-26 at 16:40 -0800, Ralf Gommers wrote:

> I think np.never_copy is really ugly. I'd much rather simply use
> 'never', and clearly document that if users start using this and they
> critically rely on it really being never, then they should ensure
> that their code is only used with numpy >= 1.17.0.
>
> Note also that this would not be a backwards compatibility break,
> because `copy` is now clearly documented as bool, and not bool_like
> or some such thing. So we do not need to worry about the very
> improbable case that users now are using `copy='never'`.
>
I agree that it is much nicer to pass "never", and I am not too
worried about people actually passing in strings. But I am a bit
worried about new code silently doing the exact opposite when run on an
older version. Although, I guess we typically do not worry too much
about such compatibility issues, and it is possible to make a note of it
in the docs.

- Sebastian


Re: Add guaranteed no-copy to array creation and reshape?

Eric Wieser
In reply to this post by ralfgommers

> I think np.never_copy is really ugly. I’d much rather simply use ‘never’,

I actually think np.never_copy is the better choice of the two. Arguments for it:

  • It’s consistent with np.newaxis in spelling (modulo the _)
  • If mistyped, it can be caught easily by IDEs.
  • It’s typeable with mypy, unlike constant string literals which currently aren’t
  • If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing
  • It won’t be possible to miss parts of our existing API that evaluate a copy argument in a boolean context
  • It won’t be possible for downstream projects to misinterpret by not checking for ‘never’ - an error will be raised instead.

Arguments against it:

  • It’s more characters than "never"
  • The implementation of np.never_copy is a little verbose / ugly
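
For illustration only, one way such a sentinel could look. The class name, the error message, and the `wants_copy` helper are made up here and are not taken from the PR.

```python
class _NeverCopyType:
    """Hypothetical sentinel; not the implementation from the PR."""

    def __repr__(self):
        return "never_copy"

    def __bool__(self):
        # Refuse to be silently treated as True/False.
        raise ValueError("never_copy cannot be interpreted as a bool")


never_copy = _NeverCopyType()


def wants_copy(copy):
    """Stand-in for code that only understands boolean `copy` values."""
    return bool(copy)


print(wants_copy(True))
try:
    wants_copy(never_copy)
except ValueError as exc:
    print("rejected:", exc)
```

Code that has not been updated then fails loudly instead of copying.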

Eric


Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
On Thu, 2018-12-27 at 14:50 +0100, Eric Wieser wrote:

> > I think np.never_copy is really ugly. I’d much rather simply use
> > ‘never’,
> >
>
> I actually think np.never_copy is the better choice of the two.
> Arguments for it:
>
> • It’s consistent with np.newaxis in spelling (modulo the _)
> • If mistyped, it can be caught easily by IDEs.
> • It’s typeable with mypy, unlike constant string literals which
>   currently aren’t
> • If code written against new numpy is run on old numpy, it will error
>   rather than doing the wrong thing

I am not sure the above are very strong arguments, since this
is not what is typically done for keywords (nobody suggests np.CLIP
instead of "clip"). Unless you feel this is different because it is a
mix of strings and bools here?

> • It won’t be possible to miss parts of our existing API that evaluate
>   a copy argument in a boolean context
> • It won’t be possible for downstream projects to misinterpret by not
>   checking for ‘never’ - an error will be raised instead.

The last one is a good reason, I think: the dispatcher would pass the
string on, so a downstream API that is not picky about its input would
interpret it wrongly unless we use the special argument.
Unless we replace the string when dispatching, which seems strange at
first sight.
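
A toy sketch of that downstream scenario. `DuckArray` is entirely hypothetical and stands in for any array-like whose own `reshape` receives the `copy` argument unchanged.

```python
class DuckArray:
    """Hypothetical downstream container with its own `reshape`."""

    def __init__(self, data):
        self.data = list(data)
        self.copies_made = 0

    def reshape(self, shape, copy=False):
        # Written before "never" existed: `copy` is only tested for truth.
        if copy:
            self.copies_made += 1
            return DuckArray(self.data)
        return self


duck = DuckArray(range(6))
result = duck.reshape((2, 3), copy="never")
print(result is duck)      # False: the string was read as copy=True
print(duck.copies_made)    # 1
```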

Overall, I am still a bit undecided right now.

- Sebastian



Re: Add guaranteed no-copy to array creation and reshape?

Juan Nunez-Iglesias
On Fri, Dec 28, 2018, at 3:40 AM, Sebastian Berg wrote:
> > • It’s consistent with np.newaxis in spelling (modulo the _)
> > • If mistyped, it can be caught easily by IDEs.
> > • It’s typeable with mypy, unlike constant string literals which
> >   currently aren’t
> > • If code written against new numpy is run on old numpy, it will error
> >   rather than doing the wrong thing
>
> I am not sure the above are very strong arguments, since this
> is not what is typically done for keywords (nobody suggests np.CLIP
> instead of "clip"). Unless you feel this is different because it is a
> mix of strings and bools here?

:+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful. But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.

But, for me, the biggest reason is the silent different behaviour in old vs new NumPy. As a package maintainer this is just a nightmare.

I have a lot of sympathy for the ugliness argument, but I don't think `np.never_copy` (or e.g. `np.modes.never`) is that much worse than a string.


Re: Add guaranteed no-copy to array creation and reshape?

ralfgommers


On Thu, Dec 27, 2018 at 3:27 PM Juan Nunez-Iglesias <[hidden email]> wrote:
> :+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful.

Technically there's nothing we are deprecating. If anything, the one not super uncommon pattern will be that people use 0/1 instead of False/True, which works as expected now and will start raising in case we'd go with np.never_copy

> But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.

I don't quite get that, it'll still be the most Pythonic thing to do in many API design scenarios. Adding a new class instance with unusual behavior like raising on bool() is not even a pattern, it would just be an oddity.


> But, for me, the biggest reason is the silent different behaviour in old vs new NumPy. As a package maintainer this is just a nightmare.

That is the main downside indeed in this particular case.


> I have a lot of sympathy for the ugliness argument, but I don't think `np.never_copy` (or e.g. `np.modes.never`) is that much worse than a string.

The point is not that `.reshape(..., copy=np.never_copy)` is uglier (it is, but indeed just a little), but that we're adding a new object to the numpy namespace that's without precedent. If we'd do that regularly the API would really blow up.

np.newaxis is not relevant here - it's a simple alias for None, is just there for code readability, and is much more widely applicable than np.never_copy would be.
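
The alias can be checked directly; the array below is arbitrary.

```python
import numpy as np

# np.newaxis is the same object as None.
print(np.newaxis is None)

a = np.arange(3)
# Both spellings insert a length-1 axis and give identical results.
col_a = a[:, np.newaxis]
col_b = a[:, None]
print(col_a.shape, np.array_equal(col_a, col_b))
```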

Cheers,
Ralf





Re: Add guaranteed no-copy to array creation and reshape?

Juan Nunez-Iglesias
On Fri, Dec 28, 2018, at 10:58 AM, Ralf Gommers wrote:

> Technically there's nothing we are deprecating. If anything, the one not super uncommon pattern will be that people use 0/1 instead of False/True, which works as expected now and will start raising in case we'd go with np.never_copy

Oh, no, here I meant, I'd consider `np.modes.CLIP` to be something worth advocating for if it wasn't for deprecation cycles and extremely widespread usage.

> > But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.
>
> I don't quite get that, it'll still be the most Pythonic thing to do in many API design scenarios. Adding a new class instance with unusual behavior like raising on bool() is not even a pattern, it would just be an oddity.

Oh, no, I wouldn't suggest that the class raise on boolification, I suggest that the typing module warn if something other than a bool or an np.types.CopyMode is provided. Then you can have a useful typed function signature.
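
A rough sketch of what such a typed signature might look like. `CopyMode` and this free-standing `reshape` wrapper are hypothetical names for illustration, not existing numpy API; the behaviour for each `copy` value follows the proposal at the top of the thread.

```python
import enum
from typing import Sequence, Union

import numpy as np


class CopyMode(enum.Enum):
    """Hypothetical enum, named after the np.types.CopyMode idea above."""
    NEVER = "never"


def reshape(arr: np.ndarray, shape: Sequence[int],
            copy: Union[bool, CopyMode] = False) -> np.ndarray:
    """Sketch of a wrapper whose `copy` annotation a type checker can verify."""
    out = arr.reshape(shape)
    shares = np.shares_memory(arr, out)
    if copy is CopyMode.NEVER and not shares:
        raise ValueError("cannot reshape without copying")
    if copy is True and shares:
        out = out.copy()  # copy=True forces a copy
    return out


base = np.arange(6)
view = reshape(base, (2, 3), copy=CopyMode.NEVER)
print(np.shares_memory(base, view))
```

With this annotation, a static type checker can flag a call such as `reshape(base, (2, 3), copy="never")`, since a plain string is neither a `bool` nor a `CopyMode`.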

> > I have a lot of sympathy for the ugliness argument, but I don't think `np.never_copy` (or e.g. `np.modes.never`) is that much worse than a string.
>
> The point is not that `.reshape(..., copy=np.never_copy)` is uglier (it is, but indeed just a little), but that we're adding a new object to the numpy namespace that's without precedent. If we'd do that regularly the API would really blow up.

Sure, I totally agree here, which is why I suggested `np.modes.never` as an alternative. Namespaces are a honking great idea!


Re: Add guaranteed no-copy to array creation and reshape?

Nathaniel Smith
In reply to this post by Juan Nunez-Iglesias
On Thu, Dec 27, 2018 at 3:27 PM Juan Nunez-Iglesias <[hidden email]> wrote:

> :+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful. But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.

I've certainly seen people argue that it's better to use proper enums
in this kind of case instead of strings. The 'enum' module is even
included in the stdlib on all our supported versions now:
https://docs.python.org/3/library/enum.html

I guess another possibility to throw out there would be a second
kwarg, require_view=False/True.
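
As a sketch of how a separate flag could behave, here is a hypothetical wrapper around `ndarray.astype`. The wrapper and its `require_view` parameter are illustrations only; nothing like this exists in numpy.

```python
import numpy as np


def astype(arr, dtype, copy=True, require_view=False):
    """Hypothetical wrapper showing a separate `require_view` flag."""
    dtype = np.dtype(dtype)
    if require_view:
        # Checked up front, so no temporary copy is made on failure.
        if arr.dtype != dtype:
            raise ValueError("cannot convert dtype without copying")
        return arr.view()
    return arr.astype(dtype, copy=copy)


data = np.arange(3, dtype=np.int64)
print(np.shares_memory(data, astype(data, np.int64, require_view=True)))
try:
    astype(data, np.float64, require_view=True)
except ValueError as exc:
    print("rejected:", exc)
```

The existing `copy` argument keeps its boolean meaning, so old numpy versions would fail with an unexpected-keyword error rather than silently copying.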

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
On Thu, 2018-12-27 at 17:45 -0800, Nathaniel Smith wrote:

> I've certainly seen people argue that it's better to use proper enums
> in this kind of case instead of strings. The 'enum' module is even
> included in the stdlib on all our supported versions now:
> https://docs.python.org/3/library/enum.html
>
I am sympathetic to that, but it is something we (or scipy, etc.)
currently simply do not use, so I am not sure it has much validity
at this time. That is, at least unless we agree to aim to use this
more generally in the future.

> I guess another possibility to throw out there would be a second
> kwarg, require_view=False/True.
>

My first gut feeling was that it is clumsy, at least for `reshape`, but
one will always only use one of the two arguments at a time.
The more I look at it, the better the suggestion seems. Plus it reduces
the possible confusion about `copy=False` not meaning "never".

I think with a bit more pondering, that will become my favorite
solution.

- Sebastian



Re: Add guaranteed no-copy to array creation and reshape?

Matthias Geier
In reply to this post by Sebastian Berg
Hi Sebastian.

I don't have an opinion (yet) about this matter, but I have a question:

On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:

[...]

> new_arr = arr.reshape(new_shape)
> assert np.may_share_memory(arr, new_arr)
>
> # Which is sometimes -- but should not be -- written as:
> arr.shape = new_shape  # unnecessary container modification

[...]

Why is this discouraged?

Why do you call this "unnecessary container modification"?

I've used this idiom in the past for exactly those cases where I
wanted to make sure no copy is made.

And if we are not supposed to assign to arr.shape, why is it allowed
in the first place?

cheers,
Matthias

Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:

> Hi Sebastian.
>
> I don't have an opinion (yet) about this matter, but I have a
> question:
>
> On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
>
> [...]
>
> > new_arr = arr.reshape(new_shape)
> > assert np.may_share_memory(arr, new_arr)
> >
> > # Which is sometimes -- but should not be -- written as:
> > arr.shape = new_shape  # unnecessary container modification
>
> [...]
>
> Why is this discouraged?
>
> Why do you call this "unnecessary container modification"?
>
> I've used this idiom in the past for exactly those cases where I
> wanted to make sure no copy is made.
>
> And if we are not supposed to assign to arr.shape, why is it allowed
> in the first place?
Well, this may be a matter of taste, but say you have an object that
stores an array:

class MyObject:
    def __init__(self):
        self.myarr = some_array


Now, let's say I do:

def some_func(arr):
    # Do something with the array:
    arr.shape = -1

myobject = MyObject()
some_func(myobject)

then myobject will suddenly have the wrong shape stored. In most cases
this is harmless, but I truly believe this is exactly why we have views
and why they are so awesome.
The content of arrays is mutable, but the array object itself should
not normally be mutated. There may be some corner cases, but a lot of the
"then why is it allowed" questions are answered with: for historical
reasons.

By the way, on error the `arr.shape = ...` code currently creates the
copy temporarily.

- Sebastian



Re: Add guaranteed no-copy to array creation and reshape?

Matthias Geier
On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:

>
> On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
> > Hi Sebastian.
> >
> > I don't have an opinion (yet) about this matter, but I have a
> > question:
> >
> > On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
> >
> > [...]
> >
> > > new_arr = arr.reshape(new_shape)
> > > assert np.may_share_memory(arr, new_arr)
> > >
> > > # Which is sometimes -- but should not be -- written as:
> > > arr.shape = new_shape  # unnecessary container modification
> >
> > [...]
> >
> > Why is this discouraged?
> >
> > Why do you call this "unnecessary container modification"?
> >
> > I've used this idiom in the past for exactly those cases where I
> > wanted to make sure no copy is made.
> >
> > And if we are not supposed to assign to arr.shape, why is it allowed
> > in the first place?
>
> Well, this may be a matter of taste, but say you have an object that
> stores an array:
>
> class MyObject:
>     def __init__(self):
>         self.myarr = some_array
>
>
> Now, lets say I do:
>
> def some_func(arr):
>     # Do something with the array:
>     arr.shape = -1
>
> myobject = MyObject()
> some_func(myobject)
>
> then myobject will suddenly have the wrong shape stored. In most cases
> this is harmless, but I truly believe this is exactly why we have views
> and why they are so awesome.
> The content of arrays is mutable, but the array object itself should
> not be muted normally.

Thanks for the example! I don't understand its point, though.
Also, it's not working since MyObject doesn't have a .shape attribute.

> There may be some corner cases, but a lot of the
> "than why is it allowed" questions are answered with: for history
> reasons.

OK, that's a good point.

> By the way, on error the `arr.shape = ...` code currently creates the
> copy temporarily.

That's interesting and it should probably be fixed.

But it is not reason enough for me not to use it.
I find it important that it doesn't make a copy in the success case; I
don't care very much about the error case.

Would you mind elaborating on the real reasons why I shouldn't use it?

cheers,
Matthias


Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
On Sun, 2018-12-30 at 16:03 +0100, Matthias Geier wrote:

> On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
> > On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
> > > Hi Sebastian.
> > >
> > > I don't have an opinion (yet) about this matter, but I have a
> > > question:
> > >
> > > On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
> > >
> > > [...]
> > >
> > > > new_arr = arr.reshape(new_shape)
> > > > assert np.may_share_memory(arr, new_arr)
> > > >
> > > > # Which is sometimes -- but should not be -- written as:
> > > > arr.shape = new_shape  # unnecessary container modification
> > >
> > > [...]
> > >
> > > Why is this discouraged?
> > >
> > > Why do you call this "unnecessary container modification"?
> > >
> > > I've used this idiom in the past for exactly those cases where I
> > > wanted to make sure no copy is made.
> > >
> > > And if we are not supposed to assign to arr.shape, why is it
> > > allowed
> > > in the first place?
> >
> > Well, this may be a matter of taste, but say you have an object
> > that
> > stores an array:
> >
> > class MyObject:
> >     def __init__(self):
> >         self.myarr = some_array
> >
> >
> > Now, lets say I do:
> >
> > def some_func(arr):
> >     # Do something with the array:
> >     arr.shape = -1
> >
> > myobject = MyObject()
> > some_func(myobject)
> >
> > then myobject will suddenly have the wrong shape stored. In most
> > cases
> > this is harmless, but I truly believe this is exactly why we have
> > views
> > and why they are so awesome.
> > The content of arrays is mutable, but the array object itself
> > should
> > not be muted normally.
>
> Thanks for the example! I don't understand its point, though.
> Also, it's not working since MyObject doesn't have a .shape
> attribute.
>
The example should have called `some_func(myobject.myarr)`. The thing is
that if you have more references to the same array around, you change
all their shapes. And if those other references are there for a reason,
that is not what you want.

That does not matter much in most cases, but it could change the shape
of an array in a completely different place than intended. Creating a
new view is cheap, so I think such things should be avoided.

I admit, most code will effectively do:
arr = input_arr[...]  # create a new view
arr.shape = ...

so that there is no danger. But conceptually, I do not think there
should be a danger of magically changing the shape of a stored array in
a different part of the code.

Does that make some sense? Maybe a shorter example:

arr = np.arange(10)
arr2 = arr
arr2.shape = (5, 2)

print(arr.shape)  # also (5, 2)

so the arr container (shape, dtype) is changed/mutated. I think we expect
that for content here, but not for the shape.
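
The contrast between mutating a shared array object and taking a view
first can be sketched as follows. This is only an illustration: the
variable names are made up, and the `warnings` filter is an assumption
added in case a given NumPy version warns about in-place shape assignment.

```python
import warnings

import numpy as np

arr = np.arange(10)
alias = arr  # a second name bound to the *same* array object

with warnings.catch_warnings():
    # Some NumPy versions may warn about assigning to .shape; that is not
    # the point of this sketch, so any such warning is silenced here.
    warnings.simplefilter("ignore")

    # Mutating the container through one name changes it for every name.
    alias.shape = (5, 2)
    shape_seen_via_arr = arr.shape

    # Taking a view first creates a new container over the same data,
    # so reshaping the view leaves `arr` (and `alias`) alone.
    view = arr[...]
    view.shape = (2, 5)

print(shape_seen_via_arr)           # shape of `arr` after assigning via `alias`
print(arr.shape, view.shape)        # `arr` keeps its shape, `view` has its own
print(np.shares_memory(arr, view))  # both still refer to the same data
```

The data is shared in both cases; only the second pattern keeps the
shape change local to the code that asked for it.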

- Sebastian



Re: Add guaranteed no-copy to array creation and reshape?

Matthias Geier
Hi Sebastian.

Thanks for the clarification.

On Sun, Dec 30, 2018 at 5:25 PM Sebastian Berg wrote:

> On Sun, 2018-12-30 at 16:03 +0100, Matthias Geier wrote:
> > On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
> > > On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
> > > > Hi Sebastian.
> > > >
> > > > I don't have an opinion (yet) about this matter, but I have a
> > > > question:
> > > >
> > > > On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
> > > >
> > > > [...]
> > > >
> > > > > new_arr = arr.reshape(new_shape)
> > > > > assert np.may_share_memory(arr, new_arr)
> > > > >
> > > > > # Which is sometimes -- but should not be -- written as:
> > > > > arr.shape = new_shape  # unnecessary container modification
> > > >
> > > > [...]
> > > >
> > > > Why is this discouraged?
> > > >
> > > > Why do you call this "unnecessary container modification"?
> > > >
> > > > I've used this idiom in the past for exactly those cases where I
> > > > wanted to make sure no copy is made.
> > > >
> > > > And if we are not supposed to assign to arr.shape, why is it
> > > > allowed
> > > > in the first place?
> > >
> > > Well, this may be a matter of taste, but say you have an object
> > > that
> > > stores an array:
> > >
> > > class MyObject:
> > >     def __init__(self):
> > >         self.myarr = some_array
> > >
> > >
> > > Now, lets say I do:
> > >
> > > def some_func(arr):
> > >     # Do something with the array:
> > >     arr.shape = -1
> > >
> > > myobject = MyObject()
> > > some_func(myobject)
> > >
> > > then myobject will suddenly have the wrong shape stored. In most
> > > cases
> > > this is harmless, but I truly believe this is exactly why we have
> > > views
> > > and why they are so awesome.
> > > The content of arrays is mutable, but the array object itself
> > > should
> > > not be muted normally.
> >
> > Thanks for the example! I don't understand its point, though.
> > Also, it's not working since MyObject doesn't have a .shape
> > attribute.
> >
>
> The example should have called `some_func(myobject.arr)`. The thing is
> that if you have more references to the same array around, you change
> all their shapes. And if those other references are there for a reason,
> that is not what you want.
>
> That does not matter much in most cases, but it could change the shape
> of an array in a completely different place then intended. Creating a
> new view is cheap, so I think such things should be avoided.
>
> I admit, most code will effectively do:
> arr = input_arr[...]  # create a new view
> arr.shape = ...
>
> so that there is no danger. But conceptually, I do not think there
> should be a danger of magically changing the shape of a stored array in
> a different part of the code.
>
> Does that make some sense? Maybe shorter example:
>
> arr = np.arange(10)
> arr2 = arr
> arr2.shape = (5, 2)
>
> print(arr.shape)  # also (5, 2)
>
> so the arr container (shape, dtype) is changed/muted. I think we expect
> that for content here, but not for the shape.

Thanks for the clarification, I think I now understand your example.

However, the behavior you are describing is just like the normal
reference semantics of Python itself.

If you have multiple identifiers bound to the same (mutable) object,
you'll always have this "problem".

I think every Python user should be aware of this behavior, but I
don't think it is a reason to discourage assigning to arr.shape.

Coming back to the original suggestion of this thread:
Since assigning to arr.shape makes sure no copy of the array data is
made, I don't think it's necessary to add a new no-copy argument to
reshape().

But the bug you mentioned ("on error the `arr.shape = ...` code
currently creates the copy temporarily") should probably be fixed at
some point ...
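
A small sketch of the no-copy behavior referred to above, using a
transposed array as an assumed example of a case where flattening cannot
be done without copying. The variable names are illustrative, and the
`warnings` filter is only there in case a given NumPy version warns about
in-place shape assignment.

```python
import warnings

import numpy as np

base = np.arange(6).reshape(2, 3)
transposed = base.T  # non-contiguous view of `base`

# reshape() succeeds here, but only by copying the data.
flat_copy = transposed.reshape(6)
copied = not np.shares_memory(transposed, flat_copy)

# Assigning to .shape refuses to proceed instead of copying.
raised = False
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        transposed.shape = (6,)
    except AttributeError:
        raised = True

print(copied, raised, transposed.shape)
```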

cheers,
Matthias


Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> Hi Sebastian.
>
> Thanks for the clarification.
>
<snip>

> > print(arr.shape)  # also (5, 2)
> >
> > so the arr container (shape, dtype) is changed/muted. I think we
> > expect
> > that for content here, but not for the shape.
>
> Thanks for the clarification, I think I now understand your example.
>
> However, the behavior you are describing is just like the normal
> reference semantics of Python itself.
>
> If you have multiple identifiers bound to the same (mutable) object,
> you'll always have this "problem".
>
> I think every Python user should be aware of this behavior, but I
> don't think it is reason to discourage assigning to arr.shape.
Well, I doubt I will convince you. But I want to point out that a numpy
array is:

  * underlying data
  * shape/strides (pointing to the exact data)
  * data type (interpret the data)

Arrays are mutable, but this is only half true from my perspective.
Everyone using numpy should be aware of "views", i.e. that the content
of the underlying data can change.

However, if I have a read-only array, and pass it around, I would not
expect it to change. That is because while the underlying data is
mutated, how this data is accessed and interpreted is not.

In other words, I see array objects as having two sides to them [0]:

  * Underlying data   -> normally mutable and often mutated
  * container:        -> not mutated by almost all code
      * shape/strides
      * data type

I realize that in some cases mutating the container metadata happens. But
I do believe it should be as minimal as possible. And frankly, one could
probably do away with it completely.

Another example of where it is bad would be a threaded environment. If
a Python function temporarily changes the shape of an array to read
from it without creating a view first, this will break multi-threaded
access to that array.
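
To make that concrete without relying on real thread timing, here is a
deterministic sketch in which an `observer` callback stands in for
another thread that happens to look at the array in the middle of the
read. The function names and the callback are hypothetical, and the
`warnings` filter is only there in case a given NumPy version warns about
in-place shape assignment.

```python
import warnings

import numpy as np


def read_flat_by_mutation(arr, observer):
    """Temporarily flatten `arr` in place, read it, then restore the shape."""
    original_shape = arr.shape
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        arr.shape = (-1,)
        try:
            observer()  # stands in for another thread running right now
            return arr.sum()
        finally:
            arr.shape = original_shape


def read_flat_by_view(arr, observer):
    """Read through a separate flattened array object; `arr` is untouched."""
    flat = arr.reshape(-1)  # a view when possible, otherwise a copy
    observer()
    return flat.sum()


shared = np.arange(6).reshape(2, 3)
seen = []


def observer():
    # Records the shape that any other user of `shared` would see.
    seen.append(shared.shape)


total_mutation = read_flat_by_mutation(shared, observer)
total_view = read_flat_by_view(shared, observer)
print(seen)
```

Both functions compute the same sum, but only the first one exposes a
temporarily wrong shape to everyone else holding a reference to `shared`.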

- Sebastian


[0] I tried to find other examples of such a split. Maybe a
categorical/state object which is allowed to change value/state. But the
list of possible states cannot change.



Re: Add guaranteed no-copy to array creation and reshape?

Matthias Geier
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:

>
> On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > Hi Sebastian.
> >
> > Thanks for the clarification.
> >
> <snip>
> > > print(arr.shape)  # also (5, 2)
> > >
> > > so the arr container (shape, dtype) is changed/muted. I think we
> > > expect
> > > that for content here, but not for the shape.
> >
> > Thanks for the clarification, I think I now understand your example.
> >
> > However, the behavior you are describing is just like the normal
> > reference semantics of Python itself.
> >
> > If you have multiple identifiers bound to the same (mutable) object,
> > you'll always have this "problem".
> >
> > I think every Python user should be aware of this behavior, but I
> > don't think it is reason to discourage assigning to arr.shape.
>
> Well, I doubt I will convince you.

I think we actually have very little disagreement.

I agree with you on what should be done *most of the time*, but I
wouldn't totally discourage mutating NumPy array shapes, because I
think in the right circumstances it can be very useful.

> But want to point out that a numpy
> array is:
>
>   * underlying data
>   * shape/strides (pointing to the exact data)
>   * data type (interpret the data)
>
> Arrays are mutable, but this is only half true from my perspective.
> Everyone using numpy should be aware of "views", i.e. that the content
> of the underlying data can change.

I agree, everyone should be aware of that.

> However, if I have a read-only array, and pass it around, I would not
> expect it to change. That is because while the underlying data is
> muted, how this data is accessed and interpreted is not.
>
> In other words, I see array objects as having two sides to them [0]:
>
>   * Underlying data   -> normally mutable and often muted
>   * container:        -> not muted by almost all code
>       * shape/strides
>       * data type

Exactly: "almost all code".

Most of the time I would not assign to arr.shape, but in some rare
occasions I find it very useful.

And one of those rare occasions is when you want guaranteed no-copy behavior.

There are also some (most likely significantly rarer) cases where I
would modify arr.strides.

> I realize that in some cases muting the container metadata happens. But
> I do believe it should be as minimal as possible. And frankly, probably
> one could do away with it completely.

I guess that's the only point where we disagree.

I wouldn't completely discourage it and I would definitely not remove
the functionality.

> Another example for where it is bad would be a threaded environment. If
> a python function temporarily changes the shape of an array to read
> from it without creating a view first, this will break multi-threaded
> access to that array.

Sure, let's not use it while multi-threading then.

I still think that's not at all a reason to remove the feature.

There are some things that are problematic when multi-threading, but
that's typically not reason enough to completely disallow them.

cheers,
Matthias


Re: Add guaranteed no-copy to array creation and reshape?

Feng Yu
In reply to this post by Sebastian Berg
Hi,

Has the possibility of a new array class (ndrefonly, ndview) that is strictly no-copy ever been brought up?

All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.

On the implementation side there are two choices if we use subclasses:

- ndrefonly can be a subclass of ndarray. The pattern would be a subclass limiting the functionality of super, but ndrefonly is a ndarray.
- ndarray as a subclass of ndrefonly. The subclass supplements the functionality of super: ndarray will not throw an error when a copy is necessary. However, ndrefonly is not a ndarray.

If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.

- Yu


On Fri, Dec 28, 2018 at 5:45 AM Sebastian Berg <[hidden email]> wrote:
On Thu, 2018-12-27 at 17:45 -0800, Nathaniel Smith wrote:
> On Thu, Dec 27, 2018 at 3:27 PM Juan Nunez-Iglesias <
> [hidden email]> wrote:
> > On Fri, Dec 28, 2018, at 3:40 AM, Sebastian Berg wrote:
> >
> > > It’s consistent with np.newaxis in spelling (modulo the _)
> > > If mistyped, it can be caught easily by IDEs.
> > > It’s typeable with mypy, unlike constant string literals which
> > > currently aren’t
> > > If code written against new numpy is run on old numpy, it will
> > > error
> > > rather than doing the wrong thing
> >
> > I am not sure I think that the above are too strong arguments,
> > since it
> > is not what is typically done for keywords (nobody suggests np.CLIP
> > instead of "clip"). Unless you feel this is different because it is
> > a
> > mix of strings and bools here?
> >
> >
> > :+1: to Eric's list. I don't think it's different because of the
> > mix, I think it's different because deprecating things is painful.
> > But now that we have good typing in Python, I think of string
> > literals as an anti-pattern going forward.
>
> I've certainly seen people argue that it's better to use proper
> enum's
> in this kind of case instead of strings. The 'enum' package is even
> included in the stdlib on all our supported versions now:
> https://docs.python.org/3/library/enum.html
>

I am sympathetic with that, but it is something we (or scipy, etc.)
currently simply do not use, so I am not sure that I think it has much
validity at this time. That is, at least unless we agree to aim to use this
more generally in the future.

> I guess another possibility to throw out there would be a second
> kwarg, require_view=False/True.
>

My first gut feeling was that it is clumsy at least for `reshape`, but
one will always only use one of the two arguments at a time.
The more I look at it, the better the suggestion seems. Plus it reduces
the possible `copy=False` not meaning "never" confusion.

I think with a bit more pondering, that will become my favorite
solution.

- Sebastian


> -n
>

Re: Add guaranteed no-copy to array creation and reshape?

Eric Wieser
In reply to this post by Matthias Geier

@Matthias:

> Most of the time I would not assign to arr.shape, but in some rare occasions I find it very useful.
>
> And one of those rare occasions is when you want guaranteed no-copy behavior.

Can you come up with any other example?
The only real argument you seem to have here is “my code uses arr.shape = ... and I don’t want it to break”. That’s a fair argument, but all it really means is that we should start emitting DeprecationWarning("Use arr = arr.reshape(..., copy=np.never_copy) instead of arr.shape = ..."), and consider having a long deprecation period.
If necessary, we could compromise on just putting a warning in the docs, and not notifying the user at all.
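
For comparison, the same guarantee can be approximated today without
touching the caller's array. The sketch below is a hypothetical helper
(`reshape_no_copy` is not a NumPy function, and `np.never_copy` is only
the name proposed in this thread); it assumes that assigning to the shape
of a fresh view raises `AttributeError` when a copy would be required.
The `warnings` filter is only there in case a given NumPy version warns
about in-place shape assignment.

```python
import warnings

import numpy as np


def reshape_no_copy(arr, shape):
    """Return a reshaped view of `arr`, or raise if that would need a copy.

    Hypothetical stand-in for the proposed no-copy reshape: the shape is
    assigned on a new view, so `arr` itself is never modified.
    """
    view = arr.view()  # new array object over the same data
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        view.shape = shape  # raises AttributeError if a copy is needed
    return view


a = np.arange(6)
v = reshape_no_copy(a, (2, 3))
print(a.shape, v.shape, np.shares_memory(a, v))

try:
    reshape_no_copy(v.T, (6,))  # flattening a transpose needs a copy
    failed = False
except AttributeError:
    failed = True
print(failed)
```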

@Ralf

> np.newaxis is not relevant here - it’s a simple alias for None, is just there for code readability, and is much more widely applicable than np.never_copy would be.

Is there any particular reason we chose to use None? If I were designing it again, I’d consider a singleton object with a better __repr__

@Nathaniel

> I guess another possibility to throw out there would be a second kwarg, require_view=False/True.

The downside of this approach is that array-likes will definitely need updating to support this new behavior, whereas many may work out of the box if we extend the copy argument (like, say, maskedarray). This also ties into the __bool__ override - that will ensure that subclasses which don’t have a trivial reshape crash.

@Sebastian:

> Unless we replace the string when dispatching, which seems strange on first sight.

I’m envisaging cases where we don’t have a dispatcher at all:

  • Duck arrays implementing methods matching ndarray
  • Something like my_custom_function(arr, copy=...) that forwards its copy argument to reshape

Eric


On Mon, 7 Jan 2019 at 11:05 Matthias Geier <[hidden email]> wrote:
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
>
> On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > Hi Sebastian.
> >
> > Thanks for the clarification.
> >
> <snip>
> > > print(arr.shape)  # also (5, 2)
> > >
> > > so the arr container (shape, dtype) is changed/muted. I think we
> > > expect
> > > that for content here, but not for the shape.
> >
> > Thanks for the clarification, I think I now understand your example.
> >
> > However, the behavior you are describing is just like the normal
> > reference semantics of Python itself.
> >
> > If you have multiple identifiers bound to the same (mutable) object,
> > you'll always have this "problem".
> >
> > I think every Python user should be aware of this behavior, but I
> > don't think it is reason to discourage assigning to arr.shape.
>
> Well, I doubt I will convince you.

I think we actually have quite little disagreement.

I agree with you on what should be done *most of the time*, but I
wouldn't totally discourage mutating NumPy array shapes, because I
think in the right circumstances it can be very useful.

> But want to point out that a numpy
> array is:
>
>   * underlying data
>   * shape/strides (pointing to the exact data)
>   * data type (interpret the data)
>
> Arrays are mutable, but this is only half true from my perspective.
> Everyone using numpy should be aware of "views", i.e. that the content
> of the underlying data can change.

I agree, everyone should be aware of that.

> However, if I have a read-only array, and pass it around, I would not
> expect it to change. That is because while the underlying data is
> muted, how this data is accessed and interpreted is not.
>
> In other words, I see array objects as having two sides to them [0]:
>
>   * Underlying data   -> normally mutable and often muted
>   * container:        -> not muted by almost all code
>       * shape/strides
>       * data type

Exactly: "almost all code".

Most of the time I would not assign to arr.shape, but in some rare
occasions I find it very useful.

And one of those rare occasions is when you want guaranteed no-copy behavior.

There are also some (most likely significantly rarer) cases where I
would modify arr.strides.

> I realize that in some cases muting the container metadata happens. But
> I do believe it should be as minimal as possible. And frankly, probably
> one could do away with it completely.

I guess that's the only point where we disagree.

I wouldn't completely discourage it and I would definitely not remove
the functionality.

> Another example for where it is bad would be a threaded environment. If
> a python function temporarily changes the shape of an array to read
> from it without creating a view first, this will break multi-threaded
> access to that array.

Sure, let's not use it while multi-threading then.

I still think that's not at all a reason to remove the feature.

There are some things that are problematic when multi-threading, but
that's typically not reason enough to completely disallow them.

cheers,
Matthias

>
> - Sebastian
>
>
> [0] I tried to find other examples for such a split. Maybe a
> categorical/state object which is allowed change value/state. But the
> list of possible states cannot change.
>
>
> > Coming back to the original suggestion of this thread:
> > Since assigning to arr.shape makes sure no copy of the array data is
> > made, I don't think it's necessary to add a new no-copy argument to
> > reshape().
> >
> > But the bug you mentioned ("on error the `arr.shape = ...` code
> > currently creates the copy temporarily") should probably be fixed at
> > some point ...
> >
> > cheers,
> > Matthias
> >
> > > - Sebastian
> > >
> > >
> > > > > There may be some corner cases, but a lot of the
> > > > > "then why is it allowed" questions are answered with: for
> > > > > historical reasons.
> > > >
> > > > OK, that's a good point.
> > > >
> > > > > By the way, on error the `arr.shape = ...` code currently
> > > > > creates the copy temporarily.
> > > >
> > > > That's interesting and it should probably be fixed.
> > > >
> > > > But it is not reason enough for me not to use it.
> > > > I find it important that it doesn't make a copy in the success
> > > > case; I don't care very much about the error case.
> > > >
> > > > Would you mind elaborating on the real reasons why I shouldn't
> > > > use it?
> > > >
> > > > cheers,
> > > > Matthias
> > > >
> > > > > - Sebastian
> > > > >
> > > > >
> > > > > > cheers,
> > > > > > Matthias
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
In reply to this post by Matthias Geier
On Mon, 2019-01-07 at 20:04 +0100, Matthias Geier wrote:

> On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
> > On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > > Hi Sebastian.
> > >
> > > Thanks for the clarification.
> > >
> > <snip>
> > > > print(arr.shape)  # also (5, 2)
> > > >
> > > > so the arr container (shape, dtype) is changed/mutated. I think
> > > > we expect that for content here, but not for the shape.
> > >
> > > Thanks for the clarification, I think I now understand your
> > > example.
> > >
> > > However, the behavior you are describing is just like the normal
> > > reference semantics of Python itself.
> > >
> > > If you have multiple identifiers bound to the same (mutable)
> > > object, you'll always have this "problem".
> > >
> > > I think every Python user should be aware of this behavior, but I
> > > don't think it is a reason to discourage assigning to arr.shape.
> >
> > Well, I doubt I will convince you.
>
> I think we actually have quite little disagreement.
>

I am very sorry, that was very badly phrased. What I meant is just that
saying that arrays simply are mutable objects does not seem all that
wrong to me. However, I would very much prefer to move away from that
for the container part (shape, strides and data type).
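
To make the container/data distinction concrete, a small sketch of the aliasing effect under discussion. It assumes a NumPy version that still allows assigning to `arr.shape`, and the names `alias` and `view` are only for illustration:

```python
import numpy as np

arr = np.arange(10)
alias = arr          # second name for the same array object
view = arr.view()    # new array object sharing the same data

arr.shape = (5, 2)   # mutates the container in place

print(alias.shape)   # (5, 2): the alias sees the new shape
print(view.shape)    # (10,): the view has its own container
view[0] = 99
print(arr[0, 0])     # 99: the data is still shared
```

Mutating the data through any of the three names is expected. Code holding `alias` having its shape changed from under it is the part I would like to see less of.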

> I agree with you on what should be done *most of the time*, but I
> wouldn't totally discourage mutating NumPy array shapes, because I
> think in the right circumstances it can be very useful.
>
> > But I want to point out that a numpy
> > array is:
> >
> >   * underlying data
> >   * shape/strides (pointing to the exact data)
> >   * data type (interpret the data)
> >
> > Arrays are mutable, but this is only half true from my perspective.
<snip>

> > In other words, I see array objects as having two sides to them
> > [0]:
> >
> >   * Underlying data   -> normally mutable and often mutated
> >   * container:        -> not mutated by almost all code
> >       * shape/strides
> >       * data type
>
> Exactly: "almost all code".
>
> Most of the time I would not assign to arr.shape, but on some rare
> occasions I find it very useful.
>
> And one of those rare occasions is when you want guaranteed no-copy
> behavior.

`arr.shape = ...` is somewhat easier on the eye when compared to
`arr = arr.reshape(..., ensure_view=True)` (or whatever we would end up
doing).

Other than that, do you have a technical reason for it (aside from
ensuring that no copy will occur)?

The thing is that I have seen the suggestion to use `arr.shape = ...` as
a "best practice". And I really disagree that it is very good practice.

> There are also some (most likely significantly rarer) cases where I
> would modify arr.strides.
>

The same again: there should be no reason why you should have to do
this. In fact, this will cause hard crashes for object arrays, so even
technically this is not a "harmless" convenience. It is a bug to allow
it at all for object arrays that own their data:

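import numpy as np
arr = np.arange(100000, dtype=object)
arr.strides = (0,)  # now every element aliases arr[0]
del arr             # freeing the array crashes the interpreter
# free(): invalid pointer
# zsh: abort (core dumped)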

Note that using the alternatives is completely fine here (as long as
you take care that all elements point to valid objects).
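
As a sketch of what such an alternative could look like: the message does not name one, so using `numpy.lib.stride_tricks.as_strided` here is an assumption. It builds a zero-stride view and leaves the original array's strides untouched:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(5, dtype=object)

# A zero-stride, read-only view: every element refers to arr[0].
# `arr` itself keeps its own strides and still owns the data.
repeated = as_strided(arr, shape=(5,), strides=(0,), writeable=False)
repeated_values = repeated.tolist()

print(repeated_values)                 # [0, 0, 0, 0, 0]
print(arr.strides == (arr.itemsize,))  # True

# Dropping the view is harmless: it does not own the object references.
del repeated
```

The view only ever points at `arr[0]`, which is a valid object for as long as `arr` is alive.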


> > I realize that in some cases mutating the container metadata happens.
> > But I do believe it should be as minimal as possible. And frankly,
> > probably one could do away with it completely.
>
> I guess that's the only point where we disagree.
>
> I wouldn't completely discourage it and I would definitely not remove
> the functionality.
>
> > Another example of where it is bad would be a threaded
> > environment. If a Python function temporarily changes the shape of
> > an array to read from it without creating a view first, this will
> > break multi-threaded access to that array.
>
> Sure, let's not use it while multi-threading then.
>
> I still think that's not at all a reason to remove the feature.
>
While I wouldn't mind moving to deprecate it fully, that is not my
intention right now. I would discourage it in the documentation with a
pointer to a safe alternative. Maybe at some point we realize that
nobody really has any good reason to keep using it, then we may move
ahead slowly.
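
One possible shape for such a safe alternative, as a rough sketch only: create a view first and assign the shape on the view, so the original container is never touched. The helper name `reshape_view` is hypothetical and not an existing or proposed NumPy function, and the sketch assumes assigning to `.shape` is still supported:

```python
import numpy as np

def reshape_view(arr, new_shape):
    """Hypothetical helper: reshape without copying and without mutating `arr`."""
    view = arr.view()       # fresh container over the same data
    view.shape = new_shape  # raises AttributeError if a copy would be needed
    return view

arr = np.arange(10)
reshaped = reshape_view(arr, (5, 2))

print(arr.shape, reshaped.shape)        # (10,) (5, 2)
print(np.shares_memory(arr, reshaped))  # True
```

This keeps the no-copy guarantee while other code (or other threads) holding `arr` never sees its shape change.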

- Sebastian


> There are some things that are problematic when multi-threading, but
> that's typically not reason enough to completely disallow them.
>
> cheers,
> Matthias
>
> > - Sebastian
> >
> >
> > [0] I tried to find other examples of such a split. Maybe a
> > categorical/state object which is allowed to change value/state. But
> > the list of possible states cannot change.
> >
> >
> > > Coming back to the original suggestion of this thread:
> > > Since assigning to arr.shape makes sure no copy of the array data
> > > is made, I don't think it's necessary to add a new no-copy argument
> > > to reshape().
> > >
> > > But the bug you mentioned ("on error the `arr.shape = ...` code
> > > currently creates the copy temporarily") should probably be fixed
> > > at some point ...
> > >
> > > cheers,
> > > Matthias
> > >
> > > > - Sebastian
> > > >
> > > >
> > > > > > There may be some corner cases, but a lot of the
> > > > > > "then why is it allowed" questions are answered with: for
> > > > > > historical reasons.
> > > > >
> > > > > OK, that's a good point.
> > > > >
> > > > > > By the way, on error the `arr.shape = ...` code currently
> > > > > > creates the copy temporarily.
> > > > >
> > > > > That's interesting and it should probably be fixed.
> > > > >
> > > > > But it is not reason enough for me not to use it.
> > > > > I find it important that it doesn't make a copy in the
> > > > > success case; I don't care very much about the error case.
> > > > >
> > > > > Would you mind elaborating on the real reasons why I
> > > > > shouldn't use it?
> > > > >
> > > > > cheers,
> > > > > Matthias
> > > > >
> > > > > > - Sebastian
> > > > > >
> > > > > >
> > > > > > > cheers,
> > > > > > > Matthias
