Add guaranteed no-copy to array creation and reshape?


Re: Add guaranteed no-copy to array creation and reshape?

ralfgommers


On Mon, Jan 7, 2019 at 11:30 AM Eric Wieser <[hidden email]> wrote:

> @Ralf
>
>> np.newaxis is not relevant here - it’s a simple alias for None, is just there for code readability, and is much more widely applicable than np.never_copy would be.
>
> Is there any particular reason we chose to use None? If I were designing it again, I’d consider a singleton object with a better __repr__

It stems from Numeric: https://mail.python.org/pipermail/python-list/2009-September/552203.html. Note that the Python builtin slice also uses None, but that's probably due to Numeric using it first.

Agree that a singleton with a nice repr could be a better choice than None. The more important part of my comment was "widely applicable" though. Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me.

Cheers,
Ralf




_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Add guaranteed no-copy to array creation and reshape?

Eric Wieser

> Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me.

At least from the perspective of discoverability, you could argue that string constants form a namespace of their own, and so growing the “string” namespace is not inherently better than growing any other. The main flaw in that comparison is that picking np.never_copy to be a singleton forever prevents us from reusing that name for a function.

Perhaps the solution is to use np.NEVER_COPY instead - that’s never going to clash with a function name we want to add in future, and using upper-case attributes as arguments in that way is pretty typical for Python (subprocess.PIPE, socket.SOCK_STREAM, etc…)

You could fairly argue that this approach is outdated in the face of enum.Enum - in which case we could go for the more heavy-handed np.CopyMode.NEVER, which still has a unique enough case for name clashes with functions never to be an issue.

Eric
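
The two spellings above can be sketched side by side. `CopyMode` and `NEVER_COPY` here are hypothetical stand-ins defined locally for illustration; nothing in this sketch is an existing public NumPy API, and the member names other than `NEVER` are assumptions.

```python
import enum


class CopyMode(enum.Enum):
    """Hypothetical copy modes; member names are illustrative only."""

    ALWAYS = "always"        # would correspond to copy=True
    IF_NEEDED = "if_needed"  # would correspond to copy=False
    NEVER = "never"          # would raise if a copy is required


# Upper-case module constant in the style of subprocess.PIPE.
NEVER_COPY = CopyMode.NEVER

print(CopyMode.NEVER.name)

# Unlike a free-form string, a misspelt value is rejected immediately.
try:
    CopyMode("nevr")
except ValueError as exc:
    print("rejected:", exc)
```

An enum covers all three modes under one name, whereas a lone constant only names the new case.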




Re: Add guaranteed no-copy to array creation and reshape?

Todd
In reply to this post by Feng Yu
On Mon, Jan 7, 2019, 14:22 Feng Yu <[hidden email]> wrote:
Hi,

Was the possibility ever brought up of a new array class (ndrefonly, ndview) that is strictly no copy?

All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.

On the implementation there are two choices if we use subclasses:

- ndrefonly can be a subclass of ndarray. The pattern would be a subclass limiting the functionality of its superclass, but ndrefonly is an ndarray.
- ndarray can be a subclass of ndrefonly. The subclass supplements the functionality of its superclass: ndarray will not throw an error when a copy is necessary. However, ndrefonly is then not an ndarray.

If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.

- Yu
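
A rough sketch of the first option (a restricting subclass) might look as follows. The class name `RefOnlyArray` is hypothetical, only `reshape` is covered, and the check is done after the fact with `np.may_share_memory`, so the copy is still made before the error is raised.

```python
import numpy as np


class RefOnlyArray(np.ndarray):
    """Hypothetical sketch: an ndarray subclass whose reshape refuses to copy."""

    def reshape(self, *shape, **kwargs):
        out = super().reshape(*shape, **kwargs)
        # A fresh copy lives in a new buffer, so it cannot overlap with self.
        if not np.may_share_memory(self, out):
            raise ValueError("reshape would require a copy")
        return out


a = np.arange(6).view(RefOnlyArray)
b = a.reshape(2, 3)       # fine: this is a view, and still a RefOnlyArray

try:
    b.T.reshape(-1)       # flattening the transpose needs a copy
except ValueError as exc:
    print("refused:", exc)
```

Covering every operation that can copy would need many more overrides, which is part of why the thread weighs this against a flag or a keyword.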

I would prefer a flag for this.  Someone can make an array read-only by setting `arr.flags.writeable=False`.  So along those lines, we could have an `arr.flags.copyable` flag that, if set to `False`, would result in an error if any operation tried to copy the data.
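
For reference, this is how the existing read-only flag behaves. The `copyable` flag is only a proposal in this thread and does not exist, so the sketch shows just the real `writeable` flag and notes that copying is currently unaffected by it.

```python
import numpy as np

arr = np.arange(4)
arr.flags.writeable = False   # existing flag: make the array read-only

try:
    arr[0] = 10
except ValueError as exc:
    print("write refused:", exc)

# Copying is not restricted by any flag today, and the copy is writeable again.
copied = arr.copy()
copied[0] = 10
print(copied.flags.writeable)
```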


Re: Add guaranteed no-copy to array creation and reshape?

Todd
In reply to this post by Sebastian Berg
On Wed, Dec 26, 2018, 18:29 Sebastian Berg <[hidden email]> wrote:
Hi all,

In https://github.com/numpy/numpy/pull/11897 I am looking into the
addition of a `copy=np.never_copy` argument to:
  * np.array
  * arr.reshape/np.reshape
  * arr.astype

Which would cause an error to be raised when numpy cannot guarantee
that the returned array is a view of the input array.
The motivation is to more easily avoid accidental copies of large data, or
to ensure that in-place manipulation will be meaningful.

The copy flag API would be:
  * `copy=True` forces a copy
  * `copy=False` allows numpy to copy if necessary
  * `copy=np.never_copy` will error if a copy would be necessary
  * (almost) all other input will be deprecated.

Unfortunately using `copy="never"` is tricky, because currently
`np.array(..., copy="never")` behaves exactly the same as
`np.array(..., copy=bool("never"))`. So the wrong result would be
given on old numpy versions, and it would be a behaviour change.
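
The truthiness problem can be seen without NumPy at all. The helper below is a hypothetical stand-in that mimics how a plain boolean keyword interprets whatever it is given; it is not NumPy's actual argument parsing.

```python
def old_style_copy_flag(copy):
    """Hypothetical stand-in: interpret the argument the way a bool keyword would."""
    return bool(copy)


# Any non-empty string is truthy, so "never" would silently mean "always copy".
print(old_style_copy_flag("never"))
print(old_style_copy_flag(False))
```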

Some things that are maybe not so nice:
 * adding/using `np.never_copy` is not very nice
 * Scalars need a copy and so will not be allowed
 * For rare array-likes numpy may not be able to guarantee no-copy,
   although it could happen (but should not).


The history is that a long while ago I considered adding a copy flag to
`reshape` so that it is possible to do `copy=np.never_copy` (or
similar) to ensure that no copy is made. In these cases, you may want
something like an assertion:

```
new_arr = arr.reshape(new_shape)
assert np.may_share_memory(arr, new_arr)

# Which is sometimes -- but should not be -- written as:
arr.shape = new_shape  # unnecessary container modification

# Or:
view = np.array(arr, order="F")
assert np.may_share_memory(arr, view)
```

but the flag is more readable and will not cause an intermediate copy on error.


So what do you think? Other variants would be to not expose this for
`np.array` and probably limit `copy="never"` to the reshape method. Or
just to not do it at all. Or to also accept "never" for `reshape`,
although I think I would prefer to keep it in sync and wait for a few
years to consider that.

Best,

Sebastian

Could this approach be used to deprecate `ravel` and let us just use `flatten`?


Re: Add guaranteed no-copy to array creation and reshape?

Todd
In reply to this post by Eric Wieser
On Thu, Jan 10, 2019, 01:55 Eric Wieser <[hidden email]> wrote:

>> Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me.
>
> At least from the perspective of discoverability, you could argue that string constants form a namespace of their own, and so growing the “string” namespace is not inherently better than growing any other. The main flaw in that comparison is that picking np.never_copy to be a singleton forever prevents us reusing that name to become a function.
>
> Perhaps the solution is to use np.NEVER_COPY instead - that’s never going to clash with a function name we want to add in future, and using upper attributes as arguments in that way is pretty typical for python (subprocess.PIPE, socket.SOCK_STREAM, etc…)

What about a namespace, as someone mentioned earlier? Perhaps `np.flags.NO_COPY`.

> You could fairly argue that this approach is outdated in the face of enum.Enum - in which case we could go for the more heavy-handed np.CopyMode.NEVER, which still has a unique enough case for name clashes with functions never to be an issue.
>
> Eric

Would all three conditions be supported this way or only `NEVER`?


Re: Add guaranteed no-copy to array creation and reshape?

Feng Yu
In reply to this post by Todd
Hi Todd,

I agree a flag is more suitable than classes.

I would add that another bonus of a flag over a function argument is that it avoids massive contamination of function signatures for a global variation of behavior that affects many functions.

Yu


Re: Add guaranteed no-copy to array creation and reshape?

Neal Becker
Constants are easier to support for autocompletion than strings. My current env (emacs) will (usually) autocomplete the former, but not the latter.


Re: Add guaranteed no-copy to array creation and reshape?

Juan Nunez-Iglesias
In reply to this post by Todd


On 10 Jan 2019, at 6:35 pm, Todd <[hidden email]> wrote:

> Could this approach be used to deprecate `ravel` and let us just use `flatten`?


Could we not? `.ravel()` is everywhere and it matches `ravel_multi_index` and `unravel_index`.

Re: Add guaranteed no-copy to array creation and reshape?

ralfgommers
In reply to this post by Eric Wieser


On Wed, Jan 9, 2019 at 10:55 PM Eric Wieser <[hidden email]> wrote:

>> Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me.
>
> At least from the perspective of discoverability, you could argue that string constants form a namespace of their own, and so growing the “string” namespace is not inherently better than growing any other.

I don't really think those are valid arguments. Strings are not a namespace, and if you want to make the analogy then it's at best a namespace within the functions/methods that the strings apply to. Discoverability: what is valid input for a keyword is discovered pretty much exclusively by reading the docstring or html doc page of the function, and not the docstring of some random object in the numpy namespace.

Ralf


Re: Add guaranteed no-copy to array creation and reshape?

ralfgommers
In reply to this post by Feng Yu


On Thu, Jan 10, 2019 at 11:21 AM Feng Yu <[hidden email]> wrote:
> Hi Todd,
>
> I agree a flag is more suitable than classes.
>
> I would add that another bonus of a flag over a function argument is that it avoids massive contamination of function signatures for a global variation of behavior that affects many functions.

I like this suggestion. Copy behavior fits very nicely with existing flags (e.g. UPDATEIFCOPY, WRITEABLE) and avoids both namespace pollution and complicating docstrings.

Ralf



Re: Add guaranteed no-copy to array creation and reshape?

Sebastian Berg
In reply to this post by Juan Nunez-Iglesias
On Fri, 2019-01-11 at 13:57 +1100, Juan Nunez-Iglesias wrote:

>
>
> > On 10 Jan 2019, at 6:35 pm, Todd <[hidden email]> wrote:
> >
> > Could this approach be used to deprecate `ravel` and let us just
> > use `flatten`?
>
>
> Could we not? `.ravel()` is everywhere and it matches
> `ravel_multi_index` and `unravel_index`.
True, I suppose either of those functions could get such an argument as
well. Just a side note: `arr.ravel()` is also not quite equivalent to
`arr.reshape(-1)`, since it actually provides some additional
assurances on the result's contiguity.

- Sebastian
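
The contiguity difference mentioned above can be observed on a strided 1-D array. This is a small sketch using only existing NumPy calls; the array `a` is just an illustrative input.

```python
import numpy as np

a = np.arange(10)[::2]   # 1-D but strided, so not contiguous

v = a.reshape(-1)        # can be expressed as a view of the same memory
r = a.ravel()            # ravel returns a contiguous result, copying if needed

print(v.flags.c_contiguous, np.shares_memory(a, v))
print(r.flags.c_contiguous, np.shares_memory(a, r))
```

So for this input `reshape(-1)` keeps the view while `ravel()` trades it for contiguity.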



Re: Add guaranteed no-copy to array creation and reshape?

Eric Wieser
In reply to this post by ralfgommers

I don’t think a NEVERCOPY entry in arr.flags would make much sense.
Is this really a sensible limitation to put on how data gets used? Isn’t it up to the algorithm to decide whether to copy its data, not the original owner of the data?

It also leads to some tricky questions of precedence - would np.array(arr, copy=True) respect the flag or the argument? How about np.array(arr)? Is arr + 0 considered a copy?
By keeping it as a value passed in via a copy= kwarg, we don’t need to answer any of those questions.

Eric
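
For context on those precedence questions, here is a small sketch of which of the mentioned expressions produce new memory today. It uses only existing NumPy functions and deliberately avoids passing `copy=` so that it does not depend on any particular version's keyword semantics.

```python
import numpy as np

arr = np.arange(5)

default = np.array(arr)   # np.array copies by default
same = np.asarray(arr)    # asarray passes an existing ndarray through
summed = arr + 0          # arithmetic allocates a new result array

print(np.shares_memory(arr, default))
print(same is arr)
print(np.shares_memory(arr, summed))
```

A per-array flag would have to define its behaviour for each of these cases, whereas a per-call keyword only affects the call that passes it.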


Re: Add guaranteed no-copy to array creation and reshape?

Feng Yu
Eric,

I agree these are questions that should be answered. I think the issues you raised concern functions that always make a copy -- combining them with NEVERCOPY does not sound sane.

Your argument is that if the new behavior is controlled per method call, then there is no need to worry about those functions that do not sound sane when combined with NEVERCOPY.

Here is a proposal that may address your concerns:

1. The name of the flag does not need to be NEVERCOPY -- especially since it sounds insane combined with many methods.
   I think something like KEEPBASE may be easier to reason about.
   The name KEEPBASE suggests that it shall not be set on an object whose base is None; this feature may simplify the matter.

2. Methods that create a copy shall always create a copy, regardless of the flag -- that's the definition of copy (np.copy).
   KEEPBASE shall only affect methods that create a new array object, which may reference the provided base or create a new base.

3. Functions that are not ndarray methods shall ignore the flag unless specified otherwise (np.array).

4. arr + 0 when arr is KEEPBASE: it depends on whether we think this triggers np.add() or ndarray.__add__. I think ndarray.__add__ is a shortcut to call the ufunc np.add(). Thus the result of arr + 0 shall follow the behavior of the ufunc np.add, which shall ignore the flag.

Notice that in memory-tight situations, users can already use in-place operations to avoid copies. This verbose but highly controlled syntax may be preferable to inferring an automated behavior that relies on the KEEPBASE flag of the arguments.

5. I think on the operation level, the difference between subclassing (flags) and function arguments would be:

array.view(keepbase=True).reshape(-1) vs array.reshape(-1, keepbase=True)

The gain is that no array method needs to be modified. The control can still be at the level of the algorithm.
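
As background for the KEEPBASE idea, the existing `.base` attribute and the in-place operations mentioned in point 4 can be sketched as follows. KEEPBASE and the `keepbase=` keyword are proposals from this message only and do not exist in NumPy; everything in the code is current, real API.

```python
import numpy as np

arr = np.arange(6)
view = arr.reshape(2, 3)   # a view: its base is the original array
copied = arr.copy()        # a copy owns its data, so it has no base

print(view.base is arr)
print(copied.base is None)

# In-place operations reuse the existing buffer instead of allocating a result.
arr += 1
np.multiply(arr, 2, out=arr)
print(view[0, 0])          # the view sees the in-place updates
```

A KEEPBASE-style flag would, under this proposal, turn the `copied.base is None` outcome into an error for methods that are allowed to return either a view or a copy.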


