Request for enhancement to numpy.random.shuffle

Warren Weckesser-2
I created an issue on github for an enhancement
to numpy.random.shuffle:
    https://github.com/numpy/numpy/issues/5173
I'd like to get some feedback on the idea.

Currently, `shuffle` shuffles the first dimension of an array
in-place.  For example, shuffling a 2D array shuffles the rows:

In [227]: a
Out[227]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [228]: np.random.shuffle(a)

In [229]: a
Out[229]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [ 3,  4,  5],
       [ 6,  7,  8]])



To add an axis keyword, we could (in effect) apply `shuffle` to
`a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
the columns:

In [232]: a = np.arange(15).reshape(3,5)

In [233]: a
Out[233]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [234]: axis = 1

In [235]: np.random.shuffle(a.swapaxes(axis, 0))

In [236]: a
Out[236]:
array([[ 3,  2,  4,  0,  1],
       [ 8,  7,  9,  5,  6],
       [13, 12, 14, 10, 11]])


So that's the first part--adding an `axis` keyword.
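
A minimal sketch of how an `axis` keyword could be layered on the existing in-place shuffle, following the `swapaxes` idea above. The helper name `shuffle_along_axis` and its `rng` parameter are assumptions made for illustration; they are not part of NumPy or of the GitHub issue.

```python
import numpy as np

def shuffle_along_axis(a, axis=0, rng=None):
    """Shuffle `a` in place along `axis`, moving whole slices together.

    Hypothetical helper: the name and the `rng` parameter are illustrative.
    """
    if rng is None:
        rng = np.random.default_rng()
    # swapaxes returns a view, so shuffling its first axis reorders
    # the slices of `a` itself along `axis`.
    rng.shuffle(a.swapaxes(axis, 0))

a = np.arange(15).reshape(3, 5)
shuffle_along_axis(a, axis=1, rng=np.random.default_rng(0))
print(a)  # columns are reordered; each column keeps its original values
```

Because the columns move as units, the row-to-row differences inside each column are unchanged by the shuffle.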

The other part of the enhancement request is to add a shuffle
behavior that shuffles the 1-d slices *independently*.  That is,
for a 2-d array, shuffling with `axis=0` would apply a different
shuffle to each column.  In the github issue, I defined a
function called `disarrange` that implements this behavior:

In [240]: a
Out[240]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [241]: disarrange(a, axis=0)

In [242]: a
Out[242]:
array([[ 6,  1,  2],
       [ 3, 13, 14],
       [ 9, 10,  5],
       [12,  7,  8],
       [ 0,  4, 11]])


Note that each column has been shuffled independently.

This behavior is analogous to how `sort` handles the `axis`
keyword.  `sort` sorts the 1-d slices along the given axis
independently.
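
For illustration, here is one way the independent behavior could be sketched. This is not the `disarrange` code from the GitHub issue; the name `shuffle_independent` and the `rng` parameter are hypothetical, and the loop favors clarity over speed.

```python
import numpy as np

def shuffle_independent(a, axis=0, rng=None):
    """Shuffle each 1-d slice of `a` along `axis` independently, in place.

    Hypothetical stand-in for the behavior described above; not the
    implementation from the GitHub issue.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Move the target axis to the end so every 1-d slice is b[index].
    b = np.moveaxis(a, axis, -1)
    for index in np.ndindex(b.shape[:-1]):
        rng.shuffle(b[index])  # b[index] is a 1-d view into `a`

a = np.arange(15).reshape(5, 3)
shuffle_independent(a, axis=0, rng=np.random.default_rng(0))
print(a)  # each column is permuted separately
```

As with `sort`, every 1-d slice along the chosen axis is handled on its own, so sorting the result along the same axis recovers the original array in this example.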

In the github issue, I suggested the following signature
for `shuffle` (but I'm not too fond of the name `independent`):

  def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle`
is used.  If `independent` is True, each 1-d slice is shuffled
independently (in the same way that `sort` sorts each 1-d
slice).

Like most functions that take an `axis` argument, `axis=None`
means to shuffle the flattened array.  With `independent=True`,
it would act like `np.random.shuffle(a.flat)`, e.g.

In [247]: a
Out[247]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [248]: np.random.shuffle(a.flat)

In [249]: a
Out[249]:
array([[ 0, 14,  9,  1, 13],
       [ 2,  8,  5,  3,  4],
       [ 6, 10,  7, 12, 11]])



A small wart in this API is the meaning of

  shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the
array unchanged. (The current behavior can be interpreted as
shuffling a 1-d sequence of monolithic blobs; the axis argument
specifies which axis of the array corresponds to the
sequence index.  Then `axis=None` means the argument is
a single monolithic blob, so there is nothing to shuffle.)
Or an error could be raised.

What do you think?

Warren


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: Request for enhancement to numpy.random.shuffle

John Zwinck
On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser
<[hidden email]> wrote:
> I created an issue on github for an enhancement
> to numpy.random.shuffle:
>     https://github.com/numpy/numpy/issues/5173

I like this idea.  I was a bit surprised there wasn't something like
this already.

> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged. (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index.  Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.

Let's think about it from the other direction: if a user wants to
shuffle all the elements as if it were 1-d, as you point out they
could do this:

  shuffle(a, axis=None, independent=True)

But that's a lot of typing.  Maybe we should just let this do the same thing:

  shuffle(a, axis=None)

That seems to be in keeping with the other APIs taking axis as you
mentioned.  To me, "independent" has no relevance when the array is
1-d; it can simply be ignored.

John Zwinck

Re: Request for enhancement to numpy.random.shuffle

Eelco Hoogendoorn
Thanks Warren, I think these are sensible additions.

I would argue to treat the None-False condition as an error. Indeed I agree one might argue the correct behavior is to 'shuffle' the singleton block of data, which does nothing; but it's more likely to come up as an unintended error than as a natural outcome of parametrized behavior.


Re: Request for enhancement to numpy.random.shuffle

John Zwinck
On Sun, Oct 12, 2014 at 3:51 PM, Eelco Hoogendoorn
<[hidden email]> wrote:
> I would argue to treat the None-False condition as an error. Indeed I agree
> one might argue the correct behavior is to 'shuffle' the singleton block of
> data, which does nothing; but it's more likely to come up as an unintended
> error than as a natural outcome of parametrized behavior.

I'm interested to know why you think axis=None should raise an error
if independent=False when independent=False is the default.  What I
mean is, if someone uses this function and wants axis=None (which
seems not totally unusual), why force them to always type in the
boilerplate independent=True to make it work?

John Zwinck

Re: Request for enhancement to numpy.random.shuffle

Stéfan van der Walt
In reply to this post by Warren Weckesser-2
Hi Warren

On 2014-10-12 00:51:56, Warren Weckesser <[hidden email]> wrote:
> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged.

I like the suggested changes.  Since "independent" loses its meaning
when axis is None, I would expect this to have the same effect as
`shuffle(a, independent=True, axis=None)`.  I think a shuffle function
that doesn't shuffle will confuse a lot of people!

Stéfan

Re: Request for enhancement to numpy.random.shuffle

Eelco Hoogendoorn
Yeah, a shuffle function that does not shuffle indeed seems like a major source of bugs to me.

Indeed one could argue that setting axis=None should suffice to give a clear enough declaration of intent; though I wouldn't mind typing the extra bit to ensure consistent semantics.


Re: Request for enhancement to numpy.random.shuffle

Robert Kern-2
In reply to this post by Warren Weckesser-2
On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
<[hidden email]> wrote:

> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged. (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index.  Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.
>
> What do you think?

It seems to me a perfectly good reason to have two methods instead of
one. I can't imagine when I wouldn't be using a literal True or False
for this, so it really should be two different methods.

That said, I would just make the axis=None behavior the same for both
methods. axis=None does *not* mean "treat this like a single
monolithic blob" in any of the axis=-having methods; it means "flatten
the array and do the operation on the single flattened axis". I think
the latter behavior is a reasonable interpretation of axis=None for
both methods.
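
To make the flatten interpretation concrete, a small sketch: `sum` and `np.sort` with `axis=None` act on the flattened array, and the analogous shuffle permutes every element through a flat view. Using `reshape(-1)` as the flat view is a choice made for this example, not something proposed in the thread.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# For reductions and sorts, axis=None means "operate on the flattened array".
total = a.sum(axis=None)                    # sum of all six elements
flat_sorted = np.sort(a[::-1], axis=None)   # 1-d sorted result

# The analogous shuffle: permute every element through a flat view.
# reshape(-1) is a view here because `a` is contiguous; for a
# non-contiguous array it could be a copy and `a` would not change.
rng = np.random.default_rng(0)
rng.shuffle(a.reshape(-1))
```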

--
Robert Kern

Re: Request for enhancement to numpy.random.shuffle

Warren Weckesser-2


On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <[hidden email]> wrote:
> It seems to me a perfectly good reason to have two methods instead of
> one. I can't imagine when I wouldn't be using a literal True or False
> for this, so it really should be two different methods.



I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: "disarrange", "scramble", "disorder", "randomize", "ashuffle", some other variation of the word "shuffle", ...), but I figured the first thing folks would say is "Why not just add options to shuffle?"  So, choose your battles and all that.

What do other folks think of making a separate method?

 
> That said, I would just make the axis=None behavior the same for both
> methods. axis=None does *not* mean "treat this like a single
> monolithic blob" in any of the axis=-having methods; it means "flatten
> the array and do the operation on the single flattened axis". I think
> the latter behavior is a reasonable interpretation of axis=None for
> both methods.


Sounds good to me.

Warren

 


Re: Request for enhancement to numpy.random.shuffle

josef.pktd
On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser
<[hidden email]> wrote:

> I agree, and my first inclination was to propose a different method (and I
> had the bikeshedding conversation with myself about the name: "disarrange",
> "scramble", "disorder", "randomize", "ashuffle", some other variation of the
> word "shuffle", ...), but I figured the first thing folks would say is "Why
> not just add options to shuffle?"  So, choose your battles and all that.
>
> What do other folks think of making a separate method?

I'm not a fan of many similar functions.

What's the difference between permute, shuffle and scramble?
And how do I find or remember which is which?


>
>
>>
>> That said, I would just make the axis=None behavior the same for both
>> methods. axis=None does *not* mean "treat this like a single
>> monolithic blob" in any of the axis=-having methods; it means "flatten
>> the array and do the operation on the single flattened axis". I think
>> the latter behavior is a reasonable interpretation of axis=None for
>> both methods.
>
>
>
> Sounds good to me.

+1 (since all the arguments have already been given)


Josef
- Why does sort treat columns independently instead of sorting rows?
- because there is lexsort
- Oh, lexsort, I haven't thought about it in 5 years. It's not even next
to sort in the pop up code completion



Re: Request for enhancement to numpy.random.shuffle

Warren Weckesser-2


On Sun, Oct 12, 2014 at 11:20 AM, <[hidden email]> wrote:
> I'm not a fan of many similar functions.
>
> What's the difference between permute, shuffle and scramble?


The difference between `shuffle` and the new method being proposed is explained in the first email in this thread.
`np.random.permutation` with an array argument returns a shuffled copy of the array; it does not modify its argument. (It should also get an `axis` argument when `shuffle` gets an `axis` argument.)
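
A short check of the copy versus in-place distinction described above, using only the existing `np.random.permutation` and `np.random.shuffle`; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)
original = a.copy()

# permutation returns a shuffled copy; `a` is left untouched.
p = np.random.permutation(a)
unchanged_after_permutation = np.array_equal(a, original)

# shuffle reorders the rows of `a` in place and returns None.
result = np.random.shuffle(a)
```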


> And how do I find or remember which is which?


You could start with `help(np.random)` (or `np.random?` in ipython).

Warren

 



Re: Request for enhancement to numpy.random.shuffle

josef.pktd
On Sun, Oct 12, 2014 at 11:33 AM, Warren Weckesser
<[hidden email]> wrote:

>> I'm not a fan of many similar functions.
>>
>> What's the difference between permute, shuffle and scramble?
>
>
>
> The difference between `shuffle` and the new method being proposed is
> explained in the first email in this thread.
> `np.random.permutation` with an array argument returns a shuffled copy of
> the array; it does not modify its argument. (It should also get an `axis`
> argument when `shuffle` gets an `axis` argument.)
>
>
>> And how do I find or remember which is which?
>
>
>
> You could start with `help(np.random)` (or `np.random?` in ipython).

If you have to check the docstring each time, then there is something wrong.
In my opinion all docstrings should be read only once.

It's like a Windows program where the GUI menus are not **self-explanatory**.

What did Save-As do?

Josef



Re: Request for enhancement to numpy.random.shuffle

Warren Weckesser-2
In reply to this post by Warren Weckesser-2



It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)

Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array).  This function should also get an `axis` argument.  `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy.  If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array.  We would then have four methods:

                       In-place    Copy
Current shuffle style  shuffle     permutation
New shuffle style      (name TBD)  (name TBD)


(All of them will have an `axis` argument.)
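
A sketch of the "Copy" column, assuming (as described above for `permutation`) that a copying variant simply copies and then applies the in-place shuffler. The helper `permuted_copy` is hypothetical; any in-place shuffler, current style or new style, could be passed in.

```python
import numpy as np

def permuted_copy(a, shuffle_in_place):
    """Return a shuffled copy of `a`, leaving `a` unchanged.

    Hypothetical helper: `shuffle_in_place` is any function that shuffles
    an array in place, for example np.random.shuffle.
    """
    out = np.array(a, copy=True)
    shuffle_in_place(out)
    return out

a = np.arange(12).reshape(4, 3)
b = permuted_copy(a, np.random.shuffle)
```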

I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.

Warren



Re: Request for enhancement to numpy.random.shuffle

Sebastian Wagner
In reply to this post by Warren Weckesser-2

On 2014-10-12 16:54, Warren Weckesser wrote:

> I agree, and my first inclination was to propose a different method
> (and I had the bikeshedding conversation with myself about the name:
> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some
> other variation of the word "shuffle", ...), but I figured the first
> thing folks would say is "Why not just add options to shuffle?"  So,
> choose your battles and all that.
>
> What do other folks think of making a separate method?

I'm not a fan of more methods with similar functionality in NumPy. It's
already hard to keep an overview of the existing functions and all their
possible applications and variants. The axis=None proposal for shuffling all
items is very intuitive.

I think we don't want to take the path of matlab: a huge amount of
powerful functions, but few people know of their powerful possibilities.

regards,
Sebastian



Re: Request for enhancement to numpy.random.shuffle

josef.pktd
In reply to this post by Warren Weckesser-2
On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser
<[hidden email]> wrote:

>
>
> It is clear from the comments so far that, when `axis` is None, the result
> should be a shuffle of all the elements in the array, for both methods of
> shuffling (whether implemented as a new method or with a boolean argument to
> `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)
>
> Josef's comment reminded me that `numpy.random.permutation` returns a
> shuffled copy of the array (when its argument is an array).

which kind of proves my point

I sometimes have problems finding `shuffle` because I want a function
that does permutation.

Josef

> This function should also get an `axis` argument.  `permutation` shuffles
> the same way `shuffle` does--it simply makes a copy and then calls
> `shuffle` on the copy.
> If a new method is added for the new shuffling style, then it would be
> consistent to also add a new method that uses the new shuffling style and
> returns a copy of the shuffled array.  We would then have four
> methods:
>
>                        In-place    Copy
> Current shuffle style  shuffle     permutation
> New shuffle style      (name TBD)  (name TBD)
>
> (All of them will have an `axis` argument.)
>
> I suspect this will make some folks prefer the approach of adding a boolean
> argument to `shuffle` and `permutation`.
>
> Warren
>
>

Re: Request for enhancement to numpy.random.shuffle

Warren Weckesser-2
In reply to this post by Warren Weckesser-2


On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser <[hidden email]> wrote:






It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)
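
The `axis=None` case described above can be sketched as follows. This is only an illustration, not code from the thread or from NumPy: the name `shuffle_all` is hypothetical, and `numpy.random.default_rng` is assumed to be available (it is newer than this discussion).

```python
import numpy as np

def shuffle_all(a, rng):
    """Shuffle every element of `a` in place, ignoring its shape.

    Hypothetical helper illustrating the proposed `axis=None` meaning.
    """
    flat = a.flatten()               # always a 1-d copy, regardless of memory layout
    rng.shuffle(flat)                # ordinary 1-d shuffle of the copy
    a[...] = flat.reshape(a.shape)   # write the shuffled values back into `a`

rng = np.random.default_rng()
a = np.arange(15).reshape(3, 5)
shuffle_all(a, rng)                  # `a` keeps its shape; its 15 values are rearranged
```

Copying before shuffling costs extra memory, but it keeps the sketch correct for non-contiguous inputs such as transposed arrays.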

Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array).  This function should also get an `axis` argument.  `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy.  If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array.  We would then have four methods:

                       In-place    Copy
Current shuffle style  shuffle     permutation
New shuffle style      (name TBD)  (name TBD)


(All of them will have an `axis` argument.)



That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined.  That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion.

Warren
 

Re: Request for enhancement to numpy.random.shuffle

Jaime Fernández del Río
On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser <[hidden email]> wrote:



> That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined.  That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion.

Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'.
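
A minimal sketch of that idea, under stated assumptions: the `copy` keyword is hypothetical (it is the flag being proposed, not an existing NumPy argument), the functions take an explicit `rng` for self-containment, and `numpy.random.default_rng` is assumed to be available.

```python
import numpy as np

def shuffle(a, rng, copy=False):
    """Shuffle along the first axis.

    With copy=False (hypothetical flag) the array is shuffled in place and
    None is returned; with copy=True the input is left alone and a shuffled
    copy is returned.
    """
    target = np.array(a, copy=True) if copy else a
    rng.shuffle(target)              # existing in-place shuffle along axis 0
    return target if copy else None

def permutation(a, rng):
    """`permutation` expressed trivially in terms of the flagged `shuffle`."""
    return shuffle(a, rng, copy=True)
```

Whether the flag should instead be spelled the other way round (for example as an in-place switch) is a naming question that the sketch does not settle.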

Jaime
 






--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.


Re: Request for enhancement to numpy.random.shuffle

Stephan Hoyer-2
On Sun, Oct 12, 2014 at 10:56 AM, Jaime Fernández del Río <[hidden email]> wrote:
> Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'.

+1

Unfortunately, shuffle has the better name, but permutation has the better default behavior.

(also, I think "inplace" might be a less ambiguous name for the argument than "copy")


Re: Request for enhancement to numpy.random.shuffle

Nathaniel Smith
In reply to this post by Sebastian Wagner
On Sun, Oct 12, 2014 at 5:14 PM, Sebastian <[hidden email]> wrote:

>
> On 2014-10-12 16:54, Warren Weckesser wrote:
>>
>>
>> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>     On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
>>     <[hidden email] <mailto:[hidden email]>>
>>     wrote:
>>
>>     > A small wart in this API is the meaning of
>>     >
>>     >   shuffle(a, independent=False, axis=None)
>>     >
>>     > It could be argued that the correct behavior is to leave the
>>     > array unchanged. (The current behavior can be interpreted as
>>     > shuffling a 1-d sequence of monolithic blobs; the axis argument
>>     > specifies which axis of the array corresponds to the
>>     > sequence index.  Then `axis=None` means the argument is
>>     > a single monolithic blob, so there is nothing to shuffle.)
>>     > Or an error could be raised.
>>     >
>>     > What do you think?
>>
>>     It seems to me a perfectly good reason to have two methods instead of
>>     one. I can't imagine when I wouldn't be using a literal True or False
>>     for this, so it really should be two different methods.
>>
>>
>>
>> I agree, and my first inclination was to propose a different method
>> (and I had the bikeshedding conversation with myself about the name:
>> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some
>> other variation of the word "shuffle", ...), but I figured the first
>> thing folks would say is "Why not just add options to shuffle?"  So,
>> choose your battles and all that.
>>
>> What do other folks think of making a separate method
> I'm not a fan of more methods with similar functionality in Numpy. It's
> already hard to overlook the existing functions and all their possible
> applications and variants. The axis=None proposal for shuffling all
> items is very intuitive.
>
> I think we don't want to take the path of matlab: a huge amount of
> powerful functions, but few people know of their powerful possibilities.

I totally agree with this principle, but I think this is an exception
to the rule, b/c unfortunately in this case the function that we *do*
have is weird and inconsistent with how most other functions in numpy
work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc
(k,)->(k,) would work. Also, it's easy to implement the current
'shuffle' in terms of any 1d shuffle function with no explicit loops,
whereas Warren's disarrange requires an explicit loop. So, we really
implemented the wrong one, oops. What this means going forward,
though, is that our only options are either to implement both
behaviours with two functions, or else to give up on having the more
natural behaviour altogether. I think the former is the lesser of two
evils.
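
To make the contrast concrete, here is a minimal sketch of the two behaviours. It is not NumPy's implementation and not Warren's `disarrange` from the github issue: the names `shuffle_blocks` and `disarrange_loop` are hypothetical, and `numpy.random.default_rng` and `numpy.moveaxis` are assumed to be available (both are newer than this thread).

```python
import numpy as np

def shuffle_blocks(a, rng):
    """Current `shuffle` behaviour: one permutation of the first axis.

    A single 1-d shuffle of the index range is enough; whole sub-arrays
    move together and no Python loop is needed.
    """
    a[...] = a[rng.permutation(a.shape[0])]   # fancy indexing copies before assigning

def disarrange_loop(a, rng, axis=0):
    """Independent shuffle: each 1-d slice along `axis` gets its own permutation."""
    b = np.moveaxis(a, axis, -1)              # view with the shuffled axis last
    for idx in np.ndindex(b.shape[:-1]):      # explicit loop over every 1-d slice
        rng.shuffle(b[idx])                   # b[idx] is a 1-d view into `a`
```

The first function calls the random generator once per array, the second once per slice, which is the asymmetry described above.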

Regarding names: shuffle/permutation is a terrible naming convention
IMHO and shouldn't be propagated further. We already have a good
naming convention for inplace-vs-copy: sort vs. sorted, reverse vs.
reversed, etc.

So, how about:

scramble + scrambled: shuffle individual entries within each
row/column/..., as in Warren's suggestion.

shuffle + shuffled: do what shuffle and permutation do now (mnemonic:
these break a 2d array into a bunch of 1d "cards", and then shuffle
those cards).

permuted remains indefinitely, with the docstring: "Deprecated alias
for 'shuffled'."
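
The proposed naming scheme could look roughly like the sketch below. All four functions are hypothetical illustrations of the names suggested above, not existing NumPy functions; they take an explicit `rng` and an `axis` argument as assumptions. The independent variant uses a random-key argsort rather than the loop in Warren's `disarrange`, purely to keep the example short.

```python
import numpy as np

def shuffle(a, rng, axis=0):
    """In place: permute whole sub-arrays ("cards") along `axis`."""
    view = np.swapaxes(a, axis, 0)                     # view onto `a`
    view[...] = view[rng.permutation(view.shape[0])]   # one permutation for all cards

def shuffled(a, rng, axis=0):
    """Copying counterpart of `shuffle`, in the sort/sorted style."""
    out = np.array(a, copy=True)
    shuffle(out, rng, axis)
    return out

def scramble(a, rng, axis=0):
    """In place: shuffle the entries of each 1-d slice along `axis` independently."""
    keys = rng.random(a.shape)                 # one random key per element
    order = np.argsort(keys, axis=axis)        # independent permutation per slice
    a[...] = np.take_along_axis(a, order, axis=axis)

def scrambled(a, rng, axis=0):
    """Copying counterpart of `scramble`."""
    out = np.array(a, copy=True)
    scramble(out, rng, axis)
    return out
```

Each copying function is just "copy, then call the in-place one", which mirrors how `permutation` is described earlier in the thread.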

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Re: Request for enhancement to numpy.random.shuffle

Warren Weckesser-2


On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith <[hidden email]> wrote:



That sounds good to me.  (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

Warren



Re: Request for enhancement to numpy.random.shuffle

Jaime Fernández del Río
On Thu, Oct 16, 2014 at 8:39 AM, Warren Weckesser <[hidden email]> wrote:





> That sounds good to me.  (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

So the only little detail left is someone actually rolling up his/her sleeves and creating a PR... ;-)

The current shuffle and permutation are implemented here:


It's in Cython, so it is a good candidate for anyone wanting to contribute to numpy, but wary of C code.

Jaime

 






--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.
