# Request for enhancement to numpy.random.shuffle

26 messages

## Request for enhancement to numpy.random.shuffle

I created an issue on github for an enhancement
to numpy.random.shuffle:

    https://github.com/numpy/numpy/issues/5173

I'd like to get some feedback on the idea.

Currently, `shuffle` shuffles the first dimension of an array
in-place.  For example, shuffling a 2D array shuffles the rows:

    In [227]: a
    Out[227]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11]])

    In [228]: np.random.shuffle(a)

    In [229]: a
    Out[229]:
    array([[ 0,  1,  2],
           [ 9, 10, 11],
           [ 3,  4,  5],
           [ 6,  7,  8]])

To add an axis keyword, we could (in effect) apply `shuffle` to
`a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
the columns:

    In [232]: a = np.arange(15).reshape(3,5)

    In [233]: a
    Out[233]:
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])

    In [234]: axis = 1

    In [235]: np.random.shuffle(a.swapaxes(axis, 0))

    In [236]: a
    Out[236]:
    array([[ 3,  2,  4,  0,  1],
           [ 8,  7,  9,  5,  6],
           [13, 12, 14, 10, 11]])

So that's the first part--adding an `axis` keyword.

The other part of the enhancement request is to add a shuffle
behavior that shuffles the 1-d slices *independently*.  That is,
for a 2-d array, shuffling with `axis=0` would apply a different
shuffle to each column.  In the github issue, I defined a
function called `disarrange` that implements this behavior:

    In [240]: a
    Out[240]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11],
           [12, 13, 14]])

    In [241]: disarrange(a, axis=0)

    In [242]: a
    Out[242]:
    array([[ 6,  1,  2],
           [ 3, 13, 14],
           [ 9, 10,  5],
           [12,  7,  8],
           [ 0,  4, 11]])

Note that each column has been shuffled independently.

This behavior is analogous to how `sort` handles the `axis`
keyword.  `sort` sorts the 1-d slices along the given axis
independently.

In the github issue, I suggested the following signature
for `shuffle` (but I'm not too fond of the name `independent`):

    def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle`
is used.  If `independent` is True, each 1-d slice is shuffled
independently (in the same way that `sort` sorts each 1-d
slice).

Like most functions that take an `axis` argument, `axis=None`
means to shuffle the flattened array.  With `independent=True`,
it would act like `np.random.shuffle(a.flat)`, e.g.

    In [247]: a
    Out[247]:
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])

    In [248]: np.random.shuffle(a.flat)

    In [249]: a
    Out[249]:
    array([[ 0, 14,  9,  1, 13],
           [ 2,  8,  5,  3,  4],
           [ 6, 10,  7, 12, 11]])

A small wart in this API is the meaning of

    shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the
array unchanged. (The current behavior can be interpreted as
shuffling a 1-d sequence of monolithic blobs; the axis argument
specifies which axis of the array corresponds to the
sequence index.  Then `axis=None` means the argument is
a single monolithic blob, so there is nothing to shuffle.)
Or an error could be raised.

What do you think?

Warren

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
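The two behaviors described in this post can be sketched as follows. This is only a minimal illustration, not the `disarrange` implementation from the GitHub issue; the names `shuffle_along_axis` and `shuffle_slices_independently` are hypothetical, and both assume that `swapaxes` returns a view so that the shuffle happens in place.

```python
import numpy as np


def shuffle_along_axis(a, axis=0):
    """Shuffle `a` in place along `axis`, moving whole sub-arrays together.

    Hypothetical helper: the current `np.random.shuffle` behavior applied
    to a view with `axis` swapped into the first position.
    """
    # swapaxes returns a view, so shuffling the view reorders `a` itself.
    np.random.shuffle(a.swapaxes(axis, 0))


def shuffle_slices_independently(a, axis=0):
    """Shuffle every 1-d slice of `a` along `axis` in place, each slice
    with its own permutation (hypothetical helper)."""
    # Move the target axis to the end; `b` is still a view of `a`.
    b = a.swapaxes(axis, -1)
    # Visit every 1-d slice along the last axis and shuffle it in place.
    for idx in np.ndindex(b.shape[:-1]):
        np.random.shuffle(b[idx])


a = np.arange(15).reshape(3, 5)
shuffle_along_axis(a, axis=1)            # columns are permuted as units

c = np.arange(15).reshape(5, 3)
shuffle_slices_independently(c, axis=0)  # each column gets its own permutation
```

In the first call every row ends up with the same column order, while in the second call each column is permuted separately, which is the distinction the proposed `independent` flag is meant to express.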

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser <[hidden email]> wrote:
> I created an issue on github for an enhancement
> to numpy.random.shuffle:
>     https://github.com/numpy/numpy/issues/5173

I like this idea.  I was a bit surprised there wasn't something like
this already.

> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged. (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index.  Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.

Let's think about it from the other direction: if a user wants to
shuffle all the elements as if it were 1-d, as you point out they
could do this:

    shuffle(a, axis=None, independent=True)

But that's a lot of typing.  Maybe we should just let this do the
same thing:

    shuffle(a, axis=None)

That seems to be in keeping with the other APIs taking axis as you
mentioned.  To me, "independent" has no relevance when the array is
1-d, it can simply be ignored.

John Zwinck

## Re: Request for enhancement to numpy.random.shuffle

Thanks Warren, I think these are sensible additions.

I would argue to treat the None-False condition as an error. Indeed I agree one might argue the correct behavior is to 'shuffle' the singleton block of data, which does nothing; but it's more likely to come up as an unintended error than as a natural outcome of parametrized behavior.

On Sun, Oct 12, 2014 at 3:31 AM, John Zwinck wrote:
> But that's a lot of typing.  Maybe we should just let this do the
> same thing:
>
>   shuffle(a, axis=None)
>
> That seems to be in keeping with the other APIs taking axis as you
> mentioned.  To me, "independent" has no relevance when the array is
> 1-d, it can simply be ignored.

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 3:51 PM, Eelco Hoogendoorn <[hidden email]> wrote:
> I would argue to treat the None-False condition as an error. Indeed I agree
> one might argue the correct behavior is to 'shuffle' the singleton block of
> data, which does nothing; but it's more likely to come up as an unintended
> error than as a natural outcome of parametrized behavior.

I'm interested to know why you think axis=None should raise an error
if independent=False when independent=False is the default.  What I
mean is, if someone uses this function and wants axis=None (which
seems not totally unusual), why force them to always type in the
boilerplate independent=True to make it work?

John Zwinck

## Re: Request for enhancement to numpy.random.shuffle

Hi Warren

On 2014-10-12 00:51:56, Warren Weckesser <[hidden email]> wrote:
> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged.

I like the suggested changes.  Since "independent" loses its meaning
when axis is None, I would expect this to have the same effect as
`shuffle(a, independent=True, axis=None)`.  I think a shuffle function
that doesn't shuffle will confuse a lot of people!

Stéfan
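One way to read this suggestion is sketched below: a wrapper with the proposed signature in which `axis=None` shuffles all elements regardless of `independent`. The function is hypothetical (it is not part of NumPy), and the flattened case is written with `np.random.permutation` on a raveled copy as an assumption made for the sketch rather than the approach from the issue.

```python
import numpy as np


def shuffle(a, independent=False, axis=0):
    """Hypothetical in-place shuffle with the signature proposed in this
    thread; a sketch only, not an existing NumPy function."""
    if axis is None:
        # `independent` has no meaning here: shuffle every element as if
        # `a` were 1-d, then write the result back into `a` in place.
        a[...] = np.random.permutation(a.ravel()).reshape(a.shape)
    elif independent:
        # Shuffle each 1-d slice along `axis` with its own permutation.
        b = a.swapaxes(axis, -1)
        for idx in np.ndindex(b.shape[:-1]):
            np.random.shuffle(b[idx])
    else:
        # Current behavior, generalized to an arbitrary axis.
        np.random.shuffle(a.swapaxes(axis, 0))


x = np.arange(12).reshape(3, 4)
shuffle(x, axis=None)                  # same effect with independent=True

y = np.arange(12).reshape(3, 4)
shuffle(y, independent=True, axis=0)   # each column shuffled separately
```

With this reading, no combination of arguments leaves the array untouched, which avoids the "shuffle that doesn't shuffle" case.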

## Re: Request for enhancement to numpy.random.shuffle

Yeah, a shuffle function that does not shuffle indeed seems like a major source of bugs to me.

Indeed one could argue that setting axis=None should suffice to give a clear enough declaration of intent; though I wouldn't mind typing the extra bit to ensure consistent semantics.

On Sun, Oct 12, 2014 at 10:56 AM, Stefan van der Walt wrote:
> I like the suggested changes.  Since "independent" loses its meaning
> when axis is None, I would expect this to have the same effect as
> `shuffle(a, independent=True, axis=None)`.  I think a shuffle function
> that doesn't shuffle will confuse a lot of people!

## Re: Request for enhancement to numpy.random.shuffle

On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser <[hidden email]> wrote:
> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged. (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index.  Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.
>
> What do you think?

It seems to me a perfectly good reason to have two methods instead of
one. I can't imagine when I wouldn't be using a literal True or False
for this, so it really should be two different methods.

That said, I would just make the axis=None behavior the same for both
methods. axis=None does *not* mean "treat this like a single
monolithic blob" in any of the axis=-having methods; it means "flatten
the array and do the operation on the single flattened axis". I think
the latter behavior is a reasonable interpretation of axis=None for
both methods.

--
Robert Kern
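The "flatten and operate on the single flattened axis" meaning of `axis=None` can be seen in existing NumPy functions. A short illustration with `np.sort` and `sum` (the array values are made up for the example):

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

# With an integer axis, the operation is applied along that axis.
rows_sorted = np.sort(a, axis=1)   # each row sorted on its own
col_sums = a.sum(axis=0)           # one sum per column

# With axis=None, the array is flattened first and the operation runs
# over that single flattened axis.
flat_sorted = np.sort(a, axis=None)   # 1-d result over all elements
total = a.sum(axis=None)              # one sum over all elements
```

By analogy, a shuffle with `axis=None` would permute all elements of the array rather than treating it as one indivisible block.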

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern wrote:
> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser <[hidden email]> wrote:
>> A small wart in this API is the meaning of
>>
>>   shuffle(a, independent=False, axis=None)
>>
>> It could be argued that the correct behavior is to leave the
>> array unchanged. (The current behavior can be interpreted as
>> shuffling a 1-d sequence of monolithic blobs; the axis argument
>> specifies which axis of the array corresponds to the
>> sequence index.  Then `axis=None` means the argument is
>> a single monolithic blob, so there is nothing to shuffle.)
>> Or an error could be raised.
>>
>> What do you think?
>
> It seems to me a perfectly good reason to have two methods instead of
> one. I can't imagine when I wouldn't be using a literal True or False
> for this, so it really should be two different methods.

I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: "disarrange", "scramble", "disorder", "randomize", "ashuffle", some other variation of the word "shuffle", ...), but I figured the first thing folks would say is "Why not just add options to shuffle?"  So, choose your battles and all that.

What do other folks think of making a separate method?

> That said, I would just make the axis=None behavior the same for both
> methods. axis=None does *not* mean "treat this like a single
> monolithic blob" in any of the axis=-having methods; it means "flatten
> the array and do the operation on the single flattened axis". I think
> the latter behavior is a reasonable interpretation of axis=None for
> both methods.

Sounds good to me.

Warren

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser <[hidden email]> wrote:
> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <[hidden email]> wrote:
>> It seems to me a perfectly good reason to have two methods instead of
>> one. I can't imagine when I wouldn't be using a literal True or False
>> for this, so it really should be two different methods.
>
> I agree, and my first inclination was to propose a different method (and I
> had the bikeshedding conversation with myself about the name: "disarrange",
> "scramble", "disorder", "randomize", "ashuffle", some other variation of the
> word "shuffle", ...), but I figured the first thing folks would say is "Why
> not just add options to shuffle?"  So, choose your battles and all that.
>
> What do other folks think of making a separate method?

I'm not a fan of many similar functions. What's the difference between
permute, shuffle and scramble?
And how do I find or remember which is which?

>> That said, I would just make the axis=None behavior the same for both
>> methods. axis=None does *not* mean "treat this like a single
>> monolithic blob" in any of the axis=-having methods; it means "flatten
>> the array and do the operation on the single flattened axis". I think
>> the latter behavior is a reasonable interpretation of axis=None for
>> both methods.
>
> Sounds good to me.

+1 (since all the arguments have been already given)

Josef

- Why does sort treat columns independently instead of sorting rows?
- because there is lexsort
- Oh, lexsort, I haven't thought about it in 5 years. It's not even next to sort in the pop up code completion
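The `sort` versus `lexsort` distinction in the closing exchange can be illustrated with a small example. The array below is made up; the calls are the standard `np.sort` and `np.lexsort` functions, where `lexsort` treats the last key as the primary one.

```python
import numpy as np

a = np.array([[3, 1],
              [1, 2],
              [3, 0]])

# np.sort along axis 0 sorts each column independently, so the original
# rows are broken up.
cols_sorted = np.sort(a, axis=0)

# np.lexsort returns an index order that keeps rows intact. The last key
# is the primary key, so the columns are passed in reverse order to make
# column 0 the primary sort key.
order = np.lexsort(a.T[::-1])
rows_sorted = a[order]
```

Here `cols_sorted` is `[[1, 0], [3, 1], [3, 2]]`, whose first row never existed in `a`, while `rows_sorted` is `[[1, 2], [3, 0], [3, 1]]`. This mirrors the two shuffle styles under discussion: independent slices versus whole rows moved as units.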

## Re: Request for enhancement to numpy.random.shuffle

On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser wrote:
> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged. (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index.  Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.
>
> What do you think?

It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)

Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array).  This function should also get an `axis` argument.  `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy.  If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array.  Then we would have four methods:

                           In-place    Copy
    Current shuffle style  shuffle     permutation
    New shuffle style      (name TBD)  (name TBD)

(All of them will have an `axis` argument.)

I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.

Warren
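The copy-then-shuffle relationship between `permutation` and `shuffle` described here could be extended to an `axis` argument roughly as follows. The helper name `permutation_along_axis` is hypothetical and is not taken from NumPy or from the issue; it is a sketch of the "Copy" column of the table for the current shuffle style.

```python
import numpy as np


def permutation_along_axis(a, axis=0):
    """Return a shuffled copy of `a`, leaving `a` itself untouched.

    Hypothetical helper: copy first, then shuffle the copy in place
    along `axis`, moving whole sub-arrays together.
    """
    out = np.array(a, copy=True)
    # swapaxes returns a view of `out`, so the copy is shuffled in place.
    np.random.shuffle(out.swapaxes(axis, 0))
    return out


a = np.arange(12).reshape(3, 4)
p = permutation_along_axis(a, axis=1)   # columns permuted in the copy only
```

A copying counterpart for the independent style could follow the same pattern: copy, then apply the in-place routine to the copy.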

## Re: Request for enhancement to numpy.random.shuffle

On 2014-10-12 16:54, Warren Weckesser wrote:
> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <[hidden email]> wrote:
>> It seems to me a perfectly good reason to have two methods instead of
>> one. I can't imagine when I wouldn't be using a literal True or False
>> for this, so it really should be two different methods.
>
> I agree, and my first inclination was to propose a different method
> (and I had the bikeshedding conversation with myself about the name:
> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some
> other variation of the word "shuffle", ...), but I figured the first
> thing folks would say is "Why not just add options to shuffle?"  So,
> choose your battles and all that.
>
> What do other folks think of making a separate method?

I'm not a fan of more methods with similar functionality in Numpy. It's
already hard to keep an overview of the existing functions and all their
possible applications and variants. The axis=None proposal for shuffling
all items is very intuitive.

I think we don't want to take the path of matlab: a huge number of
powerful functions, but few people know of their powerful possibilities.

regards,
Sebastian

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser <[hidden email]> wrote:
> It is clear from the comments so far that, when `axis` is None, the result
> should be a shuffle of all the elements in the array, for both methods of
> shuffling (whether implemented as a new method or with a boolean argument to
> `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)
>
> Josef's comment reminded me that `numpy.random.permutation`

which kind of proves my point

I sometimes have problems finding `shuffle` because I want a function
that does permutation.

Josef

> returns a
> shuffled copy of the array (when its argument is an array).  This function
> should also get an `axis` argument.  `permutation` shuffles the same way
> `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy.
> If a new method is added for the new shuffling style, then it would be
> consistent to also add a new method that uses the new shuffling style and
> returns a copy of the shuffled array.   Then we would then have four
> methods:
>
>                        In-place    Copy
> Current shuffle style  shuffle     permutation
> New shuffle style      (name TBD)  (name TBD)
>
> (All of them will have an `axis` argument.)
>
> I suspect this will make some folks prefer the approach of adding a boolean
> argument to `shuffle` and `permutation`.
>
> Warren

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser wrote:

> It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)
>
> Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array).  This function should also get an `axis` argument.  `permutation` shuffles the same way `shuffle` does: it simply makes a copy and then calls `shuffle` on the copy.  If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array.  We would then have four methods:
>
>                            In-place    Copy
>     Current shuffle style  shuffle     permutation
>     New shuffle style      (name TBD)  (name TBD)
>
> (All of them will have an `axis` argument.)
>
> That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined.  That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion.
>
> Warren

I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.

Warren
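As a rough illustration of the top row of that table with an `axis` argument added, here is a minimal sketch. The names `shuffle_axis` and `permutation_axis` and the `rng` parameter are hypothetical and are not part of NumPy; the sketch only assumes `permutation` and `np.take`, and treats `axis=None` as "shuffle every element", as agreed above.

```python
import numpy as np


def shuffle_axis(a, axis=0, rng=None):
    """Shuffle `a` in place along `axis`, moving whole slices together.

    Hypothetical helper: `axis=None` shuffles every element of the array.
    """
    rng = np.random if rng is None else rng
    if axis is None:
        # Permute the flattened elements, then write them back in place.
        a[...] = rng.permutation(a.ravel()).reshape(a.shape)
    else:
        # One permutation of the indices along `axis`, applied to every slice.
        perm = rng.permutation(a.shape[axis])
        a[...] = np.take(a, perm, axis=axis)


def permutation_axis(a, axis=0, rng=None):
    """Return a shuffled copy of `a`, leaving `a` itself untouched."""
    out = np.array(a, copy=True)
    shuffle_axis(out, axis=axis, rng=rng)
    return out


rng = np.random.RandomState(0)
a = np.arange(15).reshape(3, 5)
b = permutation_axis(a, axis=1, rng=rng)  # columns of `a`, reordered, in a copy
shuffle_axis(a, axis=None, rng=rng)       # all 15 elements shuffled in place
```

The copy variant is just "copy, then shuffle in place", which mirrors how `permutation` is described in the message above.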

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser wrote:

> On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser wrote:
>
>> If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array.  We would then have four methods:
>>
>>                            In-place    Copy
>>     Current shuffle style  shuffle     permutation
>>     New shuffle style      (name TBD)  (name TBD)
>>
>> (All of them will have an `axis` argument.)
>>
>> That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined.  That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion.

Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'.

Jaime

> I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`.
>
> Warren

--
(\__/)
( O.o)
( > <) This is Rabbit. Copy Rabbit into your signature and help him in his plans for world domination.
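To make the 'copy' flag idea concrete, here is a minimal sketch. The signature below is hypothetical and is not NumPy's actual `shuffle`; the `rng` parameter is likewise an assumption added so the example can be seeded. It only relies on `permutation` and `np.take`.

```python
import numpy as np


def shuffle(a, axis=0, copy=False, rng=None):
    """Hypothetical `shuffle` with a `copy` flag (not NumPy's actual signature).

    copy=False: shuffle `a` in place and return None, like `np.random.shuffle`.
    copy=True:  leave `a` alone and return a shuffled copy, like `permutation`.
    """
    rng = np.random if rng is None else rng
    perm = rng.permutation(a.shape[axis])
    shuffled = np.take(a, perm, axis=axis)  # np.take returns a new array here
    if copy:
        return shuffled
    a[...] = shuffled
    return None


def permutation(a, axis=0, rng=None):
    """`permutation` expressed as a thin wrapper around the flagged `shuffle`."""
    return shuffle(np.asarray(a), axis=axis, copy=True, rng=rng)
```

With this shape, the return type depends on the flag (None versus an array), which is one of the trade-offs of folding both behaviours into a single function.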

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 10:56 AM, Jaime Fernández del Río wrote:

> Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'.

+1

Unfortunately, shuffle has the better name, but permutation has the better default behavior.

(Also, I think "inplace" might be a less ambiguous name for the argument than "copy".)

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 5:14 PM, Sebastian <[hidden email]> wrote:

> On 2014-10-12 16:54, Warren Weckesser wrote:
>
>> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern <[hidden email]> wrote:
>>
>>> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser <[hidden email]> wrote:
>>>
>>>> A small wart in this API is the meaning of
>>>>
>>>>   shuffle(a, independent=False, axis=None)
>>>>
>>>> It could be argued that the correct behavior is to leave the
>>>> array unchanged. (The current behavior can be interpreted as
>>>> shuffling a 1-d sequence of monolithic blobs; the axis argument
>>>> specifies which axis of the array corresponds to the
>>>> sequence index.  Then `axis=None` means the argument is
>>>> a single monolithic blob, so there is nothing to shuffle.)
>>>> Or an error could be raised.
>>>>
>>>> What do you think?
>>>
>>> It seems to me a perfectly good reason to have two methods instead of
>>> one. I can't imagine when I wouldn't be using a literal True or False
>>> for this, so it really should be two different methods.
>>
>> I agree, and my first inclination was to propose a different method
>> (and I had the bikeshedding conversation with myself about the name:
>> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some
>> other variation of the word "shuffle", ...), but I figured the first
>> thing folks would say is "Why not just add options to shuffle?"  So,
>> choose your battles and all that.
>>
>> What do other folks think of making a separate method
>
> I'm not a fan of more methods with similar functionality in Numpy. It's
> already hard to overlook the existing functions and all their possible
> applications and variants. The axis=None proposal for shuffling all
> items is very intuitive.
>
> I think we don't want to take the path of matlab: a huge amount of
> powerful functions, but few people know of their powerful possibilities.

I totally agree with this principle, but I think this is an exception to the rule, because unfortunately in this case the function that we *do* have is weird and inconsistent with how most other functions in numpy work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc (k,)->(k,) would work. Also, it's easy to implement the current 'shuffle' in terms of any 1d shuffle function with no explicit loops, whereas Warren's disarrange requires an explicit loop. So, we really implemented the wrong one, oops. What this means going forward, though, is that our only options are either to implement both behaviours with two functions, or else to give up on having the more natural behaviour altogether. I think the former is the lesser of two evils.

Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for in-place vs. copy: sort vs. sorted, reverse vs. reversed, etc.

So, how about:

scramble + scrambled shuffle individual entries within each row/column/..., as in Warren's suggestion.

shuffle + shuffled do what shuffle and permutation do now (mnemonic: these break a 2d array into a bunch of 1d "cards", and then shuffle those cards).

permutation remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'."

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
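The asymmetry described above can be sketched as follows. This is only an illustration under assumptions: `block_shuffle` and `disarrange_loop` are hypothetical names (the second is not Warren's actual `disarrange` from the github issue), and `rng` is assumed to be a `np.random.RandomState`. The first function gets the current behaviour from a single 1-d shuffle of an index array; the second needs one 1-d shuffle per slice, hence a Python loop.

```python
import numpy as np


def block_shuffle(a, rng):
    """Current `shuffle` behaviour built from a 1-d shuffle, no explicit loop."""
    idx = np.arange(a.shape[0])
    rng.shuffle(idx)   # the only random step: shuffle a 1-d index array
    a[...] = a[idx]    # fancy indexing reorders whole rows at once


def disarrange_loop(a, rng, axis=0):
    """Independent shuffle built from the 1-d shuffle, one call per 1-d slice."""
    moved = np.moveaxis(a, axis, -1)  # view of `a` with `axis` moved last
    for index in np.ndindex(moved.shape[:-1]):
        rng.shuffle(moved[index])     # each 1-d slice gets its own shuffle


rng = np.random.RandomState(0)
a = np.arange(15).reshape(5, 3)
block_shuffle(a, rng)             # rows stay intact, row order changes
disarrange_loop(a, rng, axis=0)   # every column is shuffled on its own
```

A gufunc with signature (k,)->(k,) would broadcast the 1-d shuffle over the leading dimensions, which is what the loop in `disarrange_loop` does by hand.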

## Re: Request for enhancement to numpy.random.shuffle

On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote:

> Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for in-place vs. copy: sort vs. sorted, reverse vs. reversed, etc.
>
> So, how about:
>
> scramble + scrambled shuffle individual entries within each row/column/..., as in Warren's suggestion.
>
> shuffle + shuffled do what shuffle and permutation do now (mnemonic: these break a 2d array into a bunch of 1d "cards", and then shuffle those cards).
>
> permutation remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'."

That sounds good to me.  (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

Warren
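For reference, the proposed in-place/copy pairing could look roughly like this. The names `scramble` and `scrambled` come from the proposal quoted above, but the bodies are only a sketch: sorting random keys with `argsort` to get one permutation per slice is an assumption of this example, not something suggested in the thread, and the `rng` parameter is hypothetical.

```python
import numpy as np


def scrambled(a, axis=0, rng=None):
    """Return a copy of `a` with each 1-d slice along `axis` shuffled independently."""
    rng = np.random if rng is None else rng
    a = np.asarray(a)
    # Sorting independent uniform keys along `axis` yields one random
    # permutation per 1-d slice.
    order = np.argsort(rng.random_sample(a.shape), axis=axis)
    return np.take_along_axis(a, order, axis=axis)


def scramble(a, axis=0, rng=None):
    """In-place counterpart of `scrambled`; returns None, like `list.sort`."""
    a[...] = scrambled(a, axis=axis, rng=rng)
```

As with `list.sort` and `sorted`, the verb form mutates its argument and returns None, while the participle form returns a new array.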