lexsort

lexsort

Eleanor-9
>>> a = numpy.array([[1,2,6], [2,2,8], [2,1,7],[1,1,5]])
>>> a
array([[1, 2, 6],
       [2, 2, 8],
       [2, 1, 7],
       [1, 1, 5]])
>>> indices = numpy.lexsort(a.T)
>>> a.T.take(indices,axis=-1).T
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 7],
       [2, 2, 8]])


The above does what I want, equivalent to sorting on column A then
column B in Excel, but all the transposes are ungainly. I've stared at it a while
but can't come up with a more elegant solution. Any ideas?

cheers Eleanor

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: lexsort

Anne Archibald
2008/5/6 Eleanor <[hidden email]>:

> >>> a = numpy.array([[1,2,6], [2,2,8], [2,1,7],[1,1,5]])
>  >>> a
>  array([[1, 2, 6],
>        [2, 2, 8],
>        [2, 1, 7],
>        [1, 1, 5]])
>  >>> indices = numpy.lexsort(a.T)
>  >>> a.T.take(indices,axis=-1).T
>  array([[1, 1, 5],
>        [1, 2, 6],
>        [2, 1, 7],
>        [2, 2, 8]])
>
>
>  The above does what I want, equivalent to sorting on column A then
>  column B in Excel, but all the transposes are ungainly. I've stared at it a while
>  but can't come up with a more elegant solution. Any ideas?

It appears that lexsort is broken in several ways, and its docstring
is misleading.

First of all, this code is not doing quite what you describe. The
primary key here is the [5,6,7,8] column, followed by the middle and
then by the first. This is almost exactly the opposite of what you
describe (and of what I expected). To get this to sort the way you
describe, the clearest way is to write a sequence:

In [34]: indices = np.lexsort( (a[:,1],a[:,0]) )

In [35]: a[indices,:]
Out[35]:
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 7],
       [2, 2, 8]])

In other words, sort over a[:,1], then sort again over a[:,0], making
a[:,0] the primary key. I have used "fancy indexing" to pull the array
into the right order, but it can also be done with take():

In [40]: np.take(a,indices,axis=0)
Out[40]:
array([[1, 1, 5],
       [1, 2, 6],
       [2, 1, 7],
       [2, 2, 8]])
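
The same idea can be wrapped in a small helper. This is only a sketch: sort_rows_by_columns is a hypothetical name chosen for illustration, not a NumPy function. It relies on lexsort treating the last key in the sequence as the primary one, so the list of columns is reversed before it is passed in.

```python
import numpy as np

def sort_rows_by_columns(a, cols):
    """Return the rows of 2-D array `a` sorted by the columns in `cols`.

    cols[0] is the primary key, cols[1] breaks ties, and so on.
    (Hypothetical helper for illustration; not part of NumPy.)
    """
    # lexsort uses the LAST key as the primary key, so reverse the order.
    keys = tuple(a[:, c] for c in reversed(cols))
    return a[np.lexsort(keys)]

a = np.array([[1, 2, 6], [2, 2, 8], [2, 1, 7], [1, 1, 5]])
result = sort_rows_by_columns(a, [0, 1])  # column 0 first, then column 1
print(result)
```

Listing the columns in "Excel order" and reversing them inside the helper keeps the call site readable while still matching lexsort's convention.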


As for why I say lexsort() is broken, well, it simply returns 0 for
higher-rank arrays (rather than either sorting or raising an
exception), it raises an exception when handed axis=None rather than
flattening as the docstring claims, and whatever the axis argument is
supposed to do, it doesn't seem to do it:

In [44]: np.lexsort(a,axis=0)
Out[44]: array([1, 0, 2])

In [45]: np.lexsort(a,axis=-1)
Out[45]: array([1, 0, 2])

In [46]: np.lexsort(a,axis=1)
---------------------------------------------------------------------------
<type 'exceptions.ValueError'>            Traceback (most recent call last)

/home/peridot/<ipython console> in <module>()

<type 'exceptions.ValueError'>: axis(=1) out of bounds
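
One possible reading of the outputs above, offered as a sketch rather than a statement of what the docstring intends: when handed a 2-D array, lexsort treats each row as one key, with the last row as the primary key. For the 4x3 array a that means four keys of length 3, which would account for the length-3 result and for axis=1 being out of bounds on 1-D keys.

```python
import numpy as np

a = np.array([[1, 2, 6], [2, 2, 8], [2, 1, 7], [1, 1, 5]])

# Passing the 2-D array directly ...
direct = np.lexsort(a)

# ... gives the same indices as passing its rows as separate keys,
# with the last row, a[3], acting as the primary key.
as_rows = np.lexsort((a[0], a[1], a[2], a[3]))

print(direct, as_rows)
```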


Anne

Re: lexsort

Eleanor-9
Anne Archibald <peridot.faceted <at> gmail.com> writes:

>
> It appears that lexsort is broken in several ways, and its docstring
> is misleading.
>
> First of all, this code is not doing quite what you describe. The
> primary key here is the [5,6,7,8] column, followed by the middle and
> then by the first. This is almost exactly the opposite of what you
> want.


Ouch!

That would've got me bad.

>>> a = numpy.array([[1,2,60], [2,2,800], [2,1,7],[1,1,50]])
>>> a
array([[  1,   2,  60],
       [  2,   2, 800],
       [  2,   1,   7],
       [  1,   1,  50]])
>>> a[numpy.lexsort( (a[:,1],a[:,0]) ),:]
array([[  1,   1,  50],
       [  1,   2,  60],
       [  2,   1,   7],
       [  2,   2, 800]])

This is correct, versus

>>> a[numpy.lexsort(a.T),:]
array([[  2,   1,   7],
       [  1,   1,  50],
       [  1,   2,  60],
       [  2,   2, 800]])

which isn't.
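
One way to double-check which ordering is the intended one is to compare both against Python's built-in sorted() with an explicit key on the first two columns. This is a sketch for verification only; the variable names are just for illustration.

```python
import numpy as np

a = np.array([[1, 2, 60], [2, 2, 800], [2, 1, 7], [1, 1, 50]])

# Reference ordering: sort rows by (column 0, column 1) in plain Python.
expected = sorted(a.tolist(), key=lambda row: (row[0], row[1]))

# Explicit key sequence: column 0 is the primary key (it comes last).
by_keys = a[np.lexsort((a[:, 1], a[:, 0]))].tolist()

# Transposed array: the last column ends up as the primary key.
by_transpose = a[np.lexsort(a.T)].tolist()

print(by_keys == expected)       # explicit keys match the reference
print(by_transpose == expected)  # the transpose version does not
```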

Thanks very much Eleanor


