workaround for searchsorted with strings?


Lewis Hyatt-2
Hello-

I see from this thread:
http://article.gmane.org/gmane.comp.python.numeric.general/18746/

that searchsorted does not work correctly with strings. Is there a workaround,
though, that I can use with 1.0.4 until there is a new official numpy release
that includes the fix mentioned in the reference above? Using the latest SVN
version is not an option for me.

My understanding was that searchsorted works OK if the strings are all the same
data type, but that does not appear to be the case:

>>> from numpy import array
>>> x = array(['0', '1', '2', '12'])
>>> y = array(['0', '0', '2', '3', '123'])
>>> x.searchsorted(y)
array([0, 0, 0, 2, 0])
>>> x.astype(y.dtype).searchsorted(y)
array([0, 0, 2, 4, 2])

I understand that the first call to searchsorted fails because y has type S3 and
x has type S2. But it seems that changing the type of x still produces
incorrect (albeit different) results. Is there something similar I can do to
make this work for now? Thanks very much.

-Lewis

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: workaround for searchsorted with strings?

Lewis Hyatt-2
Oh sorry, my example was dumb, never mind. It looks like this way does work
after all. Can someone please confirm for me, though, that the workaround I am
using (just changing to the wider string type) is reliable? Thanks, sorry for
the noise.

-Lewis




Re: workaround for searchsorted with strings?

Charles R Harris
In reply to this post by Lewis Hyatt-2


On Thu, May 22, 2008 at 12:29 PM, Lewis Hyatt <[hidden email]> wrote:
> Hello-
>
> I see from this thread:
> http://article.gmane.org/gmane.comp.python.numeric.general/18746/
>
> that searchsorted does not work correctly with strings. Is there a workaround,
> though, that I can use with 1.0.4 until there is a new official numpy release
> that includes the fix mentioned in the reference above? Using the latest SVN
> version is not an option for me.
>
> My understanding was that searchsorted works OK if the strings are all the same
> data type, but that does not appear to be the case:
>
> >>> from numpy import array
> >>> x = array(['0', '1', '2', '12'])
> >>> y = array(['0', '0', '2', '3', '123'])
> >>> x.searchsorted(y)
> array([0, 0, 0, 2, 0])
> >>> x.astype(y.dtype).searchsorted(y)
> array([0, 0, 2, 4, 2])
>
> I understand that the first call to searchsorted fails because y has type S3 and
> x has type S2. But it seems that changing the type of x still produces
> incorrect (albeit different) results. Is there something similar I can do to
> make this work for now? Thanks very much.

The x array is not sorted. Try

x = array(['0', '1', '12', '2'])
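
A minimal sketch of this suggestion, assuming a current NumPy release rather than 1.0.4 (whose string handling is the subject of this thread): sort the array lexicographically with np.sort before calling searchsorted. The variable names x_sorted and idx are illustrative only.

```python
import numpy as np

# The original x is in numeric order, which is not lexicographic order:
# as strings, '12' sorts before '2'.
x = np.array(['0', '1', '2', '12'])
y = np.array(['0', '0', '2', '3', '123'])

# searchsorted assumes its input is already sorted, so sort first.
x_sorted = np.sort(x)
print(x_sorted)   # ['0' '1' '12' '2']

# Insertion points of each element of y into the sorted array.
idx = x_sorted.searchsorted(y)
print(idx)        # [0 0 3 4 3]
```

With sorted input, '123' lands at index 3, after '12' and before '2'.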




Re: workaround for searchsorted with strings?

Charles R Harris
In reply to this post by Lewis Hyatt-2


On Thu, May 22, 2008 at 12:36 PM, Lewis Hyatt <[hidden email]> wrote:
> Oh sorry, my example was dumb, never mind. It looks like this way does work
> after all. Can someone please confirm for me, though, that the workaround I am
> using (just changing to the wider string type) is reliable? Thanks, sorry for
> the noise.

You can still have problems, because numpy pads the shorter strings with zeros
(null bytes) up to the fixed width. The string compare in 1.0.4 doesn't handle
those zeros correctly, so the widened array might still give wrong results.

Chuck
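
To illustrate the padding described above, and one possible way to sidestep it: the sketch below first shows the trailing null bytes in a fixed-width string array, then does the lookup with Python's standard bisect module on plain lists, which never touches numpy's string compare. This is an assumed alternative, not something proposed in the thread, and it is only practical when the arrays are small enough to convert with tolist().

```python
import bisect
import numpy as np

# Fixed-width byte strings: every element occupies 3 bytes, and
# shorter ones are padded on the right with null bytes.
x = np.array([b'0', b'1', b'12', b'2'], dtype='S3')
print(x.tobytes())   # b'0\x00\x001\x00\x0012\x002\x00\x00'

# Plain Python lists hold the strings without the padding.
x_list = sorted(x.tolist())
y_list = [b'0', b'0', b'2', b'3', b'123']

# bisect_left corresponds to searchsorted's default side='left'.
idx = [bisect.bisect_left(x_list, s) for s in y_list]
print(idx)           # [0, 0, 3, 4, 3]
```

The comparison here is done entirely by Python's own string ordering, so the result does not depend on how a given numpy version treats the padding.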


