# Numpify this?

10 messages

## Numpify this?

Hey,

I'm new to numpy but not new to python or programming in general. I was wondering if there's a way of using numpy to do the following or whether I've got what I've got and that's as good as it's going to get.

I have two 2d arrays and I want to create another 2d array that contains the values from the 2nd column of the first two arrays where the values in the 1st column match. To elaborate with an example - if I had an array a:

    array([[2834, 1], [3282, 3], [6850, 2], [9458, 2]])

and an array b:

    array([[2834, 3], [3282, 5], [4444, 5], [9458, 3], [9999, 4],
           [11111, 5], [12345, 1]])

then I'd want the result to be

    array([[1, 3],   # from 2834
           [3, 5],   # from 3282
           [2, 3]])  # from 9458

This is what I have at the moment:

    results = []
    while aind < amax and bind < bmax:
        if a[aind, 0] < b[bind, 0]:
            aind += 1
        elif a[aind, 0] > b[bind, 0]:
            bind += 1
        else:
            results.append([a[aind, 1], b[bind, 1]])
            aind += 1
            bind += 1
    results = array(results)

Where aind = bind = 0, amax = a.shape[0] and bmax = b.shape[0].

Any tips/pointers/speedups?

Cheers,
Matt

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
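For reference, the loop in the post above can be wrapped into a self-contained function. This is only a sketch: the name `match_sorted` is hypothetical, and it assumes the first column of each array is sorted in ascending order, which is what the comparison logic in the loop relies on.

```python
import numpy as np


def match_sorted(a, b):
    """Two-pointer merge; assumes a[:, 0] and b[:, 0] are sorted ascending."""
    aind = bind = 0
    amax, bmax = a.shape[0], b.shape[0]
    results = []
    while aind < amax and bind < bmax:
        if a[aind, 0] < b[bind, 0]:
            aind += 1          # key only in a so far: advance a
        elif a[aind, 0] > b[bind, 0]:
            bind += 1          # key only in b so far: advance b
        else:
            results.append([a[aind, 1], b[bind, 1]])
            aind += 1
            bind += 1
    # Reshape explicitly so that an empty result still has two columns.
    return np.array(results, dtype=a.dtype).reshape(len(results), 2)


a = np.array([[2834, 1], [3282, 3], [6850, 2], [9458, 2]])
b = np.array([[2834, 3], [3282, 5], [4444, 5], [9458, 3],
              [9999, 4], [11111, 5], [12345, 1]])
result = match_sorted(a, b)
```

Each iteration advances at least one index, so the loop runs at most `len(a) + len(b)` times, but every step is executed by the Python interpreter rather than inside NumPy.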

## Re: Numpify this?

On Sun, May 18, 2008 at 2:04 AM, Matt Crane <[hidden email]> wrote:
> I have two 2d arrays and I want to create another 2d array that
> contains the values from the 2nd column of the first two arrays where
> the values in the 1st column match. To elaborate with an example - if
> I had an array a:
>
> array([[2834, 1], [3282, 3], [6850, 2], [9458, 2]])
>
> and an array b:
>
> array([[2834, 3], [3282, 5], [4444, 5], [9458, 3], [9999, 4],
>        [11111, 5], [12345, 1]])
>
> then I'd want the result to be
>
> array([[1, 3],   # from 2834
>        [3, 5],   # from 3282
>        [2, 3]])  # from 9458

Are the matching rows always going to be the same row in each? I.e. you want rows i such that a[i,0]==b[i,0] rather than trying to find all i,j such that a[i,0]==b[j,0]?

If so, then I would do the following:

    In [1]: from numpy import *

    In [2]: a = array([[2834, 1], [3282, 3], [6850, 2], [9458, 2]])

    In [3]: b = array([[2834, 3], [3282, 5], [4444, 5], [9458, 3], [9999, 4], [11111,
       ...: 5], [12345, 1]])

    In [4]: minlength = min(a.shape[0], b.shape[0])

    In [5]: matching = nonzero(a[:minlength,0] == b[:minlength,0])[0]

    In [6]: matching
    Out[6]: array([0, 1, 3])

    In [7]: column_stack([a[matching,1], b[matching,1]])
    Out[7]:
    array([[1, 3],
           [3, 5],
           [2, 3]])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
 -- Umberto Eco

## Re: Numpify this?

On Sun, May 18, 2008 at 7:19 PM, Robert Kern <[hidden email]> wrote:
> Are the matching rows always going to be the same row in each? I.e.
> you want rows i such that a[i,0]==b[i,0] rather than trying to find
> all i,j such that a[i,0]==b[j,0]?

Sorry, I should have mentioned that no, the matching rows won't always be in the same position.

## Re: Numpify this?

On Sun, May 18, 2008 at 2:59 AM, Matt Crane <[hidden email]> wrote:
> Sorry, I should have mentioned that no, the matching rows won't always
> be in the same position.

Okay, then it's just a little bit more complicated.

    In [18]: from numpy import *

    In [19]: a = array([[1, 10], [2, 20], [3, 30], [3, 40], [4, 50]])

    In [20]: b = array([[2, 60], [1, 70], [5, 80], [6, 90], [7, 100], [3, 110]])

    In [21]: m = a[:,0] == b[:,0][:,newaxis]

    In [22]: m
    Out[22]:
    array([[False,  True, False, False, False],
           [ True, False, False, False, False],
           [False, False, False, False, False],
           [False, False, False, False, False],
           [False, False, False, False, False],
           [False, False,  True,  True, False]], dtype=bool)

    In [23]: i, j = nonzero(m)

    In [24]: i
    Out[24]: array([0, 1, 5, 5])

    In [25]: j
    Out[25]: array([1, 0, 2, 3])

    In [26]: column_stack([a[j,1], b[i,1]])
    Out[26]:
    array([[ 20,  60],
           [ 10,  70],
           [ 30, 110],
           [ 40, 110]])

--
Robert Kern
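The session above can be restated as a small function with explicit `np.` names instead of `from numpy import *`. This is a sketch only; `match_broadcast` and `mask_bytes` are hypothetical names introduced here. It also computes how many elements the intermediate mask holds, since that mask has one boolean (one byte) for every pair of rows.

```python
import numpy as np


def match_broadcast(a, b):
    """All-pairs key comparison via broadcasting; keys need not be sorted."""
    # mask[i, j] is True where b[i, 0] == a[j, 0]; shape is (len(b), len(a)).
    mask = a[:, 0] == b[:, 0][:, np.newaxis]
    i, j = np.nonzero(mask)
    return np.column_stack([a[j, 1], b[i, 1]])


a = np.array([[1, 10], [2, 20], [3, 30], [3, 40], [4, 50]])
b = np.array([[2, 60], [1, 70], [5, 80], [6, 90], [7, 100], [3, 110]])
pairs = match_broadcast(a, b)

# Size of the intermediate boolean mask: one byte per (row of b, row of a) pair.
mask_bytes = a.shape[0] * b.shape[0]
```

Because the mask grows with the product of the two lengths, this form is convenient for small inputs and handles repeated or unsorted keys, but its memory use rises quickly as the arrays get longer.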

## Re: Numpify this?

On Sun, May 18, 2008 at 8:08 PM, Robert Kern <[hidden email]> wrote:
> Okay, then it's just a little bit more complicated.

Thanks, and that's going to be faster - the method that I posted is linear in terms of the length of the two lists? Given that the values in the first column are monotonically increasing (again something I should have mentioned -- I blame a lack of caffeine) - could we make it faster?

Thanks for everything up to this point, though.

Matt

## Re: Numpify this?

On Sun, May 18, 2008 at 3:29 AM, Matt Crane <[hidden email]> wrote:
> On Sun, May 18, 2008 at 8:08 PM, Robert Kern <[hidden email]> wrote:
>> Okay, then it's just a little bit more complicated.
>
> Thanks, and that's going to be faster - the method that I posted is
> linear in terms of the length of the two lists?

It depends on the sizes.

> Given that the values
> in the first column are monotonically increasing (again something I
> should have mentioned -- I blame a lack of caffeine) - could we make
> it faster?

Are there repeats?

--
Robert Kern

## Re: Numpify this?

On Sun, May 18, 2008 at 8:52 PM, Robert Kern <[hidden email]> wrote:
> It depends on the sizes.

The sizes could range from 3 to 240000 with an average of around 5500.

> Are there repeats?

No, no repeats in the first column.

I'm going to go get a cup of coffee before I leave out any potentially vital information again. It's going to be a long day.

Thanks,
Matt

## Re: Numpify this?

On Sun, May 18, 2008 at 4:02 AM, Matt Crane <[hidden email]> wrote:
> On Sun, May 18, 2008 at 8:52 PM, Robert Kern <[hidden email]> wrote:
>> It depends on the sizes.
> The sizes could range from 3 to 240000 with an average of around 5500.

A 240000x240000 boolean matrix will probably be too slow.

>> Are there repeats?
> No, no repeats in the first column.

Great! So let's use searchsorted() to find potential indices where the two first columns are equal. We pull out the values at those indices and actually do the comparison to get a boolean mask where there is an equality. Do both a.searchsorted(b) and b.searchsorted(a) to get the appropriate masks on b and a respectively. The number of True elements will be the same for both. Now just apply the masks to the second columns.

    In [20]: a = array([[2, 10], [4, 20], [6, 30], [8, 40], [10, 50]])

    In [21]: b = array([[2, 60], [3, 70], [4, 80], [5, 90], [8, 100], [10, 110]])

    In [22]: a[b[b[:,0].searchsorted(a[:,0]),0] == a[:,0], 1]
    Out[22]: array([10, 20, 40, 50])

    In [23]: b[a[a[:,0].searchsorted(b[:,0]),0] == b[:,0], 1]
    Out[23]: array([ 60,  80, 100, 110])

    In [24]: column_stack([Out[22], Out[23]])
    Out[24]:
    array([[ 10,  60],
           [ 20,  80],
           [ 40, 100],
           [ 50, 110]])

--
Robert Kern
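One edge case is worth noting when applying the searchsorted() recipe above to the arrays from the first post: searchsorted() returns len(array) for a key larger than every element, and indexing with that value raises an IndexError (b contains 12345, which is larger than every key in a). The sketch below is one possible way to guard against that by clipping the indices. The function name `match_searchsorted` is hypothetical, and the code assumes non-empty arrays whose first columns are sorted and free of repeats, as stated in the thread. The last lines cross-check the result with `np.intersect1d(..., return_indices=True)`, which needs NumPy 1.15 or later.

```python
import numpy as np


def match_searchsorted(a, b):
    """searchsorted-based matching; assumes non-empty, sorted, unique keys."""
    ka, kb = a[:, 0], b[:, 0]
    # searchsorted can return len(array) for keys beyond the last element,
    # so clip to the last valid index. A clipped index points at a smaller
    # key, so the equality test below still comes out False for it.
    pos_in_b = np.clip(kb.searchsorted(ka), 0, len(kb) - 1)
    pos_in_a = np.clip(ka.searchsorted(kb), 0, len(ka) - 1)
    mask_a = kb[pos_in_b] == ka   # rows of a whose key also occurs in b
    mask_b = ka[pos_in_a] == kb   # rows of b whose key also occurs in a
    return np.column_stack([a[mask_a, 1], b[mask_b, 1]])


a = np.array([[2834, 1], [3282, 3], [6850, 2], [9458, 2]])
b = np.array([[2834, 3], [3282, 5], [4444, 5], [9458, 3],
              [9999, 4], [11111, 5], [12345, 1]])
result = match_searchsorted(a, b)

# Cross-check using intersect1d, which returns the matching row indices.
_, ia, ib = np.intersect1d(a[:, 0], b[:, 0],
                           assume_unique=True, return_indices=True)
check = np.column_stack([a[ia, 1], b[ib, 1]])
```

Both searches are binary searches carried out inside NumPy, so no len(a) by len(b) intermediate array is built.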