Inverted indices

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Inverted indices

Nicolas P. Rougier

Hi,

I've a small problem for which I cannot find a solution and I'm quite sure there is an obvious one:

I've an array Z (any dtype) with some data.
I've a (sorted) array I (of integer, same size as Z) that tells me the  index of Z[i]  (if necessary, the index can be stored in Z).

Now, I have an arbitrary sequence S of indices (in the sense of I), how do I build the corresponding data ?

Here is a small example:

Z = [(0,0), (1,1), (2,2), (3,3), (4,4))
I  = [0, 20, 23, 24, 37]

S = [ 20,20,0,24]
-> Result should be [(1,1), (1,1), (0,0),(3,3)]

S = [15,15]
-> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]


Any idea ?



Nicolas
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Inverted indices

Stéfan van der Walt
Hi Nicolas

On Thu, Aug 7, 2014 at 1:16 PM, Nicolas P. Rougier
<[hidden email]> wrote:

> Here is a small example:
>
> Z = [(0,0), (1,1), (2,2), (3,3), (4,4))
> I  = [0, 20, 23, 24, 37]
>
> S = [ 20,20,0,24]
> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>
> S = [15,15]
> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]

First try:

Z = np.array([(0,0), (1,1), (2,2), (3,3), (4,4)])
I = np.array([0, 20, 23, 24, 37])
S = np.array([ 20,20,0,24,15])

out = np.zeros((len(S), len(Z[0])))
mask = (S[:, np.newaxis] == I)
item, coord = np.where(mask)
out[item, :] = Z[coord]

Perhaps there's a neater way of doing it!

Stéfan
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Inverted indices

Nicolas P. Rougier


Nice ! Thanks Stéfan.
I will add it to the numpy 100 problems.


Nicolas



On 07 Aug 2014, at 13:31, Stéfan van der Walt <[hidden email]> wrote:

> Hi Nicolas
>
> On Thu, Aug 7, 2014 at 1:16 PM, Nicolas P. Rougier
> <[hidden email]> wrote:
>> Here is a small example:
>>
>> Z = [(0,0), (1,1), (2,2), (3,3), (4,4))
>> I  = [0, 20, 23, 24, 37]
>>
>> S = [ 20,20,0,24]
>> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>>
>> S = [15,15]
>> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]
>
> First try:
>
> Z = np.array([(0,0), (1,1), (2,2), (3,3), (4,4)])
> I = np.array([0, 20, 23, 24, 37])
> S = np.array([ 20,20,0,24,15])
>
> out = np.zeros((len(S), len(Z[0])))
> mask = (S[:, np.newaxis] == I)
> item, coord = np.where(mask)
> out[item, :] = Z[coord]
>
> Perhaps there's a neater way of doing it!
>
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Inverted indices

Gregor Thalhammer-2
In reply to this post by Nicolas P. Rougier

Am 07.08.2014 um 13:16 schrieb Nicolas P. Rougier <[hidden email]>:

>
> Hi,
>
> I've a small problem for which I cannot find a solution and I'm quite sure there is an obvious one:
>
> I've an array Z (any dtype) with some data.
> I've a (sorted) array I (of integer, same size as Z) that tells me the  index of Z[i]  (if necessary, the index can be stored in Z).
>
> Now, I have an arbitrary sequence S of indices (in the sense of I), how do I build the corresponding data ?
>
> Here is a small example:
>
> Z = [(0,0), (1,1), (2,2), (3,3), (4,4))
> I  = [0, 20, 23, 24, 37]
>
> S = [ 20,20,0,24]
> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>
> S = [15,15]
> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]
>
>
> Any idea ?
>

If I is sorted, I would propose to use a bisection algorithm, faster than linear search:

Z = array([(0,0), (1,1), (2,2), (3,3), (4,4)])
I = array([0, 20, 23, 24, 37])
S = array([ 20,20,0,24,15,27])

a = zeros(S.shape,dtype=int)
b = a + S.shape[0]-1
for i in range(int(log2(S.shape[0]))+2):
    c = (a+b)>>1
    sel = I[c]<=S
    a[sel] = c[sel]
    b[~sel] = c[~sel]
Z[c]

If I[c] != S, then there is no corresponding index entry in I to match S.

Gregor

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Inverted indices

Gregor Thalhammer-2

Am 07.08.2014 um 13:59 schrieb Gregor Thalhammer <[hidden email]>:


Am 07.08.2014 um 13:16 schrieb Nicolas P. Rougier <[hidden email]>:


Hi,

I've a small problem for which I cannot find a solution and I'm quite sure there is an obvious one:

I've an array Z (any dtype) with some data.
I've a (sorted) array I (of integer, same size as Z) that tells me the  index of Z[i]  (if necessary, the index can be stored in Z).

Now, I have an arbitrary sequence S of indices (in the sense of I), how do I build the corresponding data ?

Here is a small example:

Z = [(0,0), (1,1), (2,2), (3,3), (4,4))
I  = [0, 20, 23, 24, 37]

S = [ 20,20,0,24]
-> Result should be [(1,1), (1,1), (0,0),(3,3)]

S = [15,15]
-> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]


Any idea ?


If I is sorted, I would propose to use a bisection algorithm, faster than linear search:

Z = array([(0,0), (1,1), (2,2), (3,3), (4,4)])
I = array([0, 20, 23, 24, 37])
S = array([ 20,20,0,24,15,27])

a = zeros(S.shape,dtype=int)
b = a + S.shape[0]-1
for i in range(int(log2(S.shape[0]))+2):
   c = (a+b)>>1
   sel = I[c]<=S
   a[sel] = c[sel]
   b[~sel] = c[~sel]

or even simpler:
c = searchsorted(I, S)
Z[c]

Gregor

Z[c]

If I[c] != S, then there is no corresponding index entry in I to match S.

Gregor


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Inverted indices

Nicolas P. Rougier

Oh thanks, I would never have imagined a one-line solution...


Here are the benchmarks:

In [2]: %timeit stefan(S)
100000 loops, best of 3: 10.8 µs per loop

In [3]: %timeit gregor(S)
10000 loops, best of 3: 48.1 µs per loop

In [4]: %timeit gregor2(S)
100000 loops, best of 3: 3.23 µs per loop


Nicolas


On 07 Aug 2014, at 14:04, Gregor Thalhammer <[hidden email]> wrote:

>
> Am 07.08.2014 um 13:59 schrieb Gregor Thalhammer <[hidden email]>:
>
>>
>> Am 07.08.2014 um 13:16 schrieb Nicolas P. Rougier <[hidden email]>:
>>
>>>
>>> Hi,
>>>
>>> I've a small problem for which I cannot find a solution and I'm quite sure there is an obvious one:
>>>
>>> I've an array Z (any dtype) with some data.
>>> I've a (sorted) array I (of integer, same size as Z) that tells me the  index of Z[i]  (if necessary, the index can be stored in Z).
>>>
>>> Now, I have an arbitrary sequence S of indices (in the sense of I), how do I build the corresponding data ?
>>>
>>> Here is a small example:
>>>
>>> Z = [(0,0), (1,1), (2,2), (3,3), (4,4))
>>> I  = [0, 20, 23, 24, 37]
>>>
>>> S = [ 20,20,0,24]
>>> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>>>
>>> S = [15,15]
>>> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]
>>>
>>>
>>> Any idea ?
>>>
>>
>> If I is sorted, I would propose to use a bisection algorithm, faster than linear search:
>>
>> Z = array([(0,0), (1,1), (2,2), (3,3), (4,4)])
>> I = array([0, 20, 23, 24, 37])
>> S = array([ 20,20,0,24,15,27])
>>
>> a = zeros(S.shape,dtype=int)
>> b = a + S.shape[0]-1
>> for i in range(int(log2(S.shape[0]))+2):
>>    c = (a+b)>>1
>>    sel = I[c]<=S
>>    a[sel] = c[sel]
>>    b[~sel] = c[~sel]
>
> or even simpler:
> c = searchsorted(I, S)
> Z[c]
>
> Gregor
>
>> Z[c]
>>
>> If I[c] != S, then there is no corresponding index entry in I to match S.
>>
>> Gregor
>
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/numpy-discussion