

After poking around for a bit, I was wondering if there was a faster method for the following: # Array of index values 0..n items = numpy.array([0,3,2,1,4,2],dtype=int)
# Count the number of occurrences of each index counts = numpy.zeros(5, dtype=int) for i in items: counts[i] += 1
In my real code, 'items' contain up to a million values and this loop will be in a performance critical area of code. If there is no simple solution, I can trivially code this using the CAPI.
Thanks, Kevin
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion


On Thu, May 22, 2008 at 8:59 AM, Kevin Jacobs < [hidden email]>
< [hidden email]> wrote:
> After poking around for a bit, I was wondering if there was a faster method
> for the following:
>
> # Array of index values 0..n
> items = numpy.array([0,3,2,1,4,2],dtype=int)
>
> # Count the number of occurrences of each index
> counts = numpy.zeros(5, dtype=int)
> for i in items:
> counts[i] += 1
>
> In my real code, 'items' contain up to a million values and this loop will
> be in a performance critical area of code. If there is no simple solution,
> I can trivially code this using the CAPI.
How big is n? If it is much smaller than a million then loop over that instead.
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion


On Thu, May 22, 2008 at 12:08 PM, Keith Goodman < [hidden email]> wrote:
How big is n? If it is much smaller than a million then loop over that instead.
n is always relatively small, but I'd rather not do: for i in range(n): counts[i] = (items==i).sum()
If that was the best alternative, I'd just bite the bullet and code this in C. Thanks, Kevin
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion


On Thu, May 22, 2008 at 9:08 AM, Keith Goodman < [hidden email]> wrote:
> On Thu, May 22, 2008 at 8:59 AM, Kevin Jacobs < [hidden email]>
> < [hidden email]> wrote:
>> After poking around for a bit, I was wondering if there was a faster method
>> for the following:
>>
>> # Array of index values 0..n
>> items = numpy.array([0,3,2,1,4,2],dtype=int)
>>
>> # Count the number of occurrences of each index
>> counts = numpy.zeros(5, dtype=int)
>> for i in items:
>> counts[i] += 1
>>
>> In my real code, 'items' contain up to a million values and this loop will
>> be in a performance critical area of code. If there is no simple solution,
>> I can trivially code this using the CAPI.
>
> How big is n? If it is much smaller than a million then loop over that instead.
Or how about using a list instead:
>> items = [0,3,2,1,4,2]
>> uitems = frozenset(items)
>> count = [items.count(i) for i in uitems]
>> count
[1, 1, 2, 1, 1]
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion


You're just trying to do this...correct? >>> import numpy >>> items = numpy.array([0,3,2,1,4,2],dtype=int) >>> unique = numpy.unique(items) >>> unique array([0, 1, 2, 3, 4])
>>> counts=numpy.histogram(items,unique) >>> counts (array([1, 1, 2, 1, 1]), array([0, 1, 2, 3, 4])) >>> counts[0] array([1, 1, 2, 1, 1]) >>>
On Thu, May 22, 2008 at 9:08 AM, Keith Goodman < [hidden email]> wrote:
On Thu, May 22, 2008 at 8:59 AM, Kevin Jacobs <[hidden email]>
<[hidden email]> wrote:
> After poking around for a bit, I was wondering if there was a faster method
> for the following:
>
> # Array of index values 0..n
> items = numpy.array([0,3,2,1,4,2],dtype=int)
>
> # Count the number of occurrences of each index
> counts = numpy.zeros(5, dtype=int)
> for i in items:
> counts[i] += 1
>
> In my real code, 'items' contain up to a million values and this loop will
> be in a performance critical area of code. If there is no simple solution,
> I can trivially code this using the CAPI.
How big is n? If it is much smaller than a million then loop over that instead.
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion


On Thu, May 22, 2008 at 9:15 AM, Keith Goodman < [hidden email]> wrote:
> On Thu, May 22, 2008 at 9:08 AM, Keith Goodman < [hidden email]> wrote:
>> On Thu, May 22, 2008 at 8:59 AM, Kevin Jacobs < [hidden email]>
>> < [hidden email]> wrote:
>>> After poking around for a bit, I was wondering if there was a faster method
>>> for the following:
>>>
>>> # Array of index values 0..n
>>> items = numpy.array([0,3,2,1,4,2],dtype=int)
>>>
>>> # Count the number of occurrences of each index
>>> counts = numpy.zeros(5, dtype=int)
>>> for i in items:
>>> counts[i] += 1
>>>
>>> In my real code, 'items' contain up to a million values and this loop will
>>> be in a performance critical area of code. If there is no simple solution,
>>> I can trivially code this using the CAPI.
>>
>> How big is n? If it is much smaller than a million then loop over that instead.
>
> Or how about using a list instead:
>
>>> items = [0,3,2,1,4,2]
>>> uitems = frozenset(items)
>>> count = [items.count(i) for i in uitems]
>>> count
> [1, 1, 2, 1, 1]
Oh, I see, so uitems should be range(n)
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion


In reply to this post by Kevin Jacobs <jacobs@bioinformed.com>
On Thu, May 22, 2008 at 4:59 PM, Kevin Jacobs < [hidden email]>
< [hidden email]> wrote:
> After poking around for a bit, I was wondering if there was a faster method
> for the following:
>
> # Array of index values 0..n
> items = numpy.array([0,3,2,1,4,2],dtype=int)
>
> # Count the number of occurrences of each index
> counts = numpy.zeros(5, dtype=int)
> for i in items:
> counts[i] += 1
>
> In my real code, 'items' contain up to a million values and this loop will
> be in a performance critical area of code. If there is no simple solution,
> I can trivially code this using the CAPI.
I would use bincount:
count = bincount(items)
should be all you need:
In [192]: items = [0,3,2,1,4,2]
In [193]: bincount(items)
Out[193]: array([1, 1, 2, 1, 1])
In [194]: bincount?
Type: builtin_function_or_method
Base Class: <type 'builtin_function_or_method'>
String Form: <builtin function bincount>
Namespace: Interactive
Docstring:
bincount(x,weights=None)
Return the number of occurrences of each value in x.
x must be a list of nonnegative integers. The output, b[i],
represents the number of times that i is found in x. If weights
is specified, every occurrence of i at a position p contributes
weights[p] instead of 1.
See also: histogram, digitize, unique.
Robin
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion


On Thu, May 22, 2008 at 9:22 AM, Robin < [hidden email]> wrote:
> On Thu, May 22, 2008 at 4:59 PM, Kevin Jacobs < [hidden email]>
> < [hidden email]> wrote:
>> After poking around for a bit, I was wondering if there was a faster method
>> for the following:
>>
>> # Array of index values 0..n
>> items = numpy.array([0,3,2,1,4,2],dtype=int)
>>
>> # Count the number of occurrences of each index
>> counts = numpy.zeros(5, dtype=int)
>> for i in items:
>> counts[i] += 1
>>
>> In my real code, 'items' contain up to a million values and this loop will
>> be in a performance critical area of code. If there is no simple solution,
>> I can trivially code this using the CAPI.
>
> I would use bincount:
> count = bincount(items)
> should be all you need:
I guess bincount is *little* faster:
>> items = mp.random.randint(0, 100, (1000000,))
>> timeit mp.bincount(items)
100 loops, best of 3: 4.05 ms per loop
>> items = items.tolist()
>> timeit [items.count(i) for i in range(100)]
10 loops, best of 3: 2.91 s per loop
_______________________________________________
Numpydiscussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion

