Casting to np.byte before clearing values


Nicolas P. Rougier

Hi all,


I'm trying to understand why viewing an array as bytes before clearing makes the whole operation faster.
I imagine there is some kind of special treatment for byte arrays but I've no clue.


import numpy as np

# Native float and int arrays
Z_float = np.ones(1000000, float)
Z_int   = np.ones(1000000, int)

%timeit Z_float[...] = 0
1000 loops, best of 3: 361 µs per loop

%timeit Z_int[...] = 0
1000 loops, best of 3: 366 µs per loop

%timeit Z_float.view(np.byte)[...] = 0
1000 loops, best of 3: 267 µs per loop

%timeit Z_int.view(np.byte)[...] = 0
1000 loops, best of 3: 266 µs per loop
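
For anyone who wants to rerun this comparison outside IPython, here is a minimal sketch that uses the standard `timeit` module in place of `%timeit`. The helper names `clear_native`, `clear_bytes` and `best_of` are illustrative, not from the original post, and the absolute numbers will depend on the machine, OS and NumPy version.

```python
import timeit

import numpy as np

# Same sizes and dtypes as the IPython session above.
Z_float = np.ones(1000000, float)
Z_int = np.ones(1000000, int)


def clear_native(Z):
    # Assignment through the array's own dtype (the element-wise loop).
    Z[...] = 0


def clear_bytes(Z):
    # Reinterpret the same buffer as 1-byte elements, then assign.
    Z.view(np.byte)[...] = 0


def best_of(func, Z, number=100, repeat=3):
    # Like %timeit: best of `repeat` runs, averaged over `number` calls, in us.
    times = timeit.repeat(lambda: func(Z), number=number, repeat=repeat)
    return min(times) / number * 1e6


if __name__ == "__main__":
    for name, Z in [("float", Z_float), ("int", Z_int)]:
        print(f"{name:5s} native: {best_of(clear_native, Z):7.1f} us per loop")
        print(f"{name:5s} bytes:  {best_of(clear_bytes, Z):7.1f} us per loop")
```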


Nicolas
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: Casting to np.byte before clearing values

Sebastian Berg
On Mon, 2016-12-26 at 10:34 +0100, Nicolas P. Rougier wrote:
> Hi all,
>
> I'm trying to understand why viewing an array as bytes before
> clearing makes the whole operation faster.
> I imagine there is some kind of special treatment for byte arrays but
> I've no clue.

Sure, if it's a 1-byte-wide type, the code will end up calling
`memset`. If it is not, it will end up calling a loop like this
(float64 case; `dst` is a char pointer into the array):

while (N > 0) {
    *(double *)dst = output;
    dst += stride;  /* 8 here, or whatever the element size/stride is */
    --N;
}

Now, why this gives such a difference, I don't really know, but I guess
it is not too surprising and may depend on other things as well.
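
To see what the 1-byte path amounts to, here is a sketch that zeroes an array's buffer with a single C `memset` call through `ctypes.memset`. `memset_zero` is a hypothetical helper, not a NumPy API; it assumes a C-contiguous, writeable array and bypasses NumPy's own checks, so treat it as an illustration rather than something to use in production.

```python
import ctypes

import numpy as np


def memset_zero(Z):
    """Zero an array's buffer with one C memset call (hypothetical helper).

    Only valid for a C-contiguous, writeable array, and only because the
    fill value 0 is all-zero bytes for the dtypes used here.
    """
    if not (Z.flags.c_contiguous and Z.flags.writeable):
        raise ValueError("memset_zero needs a C-contiguous, writeable array")
    # Z.ctypes.data is the address of the first byte; Z.nbytes is the length.
    ctypes.memset(Z.ctypes.data, 0, Z.nbytes)


Z_float = np.ones(1000000, float)
memset_zero(Z_float)
print(Z_float[:3])  # [0. 0. 0.]
```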

- Sebastian

Re: Casting to np.byte before clearing values

Nicolas P. Rougier

Thanks for the explanation, Sebastian; makes sense.

Nicolas

Re: Casting to np.byte before clearing values

Benjamin Root
Might be OS-specific, too. Some virtual memory management systems might special-case zeroing out memory. Try doing the same thing with a value other than zero.
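
One way to run the suggested experiment, sketched under the assumption that `timeit` timings are comparable to `%timeit` ones: time the byte view and the float64 array with 0 and with 1. If zero were being special-cased below NumPy, the byte view should be noticeably faster with 0 than with 1. Writing 1 through the byte view does not set the floats to 1.0; it is only a timing probe. The function names are illustrative.

```python
import timeit

import numpy as np

Z = np.ones(1000000, float)
B = Z.view(np.byte)  # same buffer, seen as 1-byte elements


def fill_bytes(value):
    # 1-byte dtype: the path that ends up in memset.
    B[...] = value


def fill_native(value):
    # float64 dtype: the element-by-element loop.
    Z[...] = value


def best_us(func, value, number=100, repeat=3):
    # Best of `repeat` runs, averaged over `number` calls, in microseconds.
    times = timeit.repeat(lambda: func(value), number=number, repeat=repeat)
    return min(times) / number * 1e6


if __name__ == "__main__":
    for func in (fill_bytes, fill_native):
        for value in (0, 1):
            print(f"{func.__name__}({value}): {best_us(func, value):7.1f} us per loop")
```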

Re: Casting to np.byte before clearing values

Chris Barker - NOAA Federal
On Mon, Dec 26, 2016 at 1:34 AM, Nicolas P. Rougier <[hidden email]> wrote:

> I'm trying to understand why viewing an array as bytes before clearing makes the whole operation faster.
> I imagine there is some kind of special treatment for byte arrays but I've no clue.

I notice that the code is simply setting a value using broadcasting -- I don't think there is anything special about zero in that case. But your subject refers to "clearing" an array.

So I wonder if you have a use case where the performance difference matters, in which case _maybe_ it would be worth having an ndarray.zero() method that efficiently zeros out an array.

Actually, there is ndarray.fill():

In [7]: %timeit Z_float[...] = 0
1000 loops, best of 3: 380 µs per loop

In [8]: %timeit Z_float.view(np.byte)[...] = 0
1000 loops, best of 3: 271 µs per loop

In [9]: %timeit Z_float.fill(0)
1000 loops, best of 3: 363 µs per loop

which is only insignificantly faster than assignment, probably because it's running exactly the same loop.

A .zero() method, on the other hand, could use memset, as happens with bytes.

I can't say I have a use case that would justify this, though.
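
A sketch of what such a `.zero()` could look like from Python, assuming the byte-view trick is what we want to exploit. `zero_out` is a made-up name, not an existing NumPy method. It only takes the byte path for C-contiguous numeric arrays with at least one dimension, where all-zero bytes really mean 0; for string or object arrays zero bytes do not mean the same as `fill(0)`, so those, 0-d arrays and non-contiguous views fall back to `fill(0)`.

```python
import numpy as np


def zero_out(Z):
    """Hypothetical stand-in for an ndarray.zero() method (not a NumPy API).

    For C-contiguous numeric arrays, all-zero bytes represent 0, so the
    array can be cleared through a 1-byte view (the memset path). Anything
    else falls back to the regular fill(0) loop.
    """
    byte_path_ok = (
        Z.ndim > 0                    # a 0-d array can't change itemsize in a view
        and Z.flags.c_contiguous      # the byte view must cover exactly Z's buffer
        and Z.dtype.kind in "biufc"   # bool, signed, unsigned, float, complex
    )
    if byte_path_ok:
        Z.view(np.byte)[...] = 0
    else:
        Z.fill(0)
    return Z


Z_float = np.ones((1000, 1000))
zero_out(Z_float)
print(Z_float.any())  # False
```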

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]

Re: Casting to np.byte before clearing values

Nicolas P. Rougier

Yes, "clearing" is not the proper word, but the "trick" only works for 0 (that's the case where I get the same result both ways).
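
This can be checked directly: assigning through the byte view writes one byte value everywhere, so it reproduces a value only if every byte of that value's representation is the same. `repeated_byte` is a hypothetical helper sketching that check; the example assumes 8-byte `np.int64` elements.

```python
import numpy as np


def repeated_byte(value, dtype):
    """Return b if `value`, stored as `dtype`, is the byte b repeated; else None.

    Hypothetical helper: a fill through .view(np.byte) writes the same byte
    everywhere, so it can only reproduce values whose bytes are all equal.
    """
    raw = np.array(value, dtype=dtype).tobytes()
    return raw[0] if len(set(raw)) == 1 else None


print(repeated_byte(0.0, np.float64))  # 0
print(repeated_byte(1.0, np.float64))  # None

# What the byte trick does with 1 instead of 0 on 8-byte integers:
Z = np.ones(5, np.int64)
Z.view(np.byte)[...] = 1
print(Z[0] == 0x0101010101010101)  # True
```

By the same test, -0.0 does not qualify either, while -1 stored as a signed integer (all bytes 0xFF) would.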


Nicolas