Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

PIERRE AUGIER
Hi,

I'm looking for a difference between Numpy 0.19.5 and 0.20 which could explain a performance regression (~15 %) with Pythran.

I observe this regression with the script https://github.com/paugier/nbabel/blob/master/py/bench.py

Pythran reimplements Numpy, so this is not about Numpy's computation code. However, Pythran of course uses the native array contained in a Numpy array. I'm quite sure that something changed between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?), since I don't get the same performance with Numpy 0.20. I checked that the values in the arrays are the same and that the flags characterizing the arrays are also the same.

Good news, I'm now able to obtain the performance difference with Numpy 0.19.5 alone. In this code, I load the data with Pandas and need to prepare contiguous Numpy arrays to give them to Pythran. With Numpy 0.19.5, if I use np.copy I get better performance than with np.ascontiguousarray. With Numpy 0.20, both functions create arrays giving the same performance with Pythran (again, less good than with Numpy 0.19.5).
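Concretely, the comparison looks roughly like the following sketch (`compute` stands in for the Pythran-compiled kernel from bench.py and `df` for the dataframe loaded with Pandas; both names are placeholders, not the actual ones in the script):

    import numpy as np
    from timeit import timeit

    # hypothetical names: `df` is the loaded dataframe, `compute` the Pythran kernel
    arr_contig = np.ascontiguousarray(df.values)  # "slow" with Numpy 0.19.5
    arr_copy = np.copy(df.values)                 # "fast" with Numpy 0.19.5

    for name, arr in (("ascontiguousarray", arr_contig), ("np.copy", arr_copy)):
        print(name, timeit(lambda: compute(arr), number=10))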

Note that this code is very efficient (more than 100 times faster than using Numpy), so I guess that things like alignment or memory location can lead to such a difference.

More details in this issue https://github.com/serge-sans-paille/pythran/issues/1735

Any help to understand what has changed would be greatly appreciated!

Cheers,
Pierre

Re: Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

Sebastian Berg
On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:

> I'm looking for a difference between Numpy 0.19.5 and 0.20 which
> could explain a performance regression (~15 %) with Pythran.
>
> [...]
>
> Any help to understand what has changed would be greatly appreciated!
If you want to really dig into this, it would be good to do profiling
to find out where the differences are.

Without that, I don't have much appetite to investigate personally. The
reason is that fluctuations of ~30% (or even much more) when running
the NumPy benchmarks are very common.

I am not aware of an immediate change in NumPy, especially since you
are talking about Pythran, and only the memory space or the interface
code should matter.
As to the interface code... I would expect it to be quite a bit faster,
not slower.
There was no change around data allocation, so at best what you are
seeing is a different pattern in how the "small array cache" ends up
being used.


Unfortunately, getting stable benchmarks that reflect code changes
exactly is tough...  Here is a nice blog post from Victor Stinner where
he had to go as far as using "profile guided compilation" to avoid
fluctuations:

https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html

I somewhat hope that this is also the reason for the huge fluctuations
we see in the NumPy benchmarks after absolutely unrelated code
changes.
But I did not have the energy to try it (and a probably-fixed bug in
gcc makes it a bit harder right now).

Cheers,

Sebastian





Re: Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

PIERRE AUGIER
Hi,

I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.

Good news, I was able to reproduce the difference with only Numpy 1.20.1.

Arrays prepared with (`df` is a Pandas dataframe)

arr = df.values.copy()

or

arr = np.ascontiguousarray(df.values)

lead to "slow" execution while arrays prepared with

arr = np.copy(df.values)

lead to faster execution.

arr.copy() and np.copy(arr) do not give the same result when arr is obtained from a Pandas dataframe with arr = df.values. It's strange, because type(df.values) gives <class 'numpy.ndarray'>, so I would expect arr.copy() and np.copy(arr) to give exactly the same result.
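To see whether anything beyond the flags differs between the two arrays, a small inspection sketch (assuming `df` is the dataframe loaded in bench.py) could be:

    import numpy as np

    arr1 = df.values.copy()      # "slow" array
    arr2 = np.copy(df.values)    # "fast" array

    for name, a in (("df.values.copy()", arr1), ("np.copy(df.values)", arr2)):
        addr = a.__array_interface__["data"][0]
        print(name, a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"],
              a.strides, "address % 64 =", addr % 64)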

Note that I think I'm doing quite serious and reproducible benchmarks. I also checked that this regression is reproducible on another computer.

Cheers,

Pierre

----- Mail original -----
> De: "Sebastian Berg" <[hidden email]>
> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

> [Sebastian's reply quoted in full; trimmed]

Re: Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

Eric Firing
On 2021/03/12 1:33 PM, PIERRE AUGIER wrote:
> arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.

According to the docstrings for numpy.copy and arr.copy, the function
and the method have different defaults for the memory layout:
np.copy() defaults to order='K' (it tries to maintain the order of the
original), while arr.copy() defaults to C order.
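A quick way to see the difference (a minimal standalone sketch, not taken from the benchmark):

    import numpy as np

    a = np.asfortranarray(np.ones((1000, 3)))  # Fortran-ordered input
    b = a.copy()     # ndarray.copy() defaults to order='C'
    c = np.copy(a)   # np.copy() defaults to order='K', which keeps the F order here
    print(b.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])  # True True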

Eric

Re: Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

Sebastian Berg
In reply to this post by PIERRE AUGIER
On Sat, 2021-03-13 at 00:33 +0100, PIERRE AUGIER wrote:

> Hi,
>
> [...]
>
> arr.copy() or np.copy(arr) do not give the same result, with arr
> obtained from a Pandas dataframe with arr = df.values. It's strange
> because type(df.values) gives <class 'numpy.ndarray'> so I would
> expect arr.copy() and np.copy(arr) to give exactly the same result.
The only thing that can change would be the array's flags and
`arr.strides`, but they should not have changed.  And there is no
change in NumPy that I can even remotely think of.  Array data is just
allocated with `malloc`.

That is: as I understand it, you are *not* timing `np.copy` or
`np.ascontiguousarray` itself, but just operating on the array returned.
NumPy only ever uses `malloc` for allocating array content.

>
> Note that I think I'm doing quite serious and reproducible
> benchmarks. I also checked that this regression is reproducible on
> another computer.

I absolutely trust the benchmark results. I was hoping you might also
be running a profiler (as in analyzing the running program) to find out
where the differences originate on the C side.  That would allow us to
say with certainty either what changed or that there was no actual
related code change.

E.g. I have seen huge speed differences in the same `memcpy` or similar
calls, due to whatever reasons (maybe due to compiler changes, or due
to address space changes... or maybe the former causing the latter, I
don't know.).

Cheers,

Sebastian



size of arrays

klark--kent
Dear colleagues!
 
Size of np.float16(1) is 26
Size of np.float64(1) is 32
32 / 26 = 1.23
 
Since memory is limited, I have a question about this code:
 
   import numpy as np
   import sys
 
   a1 = np.ones(1, dtype='float16')
   b1 = np.ones(1, dtype='float64')
   div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
   # div_1 = 1.06
 
   a2 = np.ones(10, dtype='float16')
   b2 = np.ones(10, dtype='float64')
   div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)    
   # div_2 = 1.51
 
   a3 = np.ones(100, dtype='float16')
   b3 = np.ones(100, dtype='float64')
   div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
   # div_3 = 3.0

The size of np.float64 numpy arrays is roughly four times that of np.float16 arrays.
Is it possible to bring the ratio closer to 1.23?


Re: size of arrays

Todd
Ideally float64 uses 64 bits for each number while float16 uses 16 bits.  64/16=4.  However, there is some additional overhead.  This overhead makes up a large portion of small arrays, but becomes negligible as the array gets bigger.

On Sat, Mar 13, 2021, 16:01 <[hidden email]> wrote:
> [original "size of arrays" message quoted in full; trimmed]

Re: size of arrays

Robert Kern-2
In reply to this post by klark--kent
On Sat, Mar 13, 2021 at 4:02 PM <[hidden email]> wrote:
Dear colleagues!
 
Size of np.float16(1) is 26
Size of np.float64(1) is 32
32 / 26 = 1.23

Note that `sys.getsizeof()` is returning the size of the given Python object in bytes. `np.float16(1)` and `np.float64(1)` are so-called "numpy scalar objects" that wrap up the raw `float16` (2 bytes) and `float64` (8 bytes) values with the necessary information to make them Python objects. The extra 24 bytes for each is _not_ present for each value when you have `float16` and `float64` arrays of larger lengths.

There is still some overhead to make the array of numbers into a Python object, but this does not increase with the number of array elements. This is what you are seeing below when you compute the sizes of the Python objects that are the arrays. The fixed overhead does not increase when you increase the sizes of the arrays.

They eventually approach the ideal ratio of 4: `float64` values take up 4 times as many bytes as `float16` values, as the names suggest. The ratio of 1.23 that you get from comparing the scalar objects reflects that the overhead for making a single value into a Python object takes up significantly more memory than the actual single number itself.
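For example (a small illustration; the exact getsizeof numbers vary by platform and NumPy version):

    import sys
    import numpy as np

    a = np.ones(1_000_000, dtype='float16')
    b = np.ones(1_000_000, dtype='float64')

    print(a.nbytes, b.nbytes)                   # 2000000 8000000: exactly 4x raw data
    print(sys.getsizeof(b) / sys.getsizeof(a))  # close to 4, the fixed overhead is negligible here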

--
Robert Kern


Re: size of arrays

klark--kent
In reply to this post by Todd
So is it right that 100 arrays of one element each are smaller than one array of 100 elements?

14.03.2021, 00:06, "Todd" <[hidden email]>:
> [Todd's reply quoted in full; trimmed]

Re: size of arrays

Robert Kern-2
On Sat, Mar 13, 2021 at 4:18 PM <[hidden email]> wrote:
So is it right that 100 arrays of one element is smaller than one array with size of 100 elements?

No, typically the opposite is true.

--
Robert Kern


Re: size of arrays

Todd
In reply to this post by klark--kent
No, because the array of 100 elements will only have the overhead once, while the 100 arrays will each have the overhead repeated.


Think about the overhead like a book cover on a book. It takes additional space, but provides storage for the book, information to help you find it, etc. Each book only needs one cover. So a single 100 page book only needs one cover, while a hundred 1 page books needs 100 covers. Also, as the book gets more pages the cover takes a smaller portion of the total size of the book.
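In numbers (a small sketch; exact sizes depend on the platform):

    import sys
    import numpy as np

    many = [np.ones(1, dtype='float64') for _ in range(100)]  # 100 covers
    one = np.ones(100, dtype='float64')                       # 1 cover

    print(sum(sys.getsizeof(a) for a in many))  # 100 x (object overhead + 8 bytes of data)
    print(sys.getsizeof(one))                   # one object overhead + 800 bytes of data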

On Sat, Mar 13, 2021, 16:17 <[hidden email]> wrote:
> So is it right that 100 arrays of one element is smaller than one array with size of 100 elements?
> [...]

Re: Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

Juan Nunez-Iglesias-2
In reply to this post by PIERRE AUGIER
Hi Pierre,

If you’re able to compile NumPy locally and you have reliable benchmarks, you can write a script that tests the runtime of your benchmark and reports it as a test pass/fail. You can then use “git bisect run” to automatically find the commit that caused the issue. That will help narrow down the discussion before it gets completely derailed a second time. 😂

https://lwn.net/Articles/317154/
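A minimal sketch of such a pass/fail script (run from a NumPy source checkout as `git bisect run python bisect_bench.py`; the rebuild command and the threshold are assumptions to be adapted, and bench.py is Pierre's benchmark script):

    # bisect_bench.py -- hypothetical helper for `git bisect run`
    import subprocess
    import sys
    import time

    THRESHOLD = 1.0  # seconds; calibrate on a known-good and a known-bad commit

    # rebuild NumPy for the currently checked-out commit (assumed build setup)
    subprocess.run([sys.executable, "-m", "pip", "install", ".", "--no-build-isolation"],
                   check=True)

    t0 = time.perf_counter()
    subprocess.run([sys.executable, "bench.py"], check=True)  # the benchmark to time
    elapsed = time.perf_counter() - t0

    sys.exit(0 if elapsed < THRESHOLD else 1)  # 0 = good commit, non-zero = bad commit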


Juan. 

On 13 Mar 2021, at 10:34 am, PIERRE AUGIER <[hidden email]> wrote:

> [Pierre's message of 13 Mar 2021 and the earlier thread quoted in full; trimmed]

Re: Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

Sebastian Berg
On Sun, 2021-03-14 at 17:15 +1100, Juan Nunez-Iglesias wrote:

> Hi Pierre,
>
> If you’re able to compile NumPy locally and you have reliable
> benchmarks, you can write a script that tests the runtime of your
> benchmark and reports it as a test pass/fail. You can then use “git
> bisect run” to automatically find the commit that caused the issue.
> That will help narrow down the discussion before it gets completely
> derailed a second time. 😂
>
> https://lwn.net/Articles/317154/

Let me share this partial benchmark result for a branch I just worked on
in NumPy:

       before           after         ratio
     [c5de5b5c]       [2d9e11ea]
     <main>           <splitup-faster-argparsing>
+     2.12±0.01μs      3.69±0.02μs     1.74  bench_io.Copy.time_cont_assign('float32')
+     22.6±0.08μs       36.0±0.2μs     1.59  bench_io.CopyTo.time_copyto_sparse
+      49.4±0.8μs       55.2±0.1μs     1.12  bench_io.CopyTo.time_copyto_8_sparse
-     7.40±0.06μs      4.11±0.01μs     0.56  bench_io.CopyTo.time_copyto_dense
-     6.99±0.05μs         3.77±0μs     0.54  bench_io.Copy.time_cont_assign('float64')
-     6.94±0.02μs      3.73±0.01μs     0.54  bench_io.Copy.time_cont_assign('complex64')


That looks weird!  The benchmark sometimes speeds up by a factor of
almost 2, and sometimes the (de-facto) same code slows down by just as
much? (Focus on the `time_cont_assign` with float64 vs. float32).

Even better: I know 100% that no related code is touched!  The core of
that benchmark is just:

     array[...] = 1

and I did not even come close to any code related to that operation.
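(For reference, a rough standalone version of that micro-benchmark; the array size here is a guess, not the exact ASV parameters:)

    import numpy as np
    from timeit import repeat

    a = np.zeros(1_000_000, dtype='float32')  # size chosen arbitrarily
    times = repeat("a[...] = 1", globals={"a": a}, number=100, repeat=5)
    print(min(times) / 100 * 1e6, "µs per assignment")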


I have, as I did before, tried quite a few things (though not as much
with compiler flags as in Victor Stinner's blog), such as
enabling/disabling huge pages, disabling address-space randomization
(and disabling the NumPy small-array cache).

Note that the results are *stable*, as in: On this branch, I get
extremely reliable results for the benchmark [1]!


As you noticed, I have also seen these (or similar) changes "toggle"
e.g. when copying the array multiple times.  And I have dug down into
profiling one instance on the instruction level with `perf` so I know
for a fact that it is memory access speed.  (Which is a no-brainer
here, the operations are obviously memory or even cache speed bound.)


The point I was hoping to make is: it's complicated, and I am not
holding my breath that you can find an answer without digging much
deeper.
The blog post from Victor Stinner gave me the thought that profile-
guided optimization *might* be a way to avoid some random fluctuations,
but I have not checked whether the inner loop of the code actually
compiles to different machine code.


I would hope that someone comes along and "just knows" what is going
on. But, I don't know where to ask or what to google for.

My best bets right now (they may be terrible!) are:

* Profile-guided optimization might help (as in stabilizing compiler
output against *random* changes in the code).  It probably is involved
in some way or another.  But Victor Stinner timed Python, and that may
not have any massively memory-bound operations (which are the "big"
things here).

* Maybe try to make the NumPy allocator align all its allocations to
much larger boundaries, such as the CPU cache-line size (a rough sketch
for experimenting with this follows after this list).  But I think I
tried to check whether alignment seems to matter, and it didn't.  Also,
the arrays feel large enough that it shouldn't matter?

* CPU L1/L2 caching uses a lot of fancy heuristics these days. Maybe to
really understand what's going on, you would have to drill into what the
CPU caches are doing here?
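
A rough sketch of what I mean by forcing larger alignment (just for experimenting by hand, not an actual change to the NumPy allocator):

    import numpy as np

    def aligned_empty(n, dtype=np.float64, align=64):
        """Uninitialized 1-D array whose data pointer is aligned to `align` bytes."""
        itemsize = np.dtype(dtype).itemsize
        buf = np.empty(n * itemsize + align, dtype=np.uint8)
        offset = (-buf.ctypes.data) % align
        return buf[offset:offset + n * itemsize].view(dtype)

    a = aligned_empty(1000)
    assert a.ctypes.data % 64 == 0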


The only thing I do know for sure currently is that it is a rabbit
hole that I would love to understand, but I don't really want to spend
days just to get nowhere.

Cheers,

Sebastian



[1] That run above is without address space randomization; it feels
even more stable than the others.  But that doesn't matter, since we
average in any case, so ASR is probably useless and maybe even
detrimental.



Re: Perf regression with Pythran between Numpy 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)

PIERRE AUGIER
In reply to this post by Juan Nunez-Iglesias-2

----- Mail original -----
> De: "Juan Nunez-Iglesias" <[hidden email]>
> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

> [Juan's suggestion to use "git bisect run" quoted in full; trimmed]

Thanks a lot for this advice, Juan! I wasn't able to use Git, but with `hg bisect` I managed to find that the first "bad" commit is

https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f   ENH: implement NEP-35's `like=` argument (gh-16935)

From the point of view of my benchmark, this commit changes the behavior of arr.copy() (the resulting arrays do not give the same performance). This makes sense because it is indeed about array creation.

I haven't yet studied this commit in detail (it is quite big and not simple), and I'm not sure I'm going to be able to understand it, and in particular understand why it leads to such a performance regression!
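To check whether the commit only changes the per-call overhead of array creation, a quick micro-benchmark to run under both 1.19.5 and 1.20.x could be (the shape is just an example, not the one from bench.py):

    import numpy as np
    from timeit import timeit

    a = np.random.rand(1024, 3)
    print("a.copy()   ", timeit(lambda: a.copy(), number=100_000))
    print("np.copy(a) ", timeit(lambda: np.copy(a), number=100_000))
    print("empty_like ", timeit(lambda: np.empty_like(a), number=100_000))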

Cheers,

Pierre

> [earlier messages quoted in full; trimmed]

Re: Perf regression with Pythran between Numpy 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)

Peter Andreas Entschev
Hi Pierre,

Thanks for pinging me. To put it in the simplest way possible, that PR
adds a new `like` kwarg that will dispatch to downstream libraries
using `__array_function__` when specified, and otherwise falls back to
the default behavior of NumPy. While that introduces an extra check on
the C side, it should have minimal impact for use cases that don't use
the `like` kwarg.
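
For reference, the new kwarg only does something when a reference array-like is passed; with a plain ndarray the behavior is unchanged (a trivial sketch):

    import numpy as np

    ref = np.empty(0)                          # any object implementing __array_function__
    a = np.asarray([1.0, 2.0, 3.0], like=ref)  # dispatches on type(ref); plain ndarray here
    b = np.asarray([1.0, 2.0, 3.0])            # default NumPy behavior
    print(type(a), type(b))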

Is there a simple reproducer with NumPy only? I assume your case with
Pandas is much more complex (unfortunately I'm not very experienced
with DataFrames), but curiously I see NumPy 1.20.1 being considerably
faster for small arrays and mildly faster with large arrays (results
in https://gist.github.com/pentschev/add38b5aee61da87b4b70a1c4649861f).

Best,
Peter



On Mon, Mar 15, 2021 at 12:29 PM PIERRE AUGIER
<[hidden email]> wrote:

> [Pierre's message of 15 Mar 2021 and the earlier thread quoted in full; trimmed]

Re: Perf regression with Pythran between Numpy 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)

Sebastian Berg
On Mon, 2021-03-15 at 14:59 +0100, Peter Andreas Entschev wrote:

> Hi Pierre,
>
> Thanks for pinging me. To put it in the simplest way possible, that PR
> adds a new `like` kwarg that will dispatch to downstream libraries
> using `__array_function__` when specified, and otherwise falls back to
> the default behavior of NumPy. While that introduces an extra check on
> the C side, it should have minimal impact for use cases that don't use
> the `like` kwarg.
>
> Is there a simple reproducer with NumPy only? I assume your case with
> Pandas is much more complex (unfortunately I'm not very experienced
> with DataFrames), but curiously I see NumPy 1.20.1 being considerably
> faster for small arrays and mildly faster with large arrays (results in
> https://gist.github.com/pentschev/add38b5aee61da87b4b70a1c4649861f).

1.20.1 should have some small overhead reductions there, since the
array-object life-cycle is probably around 30% faster (deleting an
array is faster). But the array-object life-cycle is pretty
insignificant aside from creating views.
There are also many performance improvements around SIMD, which will
affect certain math operations.

The changes in that PR may add some additional overhead to array
creation (something that should go away again in 1.21 and end up being
much faster when https://github.com/numpy/numpy/pull/15270 goes in).
But that is all.
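
To get a feel for whether that extra check is measurable at all, a
micro-benchmark along these lines can help. This is a sketch, not taken
from the thread; the sizes and iteration counts are arbitrary, and the
`like=` variant requires NumPy >= 1.20. Passing a plain ndarray as
`like` simply falls back to the default NumPy behavior.

    # Illustrative micro-benchmark: time plain array creation, and
    # creation with `like=` pointing at an ordinary ndarray (which falls
    # back to the default behavior).
    import timeit

    setup = "import numpy as np; ref = np.zeros(3)"

    t_plain = timeit.timeit("np.ones(4)", setup=setup, number=1_000_000)
    t_like = timeit.timeit("np.ones(4, like=ref)", setup=setup,
                           number=1_000_000)

    print(f"np.ones(4)           : {t_plain:.3f} s per 1e6 calls")
    print(f"np.ones(4, like=ref) : {t_like:.3f} s per 1e6 calls")

If the `like=` path is cheap on a given build, the two numbers should be
close; any gap gives a rough upper bound on what the NEP-35 check itself
can cost per call.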


As much as I would love to have an answer, looking for changes in the
NumPy code seems to me unlikely to get you anywhere. As another example,
check out this benchmark from the NumPy benchmark suite:

https://pv.github.io/numpy-bench/index.html#bench_reduce.AddReduceSeparate.time_reduce?cpu=Intel(R)%20Core(TM)%20i7%20CPU%20920%20%40%202.67GHz&machine=i7&os=Linux&ram=16416652&Cython=0.29.21&p-axis=1&p-type='int16'&p-type='int32'

It keeps jumping back and forth by around 30% for the 'int16' version,
but the 'int32' one is pretty much stable, so it's unlikely to be just
bad benchmarking.

Right now, I am willing to bet that if you repeat that whole thing with
a different commit range, you will find another random bad commit.
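
One low-tech way to judge whether a difference of this size is above the
noise floor on a given machine is to look at the spread across repeated
runs rather than at a single number. A rough sketch; the array shape,
repeat and loop counts are arbitrary and not taken from the thread:

    # Report the spread across repeated timings so a ~15-30% difference
    # can be judged against run-to-run noise on the same machine.
    import timeit
    import numpy as np

    a = np.random.rand(1000, 3)

    for stmt in ("np.copy(a)", "np.ascontiguousarray(a)"):
        times = timeit.repeat(stmt, globals=globals(),
                              repeat=9, number=100_000)
        best, worst = min(times), max(times)
        print(f"{stmt:28s} best={best:.3f}s worst={worst:.3f}s "
              f"spread={(worst - best) / best:.1%}")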

Cheers,

Sebastian


>
> Best,
> Peter
>
>
>
> On Mon, Mar 15, 2021 at 12:29 PM PIERRE AUGIER
> <[hidden email]> wrote:
> >
> >
> > ----- Original message -----
> > > From: "Juan Nunez-Iglesias" <[hidden email]>
> > > To: "numpy-discussion" <[hidden email]>
> > > Sent: Sunday 14 March 2021 07:15:39
> > > Subject: Re: [Numpy-discussion] Looking for a difference between
> > > Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
> >
> > > Hi Pierre,
> > >
> > > If you’re able to compile NumPy locally and you have reliable
> > > benchmarks, you
> > > can write a script that tests the runtime of your benchmark and
> > > reports it as a
> > > test pass/fail. You can then use “git bisect run” to
> > > automatically find the
> > > commit that caused the issue. That will help narrow down the
> > > discussion before
> > > it gets completely derailed a second time.
> > >
> > > https://lwn.net/Articles/317154/
> > >
> > > Juan.
> >
> > Thanks a lot for this advice Juan! I wasn't able to use Git but
> > with `hg bisect` I managed to find that the first "bad" commit is
> >
> > https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f
> >    ENH: implement NEP-35's `like=` argument (gh-16935)
> >
> > From the point of view of my benchmark, this commit changes the
> > behavior of arr.copy() (the resulting arrays do not give the same
> > performance). This makes sense because it is indeed about array
> > creation.
> >
> > I haven't yet studied this commit in detail (it is quite big and not
> > simple) and I'm not sure I'm going to be able to understand it, and
> > in particular to understand why it leads to such a performance
> > regression!
> >
> > Cheers,
> >
> > Pierre
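
For reference, a bisect helper of the kind Juan describes can be as
small as the sketch below. Everything in it is hypothetical (the
threshold, the rebuild command, the `bench.py` path); the only real
requirement is that the script exits 0 for a "good" commit and 1 for a
"bad" one, which is what `git bisect run` (and `hg bisect --command`)
act on.

    #!/usr/bin/env python3
    # Hypothetical bisect helper, not the script used in the thread.
    # Rebuild NumPy from the currently checked-out commit, run the
    # external benchmark, and report good/bad through the exit code.
    import subprocess
    import sys
    import time

    THRESHOLD = 10.0  # seconds; made-up cutoff between fast and slow runs

    # Reinstall NumPy from the current source checkout (assumes the build
    # dependencies are already available in the environment).
    subprocess.run(
        [sys.executable, "-m", "pip", "install", ".",
         "--no-build-isolation", "--force-reinstall", "--no-deps"],
        check=True,
    )

    # Run the benchmark (stand-in path; the thread uses nbabel's bench.py)
    # and measure its wall-clock time.
    start = time.perf_counter()
    subprocess.run([sys.executable, "bench.py"], check=True)
    elapsed = time.perf_counter() - start

    sys.exit(0 if elapsed < THRESHOLD else 1)

With such a script saved as, say, check_perf.py, something like
`git bisect start; git bisect bad v1.20.0; git bisect good v1.19.5;
git bisect run python check_perf.py` walks the commit range
automatically.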
> >
> > >
> > > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER <[hidden email]> wrote:
> > >
> > > Hi,
> > >
> > > I tried to compile Numpy with `pip install numpy==1.20.1 --no-
> > > binary numpy
> > > --force-reinstall` and I can reproduce the regression.
> > >
> > > Good news, I was able to reproduce the difference with only Numpy
> > > 1.20.1.
> > >
> > > Arrays prepared with (`df` is a Pandas dataframe)
> > >
> > > arr = df.values.copy()
> > >
> > > or
> > >
> > > arr = np.ascontiguousarray(df.values)
> > >
> > > lead to "slow" execution while arrays prepared with
> > >
> > > arr = np.copy(df.values)
> > >
> > > lead to faster execution.
> > >
> > > arr.copy() or np.copy(arr) do not give the same result, with arr
> > > obtained from a
> > > Pandas dataframe with arr = df.values. It's strange because
> > > type(df.values)
> > > gives <class 'numpy.ndarray'> so I would expect arr.copy() and
> > > np.copy(arr) to
> > > give exactly the same result.
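
A quick way to see what the three preparation paths actually produce is
to compare not just the flags but also the data-pointer alignment, which
the flags alone don't show. A sketch, assuming a float64 DataFrame of
roughly the same shape as the nbabel data (an assumption, not code from
the thread):

    # Compare the arrays produced by the three preparation paths: dtype,
    # contiguity/alignment flags, and the alignment of the data pointer
    # (e.g. relative to a 64-byte cache line).
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(1000, 3), columns=["x", "y", "z"])

    candidates = {
        "df.values.copy()": df.values.copy(),
        "np.ascontiguousarray(df.values)": np.ascontiguousarray(df.values),
        "np.copy(df.values)": np.copy(df.values),
    }

    for name, arr in candidates.items():
        ptr = arr.__array_interface__["data"][0]
        print(f"{name:34s} dtype={arr.dtype} "
              f"C={arr.flags['C_CONTIGUOUS']} "
              f"ALIGNED={arr.flags['ALIGNED']} "
              f"data%64={ptr % 64}")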
> > >
> > > Note that I think I'm doing quite serious and reproducible
> > > benchmarks. I also
> > > checked that this regression is reproducible on another computer.
> > >
> > > Cheers,
> > >
> > > Pierre
> > >
> > > ----- Original message -----
> > > From: "Sebastian Berg" <[hidden email]>
> > > To: "numpy-discussion" <[hidden email]>
> > > Sent: Friday 12 March 2021 22:50:24
> > > Subject: Re: [Numpy-discussion] Looking for a difference between
> > > Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
> > >
> > > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > >
> > > Hi,
> > >
> > > I'm looking for a difference between Numpy 0.19.5 and 0.20 which
> > > could explain a performance regression (~15 %) with Pythran.
> > >
> > > I observe this regression with the script
> > > https://github.com/paugier/nbabel/blob/master/py/bench.py
> > >
> > > Pythran reimplements Numpy so it is not about Numpy code for
> > > computation. However, Pythran of course uses the native array
> > > contained in a Numpy array. I'm quite sure that something has changed
> > > between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
> > > since I don't get the same performance with Numpy 0.20. I checked
> > > that the values in the arrays are the same and that the flags
> > > characterizing the arrays are also the same.
> > >
> > > Good news, I'm now able to obtain the performance difference just
> > > with Numpy 0.19.5. In this code, I load the data with Pandas and need
> > > to prepare contiguous Numpy arrays to give them to Pythran. With
> > > Numpy 0.19.5, if I use np.copy I get better performance than with
> > > np.ascontiguousarray. With Numpy 0.20, both functions create arrays
> > > giving the same performance with Pythran (again, less good than with
> > > Numpy 0.19.5).
> > >
> > > Note that this code is very efficient (more than 100 times faster
> > > than using Numpy), so I guess that things like alignment or memory
> > > location can lead to such a difference.
> > >
> > > More details in this issue:
> > > https://github.com/serge-sans-paille/pythran/issues/1735
> > >
> > > Any help to understand what has changed would be greatly appreciated!
> > >
> > > If you want to really dig into this, it would be good to do profiling
> > > to find out where the differences are.
> > >
> > > Without that, I don't have much appetite to investigate personally. The
> > > reason is that fluctuations of ~30% (or even much more) when running
> > > the NumPy benchmarks are very common.
> > >
> > > I am not aware of an immediate change in NumPy, especially since you
> > > are talking about Pythran, and only the memory space or the interface
> > > code should matter.
> > > As to the interface code... I would expect it to be quite a bit faster,
> > > not slower.
> > > There was no change around data allocation, so at best what you are
> > > seeing is a different pattern in how the "small array cache" ends up
> > > being used.
> > >
> > > Unfortunately, getting stable benchmarks that reflect code changes
> > > exactly is tough... Here is a nice blog post from Victor Stinner where
> > > he had to go as far as using "profile guided compilation" to avoid
> > > fluctuations:
> > >
> > > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> > >
> > > I somewhat hope that this is also the reason for the huge fluctuations
> > > we see in the NumPy benchmarks due to absolutely unrelated code
> > > changes.
> > > But I did not have the energy to try it (and a probably fixed bug in
> > > gcc makes it a bit harder right now).
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > > Cheers,
> > >
> > > Pierre
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
