Linking Numpy with parallel OpenBLAS


Linking Numpy with parallel OpenBLAS

Daπid
I have installed all the OpenBLAS versions availables at the Fedora repos, that include openMP and pthreads versions. But Numpy installed by pip on a virtualenv seems to only link to the serial version. Is there a way to convince it to use the parallel one?

Here are my libraries:

(py27)[david@SQUIDS lib64]$ ls libopenblas*
libopenblas64.a            libopenblaso64.so.0        libopenblasp64.so.0
libopenblas64-r0.2.14.so   libopenblaso.a             libopenblasp.a
libopenblas64.so           libopenblaso-r0.2.14.so    libopenblasp-r0.2.14.so
libopenblas64.so.0         libopenblaso.so            libopenblasp.so
libopenblas.a              libopenblaso.so.0          libopenblasp.so.0
libopenblaso64.a           libopenblasp64.a           libopenblas-r0.2.14.so
libopenblaso64-r0.2.14.so  libopenblasp64-r0.2.14.so  libopenblas.so
libopenblaso64.so          libopenblasp64.so          libopenblas.so.0

And importing numpy shows that the serial library is the only one loaded:

(py27)[david@SQUIDS lib64]$ lsof libopenbl*
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
COMMAND   PID  USER  FD   TYPE DEVICE SIZE/OFF    NODE NAME
ipython  2355 david mem    REG    8,2 32088056 2372346 libopenblas-r0.2.14.so
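
The same check can be done from inside the interpreter. A minimal sketch (my own, not from the thread; Linux-only, since it parses /proc/self/maps):

```python
import os
import numpy as np  # importing numpy maps whatever BLAS it was linked against

# /proc/self/maps lists every shared object mapped into this process,
# so we can grep it for OpenBLAS variants without an external lsof call.
libs = set()
if os.path.exists("/proc/self/maps"):  # Linux only
    with open("/proc/self/maps") as maps:
        libs = {line.rsplit("/", 1)[-1].strip()
                for line in maps if "openblas" in line}
print(libs or "no libopenblas* mapped")
```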


This is the output of np.show_config():

lapack_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_lapack_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib64']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_mkl_info:
  NOT AVAILABLE


Thanks,


/David.

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: Linking Numpy with parallel OpenBLAS

Julian Taylor-3
Should be possible by putting this into ~/.numpy-site.cfg:

[openblas]
libraries = openblasp

LD_PRELOADing the library should also work.
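
Spelled out, the two routes look roughly like this (a sketch: the library_dirs path assumes Fedora's /usr/lib64, and the exact pip flags are my suggestion, not from the thread):

```shell
# Route 1: rebuild numpy so it links the pthreads build at compile time
cat > ~/.numpy-site.cfg <<'EOF'
[openblas]
libraries = openblasp
library_dirs = /usr/lib64
EOF
pip install --no-binary :all: --force-reinstall numpy

# Route 2: no rebuild; preload the parallel library at runtime,
# shadowing the serial one numpy was linked against
LD_PRELOAD=/usr/lib64/libopenblasp.so python my_script.py
```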


On 29.10.2015 18:25, Daπid wrote:

> [...] Is there a way to convince it to use the parallel one?


Re: Linking Numpy with parallel OpenBLAS

Daπid

On 29 October 2015 at 20:25, Julian Taylor <[hidden email]> wrote:
should be possible by putting this into: ~/.numpy-site.cfg

[openblas]
libraries = openblasp

LD_PRELOAD the file should also work.


Thanks!

I did some timings of a dot product of a square matrix of size 10000, LD_PRELOADing the different versions. I checked that all the cores were crunching whenever anything other than the plain libopenblas/64 was selected. Here are the timings in seconds:
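
The benchmark itself was presumably something like this minimal sketch (my reconstruction, not David's actual script; n is scaled down here for a quick run):

```python
import time
import numpy as np

n = 1000  # the timings below used n = 10000; smaller here so it finishes fast
a = np.random.rand(n, n)

start = time.time()
b = a.dot(a)  # square GEMM, the operation OpenBLAS parallelizes
elapsed = time.time() - start
print("%dx%d dot took %.3f s" % (n, n, elapsed))
```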


Intel i5-3317U:
/usr/lib64/libopenblaso.so
86.3651878834
/usr/lib64/libopenblasp64.so
96.8817200661
/usr/lib64/libopenblas.so
114.60265708
/usr/lib64/libopenblasp.so
107.927740097
/usr/lib64/libopenblaso64.so
97.5418870449
/usr/lib64/libopenblas64.so
109.000799179

Intel  i7-4770:
/usr/lib64/libopenblas.so
37.9794859886
/usr/lib64/libopenblasp.so
12.3455951214
/usr/lib64/libopenblas64.so
38.0571939945
/usr/lib64/libopenblasp64.so
12.5558650494
/usr/lib64/libopenblaso64.so
12.4118559361
/usr/lib64/libopenblaso.so
13.4787950516

Both computers have the same software and OS. So it seems that OpenBLAS gets no significant advantage from going parallel on the older i5, while the i7, using all its cores (4 physical + 4 hyperthreads), gains a 3x speedup; there is no big difference between OpenMP and pthreads.

I am particularly puzzled by the i5 results; shouldn't threads give a noticeable speedup?


/David.



Re: Linking Numpy with parallel OpenBLAS

Julian Taylor-3
On 29.10.2015 21:50, Daπid wrote:

> [...] I am particularly puzzled by the i5 results; shouldn't threads give a
> noticeable speedup?

Try with only 2 cores instead of the 2+2, via OMP_NUM_THREADS=2; it's
possible the hyperthreading is just leading to cache thrashing.
Also, when only one core is active the CPU will overclock itself a bit
(Intel Turbo Boost), which decreases the relative parallelization speedup.
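
For instance (a sketch: the cap has to be in the environment before the interpreter first loads OpenBLAS, so it can also be exported in the shell instead):

```python
import os
# Must be set before numpy (and therefore OpenBLAS) is first imported.
os.environ["OMP_NUM_THREADS"] = "2"        # read by the OpenMP build (libopenblaso)
os.environ["OPENBLAS_NUM_THREADS"] = "2"   # read by the pthreads build (libopenblasp)

import numpy as np
a = np.random.rand(1000, 1000)
b = a.dot(a)  # now limited to at most 2 threads
```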