Calling BLAS functions from Python

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Calling BLAS functions from Python

Jens Jørgen Mortensen-2
Hi!

I'm trying to use dgemm, zgemm and friends from scipy.linalg.blas to
multiply matrices efficiently.  As an example, I'd like to do:

     c += a.dot(b)

using whatever BLAS scipy is linked to and I want to avoid copies of
large matrices.  This works the way I want it:

 >>> import numpy as np
 >>> from scipy.linalg.blas import dgemm
 >>> a = np.ones((2, 3), order='F')
 >>> b = np.ones((3, 4), order='F')
 >>> c = np.zeros((2, 4), order='F')
 >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
array([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])
 >>> print(c)
[[3. 3. 3. 3.]
  [3. 3. 3. 3.]]

but if c is not contiguous, then c is not overwritten:

 >>> c = np.zeros((7, 4), order='F')[:2, :]
 >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
array([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])
 >>> print(c)
[[0. 0. 0. 0.]
  [0. 0. 0. 0.]]

Which is also what the docs say, but I think the raw BLAS function dgemm
could do the update of c in-place by setting LDC=7.  See here:

     http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html

Is there a way to call the raw BLAS function from Python?

I found this capsule thing, but I don't know if there is a way to call
that (maybe using ctypes):

 >>> from scipy.linalg import cython_blas
 >>> cython_blas.__pyx_capi__['dgemm']
<capsule object "void (char *, char *, int *, int *, int *,
__pyx_t_5scipy_6linalg_11cython_blas_d *,
__pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
__pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
__pyx_t_5scipy_6linalg_11cython_blas_d *,
__pyx_t_5scipy_6linalg_11cython_blas_d *, int *)" at 0x7f06fe1d2ba0>

Best,
Jens Jørgen
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Calling BLAS functions from Python

Andras Deak
On Tue, Aug 27, 2019 at 1:18 PM Jens Jørgen Mortensen <[hidden email]> wrote:

>
> Hi!
>
> I'm trying to use dgemm, zgemm and friends from scipy.linalg.blas to
> multiply matrices efficiently.  As an example, I'd like to do:
>
>      c += a.dot(b)
>
> using whatever BLAS scipy is linked to and I want to avoid copies of
> large matrices.  This works the way I want it:
(snip)

Hi,

This is not a direct answer to your question, but are you only trying
to use low-level BLAS or the sake of memory, or are there other
considerations? I'm not certain but I think `a.dot` will call BLAS for
matrices (hence its speed and multithreaded capabilities), so CPU time
might already be optimal. As for memory, most numpy functions (and
definitely numpy arithmetic) can be used with functions in the main
numpy namespace, using the `out` keyword to specify an existing array
in which to print. So for your simple example,

>>> import numpy as np
... a = np.ones((2, 3), order='F')
... b = np.ones((3, 4), order='F')
... c = np.zeros((7, 4), order='F')[:2, :]
... np.add(c, a.dot(b), out=c)
array([[3., 3., 3., 3.],
       [3., 3., 3., 3.]])

>>> c
array([[3., 3., 3., 3.],
       [3., 3., 3., 3.]])

As you can see non-contiguous arrays Just Work™. You will still create
a temporary array for `a.dot(b)` but I'm not sure you can spare that.
Would low-level BLAS allow you to reduce memory at that step as well?
And is there other motivation for you to go down to the metal?
Regards,

András



>
>  >>> import numpy as np
>  >>> from scipy.linalg.blas import dgemm
>  >>> a = np.ones((2, 3), order='F')
>  >>> b = np.ones((3, 4), order='F')
>  >>> c = np.zeros((2, 4), order='F')
>  >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
> array([[3., 3., 3., 3.],
>         [3., 3., 3., 3.]])
>  >>> print(c)
> [[3. 3. 3. 3.]
>   [3. 3. 3. 3.]]
>
> but if c is not contiguous, then c is not overwritten:
>
>  >>> c = np.zeros((7, 4), order='F')[:2, :]
>  >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
> array([[3., 3., 3., 3.],
>         [3., 3., 3., 3.]])
>  >>> print(c)
> [[0. 0. 0. 0.]
>   [0. 0. 0. 0.]]
>
> Which is also what the docs say, but I think the raw BLAS function dgemm
> could do the update of c in-place by setting LDC=7.  See here:
>
>      http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html
>
> Is there a way to call the raw BLAS function from Python?
>
> I found this capsule thing, but I don't know if there is a way to call
> that (maybe using ctypes):
>
>  >>> from scipy.linalg import cython_blas
>  >>> cython_blas.__pyx_capi__['dgemm']
> <capsule object "void (char *, char *, int *, int *, int *,
> __pyx_t_5scipy_6linalg_11cython_blas_d *,
> __pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
> __pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
> __pyx_t_5scipy_6linalg_11cython_blas_d *,
> __pyx_t_5scipy_6linalg_11cython_blas_d *, int *)" at 0x7f06fe1d2ba0>
>
> Best,
> Jens Jørgen
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Calling BLAS functions from Python

Jens Jørgen Mortensen-2
On 8/27/19 2:49 PM, Andras Deak wrote:

> On Tue, Aug 27, 2019 at 1:18 PM Jens Jørgen Mortensen <[hidden email]> wrote:
>>
>> Hi!
>>
>> I'm trying to use dgemm, zgemm and friends from scipy.linalg.blas to
>> multiply matrices efficiently.  As an example, I'd like to do:
>>
>>       c += a.dot(b)
>>
>> using whatever BLAS scipy is linked to and I want to avoid copies of
>> large matrices.  This works the way I want it:
> (snip)
>
> Hi,
>
> This is not a direct answer to your question, but are you only trying
> to use low-level BLAS or the sake of memory, or are there other
> considerations? I'm not certain but I think `a.dot` will call BLAS for
> matrices (hence its speed and multithreaded capabilities), so CPU time
> might already be optimal. As for memory, most numpy functions (and
> definitely numpy arithmetic) can be used with functions in the main
> numpy namespace, using the `out` keyword to specify an existing array
> in which to print. So for your simple example,
>
>>>> import numpy as np
> ... a = np.ones((2, 3), order='F')
> ... b = np.ones((3, 4), order='F')
> ... c = np.zeros((7, 4), order='F')[:2, :]
> ... np.add(c, a.dot(b), out=c)
> array([[3., 3., 3., 3.],
>         [3., 3., 3., 3.]])
>
>>>> c
> array([[3., 3., 3., 3.],
>         [3., 3., 3., 3.]])
>
> As you can see non-contiguous arrays Just Work™. You will still create
> a temporary array for `a.dot(b)` but I'm not sure you can spare that.
> Would low-level BLAS allow you to reduce memory at that step as well?
> And is there other motivation for you to go down to the metal?
> Regards,

Thanks for your suggestion.  What I really need is something like:

     c += constant * a.dot(b.conj())

where the matrices have dtype=complex.  The BLAS zgemm() function does
that without copies.  I also need zherk() for this:

     c += constant a.dot(a.conj().T)

You *can* do that with numpy, but I need both time and memory
efficiency.  So, yes, I want to go down to the metal.  At the moment,
the code I'm working on has an interface to BLAS, but I'm trying to see
if we can use scipy's interface in order to simplify installation of our
code.

Jens Jørgen

> András
>
>
>
>>
>>   >>> import numpy as np
>>   >>> from scipy.linalg.blas import dgemm
>>   >>> a = np.ones((2, 3), order='F')
>>   >>> b = np.ones((3, 4), order='F')
>>   >>> c = np.zeros((2, 4), order='F')
>>   >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
>> array([[3., 3., 3., 3.],
>>          [3., 3., 3., 3.]])
>>   >>> print(c)
>> [[3. 3. 3. 3.]
>>    [3. 3. 3. 3.]]
>>
>> but if c is not contiguous, then c is not overwritten:
>>
>>   >>> c = np.zeros((7, 4), order='F')[:2, :]
>>   >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
>> array([[3., 3., 3., 3.],
>>          [3., 3., 3., 3.]])
>>   >>> print(c)
>> [[0. 0. 0. 0.]
>>    [0. 0. 0. 0.]]
>>
>> Which is also what the docs say, but I think the raw BLAS function dgemm
>> could do the update of c in-place by setting LDC=7.  See here:
>>
>>       http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html
>>
>> Is there a way to call the raw BLAS function from Python?
>>
>> I found this capsule thing, but I don't know if there is a way to call
>> that (maybe using ctypes):
>>
>>   >>> from scipy.linalg import cython_blas
>>   >>> cython_blas.__pyx_capi__['dgemm']
>> <capsule object "void (char *, char *, int *, int *, int *,
>> __pyx_t_5scipy_6linalg_11cython_blas_d *,
>> __pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
>> __pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
>> __pyx_t_5scipy_6linalg_11cython_blas_d *,
>> __pyx_t_5scipy_6linalg_11cython_blas_d *, int *)" at 0x7f06fe1d2ba0>
>>
>> Best,
>> Jens Jørgen
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [hidden email]
>> https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Fwd: Re: Calling BLAS functions from Python

Jens Jørgen Mortensen-2
Sorry!  Stupid me, asking scipy questions on numpy-discussion.  Now
continuing on scipy-user.  Any help is much appreciated.  See short
numpy-discussion thread here:
https://mail.python.org/pipermail/numpy-discussion/2019-August/079945.html


Hi!

I'm trying to use dgemm, zgemm and friends from scipy.linalg.blas to
multiply matrices efficiently.  As an example, I'd like to do:

     c += a.dot(b)

using whatever BLAS scipy is linked to and I want to avoid copies of
large matrices.  This works the way I want it:

 >>> import numpy as np
 >>> from scipy.linalg.blas import dgemm
 >>> a = np.ones((2, 3), order='F')
 >>> b = np.ones((3, 4), order='F')
 >>> c = np.zeros((2, 4), order='F')
 >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
array([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])
 >>> print(c)
[[3. 3. 3. 3.]
  [3. 3. 3. 3.]]

but if c is not contiguous, then c is not overwritten:

 >>> c = np.zeros((7, 4), order='F')[:2, :]
 >>> dgemm(1.0, a, b, 1.0, c, 0, 0, 1)
array([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])
 >>> print(c)
[[0. 0. 0. 0.]
  [0. 0. 0. 0.]]

Which is also what the docs say, but I think the raw BLAS function dgemm
could do the update of c in-place by setting LDC=7.  See here:

     http://www.netlib.org/lapack/explore-html/d7/d2b/dgemm_8f.html

Is there a way to call the raw BLAS function from Python?

I found this capsule thing, but I don't know if there is a way to call
that (maybe using ctypes):

 >>> from scipy.linalg import cython_blas
 >>> cython_blas.__pyx_capi__['dgemm']
<capsule object "void (char *, char *, int *, int *, int *,
__pyx_t_5scipy_6linalg_11cython_blas_d *,
__pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
__pyx_t_5scipy_6linalg_11cython_blas_d *, int *,
__pyx_t_5scipy_6linalg_11cython_blas_d *,
__pyx_t_5scipy_6linalg_11cython_blas_d *, int *)" at 0x7f06fe1d2ba0>

Best,
Jens Jørgen
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion