using scalar and tuple/list input on np.PyArray_MultiIterNew2

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

using scalar and tuple/list input on np.PyArray_MultiIterNew2

zoj613
This post was updated on .
Hi all,

I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to
broadcast parameters of a function. The requirement is that the two
arguments can be scalar and/or sequences. Using the usual `np.broadcast`
function works well but is slow when iterating over the broadcasted input in
a tight loop. I want to achieve the same using the C API.

Currently, if I used `(<double*>np.PyArray_MultiIter_DATA(bcasted_result, i))[0]` to
iterate over the input when one of them is a scalar,
I get no errors, but I notice the output of the parent function returns an
array of zeros, which implies this approach didn't work. After
investigating, it seems that np.PyArray_MultiIter_DATA only accepts numpy
arrays.

I could write a function to handle all combinations of
scalar/array/list/tuple, and create temporary arrays to store the input
data, but that seems daunting and error prone. Is there a way I can achieve
this and have scalar arguments passed to np.PyArray_MultiIter_DATA be
converted to same-element arrays without writing my own code from scratch?

Any suggestions are welcome.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: using scalar input on np.PyArray_MultiIterNew2

Sebastian Berg
On Sun, 2021-01-10 at 09:59 -0700, zoj613 wrote:
> Hi all,
>
> I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to
> broadcast parameters of a function. The requirement is that the two
> arguments can be scalar and/or sequences. Using the usual
> `np.broadcast`
> function works well but is slow when iterating over the broadcasted
> input in
> a tight loop. I want to achieve the same using the C API.

NB: There is also a different, newer, iterator (`NpyIter_New`).
Depending on what you do, that might be a lot faster and better
(although, I admit it tends to be more verbose too).
That iterator also does broadcasting, and one nice property is that it
can allocate a result array.  Most importantly, it casts for you and
allows you to take charge of the inner-loop (a one dimensional loop)
for performance.  (This is the core of how ufuncs work.)

> Currently, if I used `(<double*>np.PyArray_MultiIter_DATA(bcast,
> i))[0]` to
> iterate over the input when one of them is a scalar,
> I get no errors, but I notice the output of the parent function
> returns an
> array of zeros, which implies this approach didn't work. After
> investigating, it seems that np.PyArray_MultiIter_DATA only accepts
> numpy
> arrays.

I am honestly confused by this (did you mean a different command?).
`PyArray_MultiIter_DATA` returns a pointer to the raw data, i.e. in
cython you would probably do:

    cdef double *value_ptr = <double *>npc.PyArray_MultiIter_DATA(iter, 0)
    value_ptr[0] = 3.1416

Do want to reference the original data?  You could reference the
original data for scalars (read-only since scalars are immutable), but
for lists/tuples there is no original data to begin with (unless you
have Python objects inside, but it would seem strange if NumPy to
attempted to transparently expose that).

> I could write a function to handle all combinations of
> scalar/array/list/tuple, and create temporary arrays to store the
> input

The typical pattern is to convert everything to array first, but
`PyArray_MultiIter()` does that for you. Unless you want to optimize
that conversion away?

Cheers,

Sebastian


> data, but that seems daunting and error prone. Is there a way I can
> achieve
> this and have scalar arguments passed to np.PyArray_MultiIter_DATA be
> converted to same-element arrays without writing my own code from
> scratch?
>
> Any suggestions are welcome.
>
>
>
> --
> Sent from: http://numpy-discussion.10968.n7.nabble.com/
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using scalar input on np.PyArray_MultiIterNew2

zoj613
This post was updated on .
I think I wasn't clear enough in my original question. Here is what I have:
a python function with the signature: def pyfunc(a, b). This function calls another cython function internally such that
def pyfunc(a, b):
    ....
    return cfunc(a, b)

I want the pyfunc to support scalar and array_like input for both parameters, so that internally I can have:
def pyfunc(a, b):
    ....
    # some part of the function
    for i, j in some_broadcasted_iterator:
        output[indx] = cfunc(i, j)

Broadcasting using python level functions allows me to do this but then the iteration part is slow because it uses python python code, meaning I can't release the gil. What I was thinking of doing is using the C API to create the broadcasted iterator using PyArray_MultiIterNew2 then iterating over that. I kept getting a zero array as output when I tested it so I assumed that the function doesnt work with scalars. It turns out the bug was caused by me using integer scalars when the C function expects doubles (But I fixed that after creating this post).

Now I'm left with figuring out how I can efficiently convert even lists/tuple inputs into a broadcasted iterator (without requiring the user to pass numpy arrays). When I pass a list for parameter a and a scalar for parameter b, the program hangs, so Im not sure if the PyArray_MultiIterNew2 function does it for me?

Btw, do you have a link to the more recent NpyIter_New? I couldn't find it reading the C-API page in the docs.
Reply | Threaded
Open this post in threaded view
|

Re: using scalar and tuple/list input on np.PyArray_MultiIterNew2

zoj613
This post was updated on .
In reply to this post by zoj613
For what it's worth, I ended up defining a function like:
cdef np.broadcast broadcast(object a, object b):
    """
    Broadcast the inputs into a multiIterator object.
    the input can be a scalar, list, tuple or numpy array or array_like
object.
    """
    cdef bint is_a_seq = is_sequence(a)
    cdef bint is_b_seq = is_sequence(b)

    a = <double>a if not is_a_seq else np.PyArray_FROM_OT(a, np.NPY_DOUBLE)
    b = <double>b if not is_b_seq else np.PyArray_FROM_OT(b, np.NPY_DOUBLE)

    return np.PyArray_MultiIterNew2(a, b)

It seems to do exactly what I want. Not sure if there are any potential bugs
that could result from this approach.



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: using scalar and tuple/list input on np.PyArray_MultiIterNew2

Sebastian Berg
On Sun, 2021-01-10 at 14:18 -0700, zoj613 wrote:

> For what it's worth, I ended up defining a function like:
> cdef np.broadcast broadcast(object a, object b):
>     """
>     Broadcast the inputs into a multiIterator object.
>     the input can be a scalar, list, tuple or numpy array or
> array_like
> object.
>     """
>     cdef bint is_a_seq = is_sequence(h)
>     cdef bint is_a_seq = is_sequence(z)
>
>     h = <double>h if not is_a_seq else np.PyArray_FROM_OT(a,
> np.NPY_DOUBLE)
>     z = <double>z if not is_b_seq else np.PyArray_FROM_OT(b,
> np.NPY_DOUBLE)
>
>     return np.PyArray_MultiIterNew2(a, b)
>
> It seems to do exactly what I want. Not sure if there are any
> potential bugs
> that could result from this approach.
>
I guess you snipped some code away that does something with h and z?
Maybe it would help if you give an example, because I am still not sure
why `np.PyArray_MultiIterNew2(a, b)` is not what you want.

Is the issue is that you need to ensure a `double` result, and
`MultiIterNew()` will be equivalent to `np.asarray(a)` and not
`np.asarray(a, dtype=np.double)`?  If you want to ensure a double
result, I think it is likely easiest if you just call:
     a = np.PyArray_FROM_OT(a, np.NPY_DOUBLE)

unconditionally.  That is equivalent to calling `np.asarray(a,
dtype=np.double)` and will ensure that you get the double result that
you expect (it will force cast though).
You could still special case scalars of course (this is especially true
if you expect Python scalars `isinstance(a, float):` is a very fast
check.

The new iterator requires you to convert to arrays (sorry for not
copying the link, oddly enough my copy-paste was acting up), but it if
they are of a different dtype, it would allow you to cast in a buffered
way (much better if you expect large arrays, likely not for smaller
ones):
    https://numpy.org/doc/stable/reference/c-api/iterator.html#simple-iteration-example

Hope this helps.

Cheers,

Sebastian




--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion



_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: using scalar and tuple/list input on np.PyArray_MultiIterNew2

zoj613
No, The function is what I ended up with, I just forgot to change (h, z) to
(a, b). It works fine. Yes, I want to ensure that I end up with a double
type even if an integer value is passed. I didn't think of unconditionally
using np.PyArray_From_OT() even for scalars. I guess that is probably a
better approach instead of the if statements because ultimately for that
code block I want a broacasted array as a result, not scalars.  I will
experiment with the new multi-iterator interface and see if it is better
than the current approach. Thanks!



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion