This post was updated on .
Hi all,
I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to broadcast parameters of a function. The requirement is that the two arguments can be scalar and/or sequences. Using the usual `np.broadcast` function works well but is slow when iterating over the broadcasted input in a tight loop. I want to achieve the same using the C API. Currently, if I used `(<double*>np.PyArray_MultiIter_DATA(bcasted_result, i))[0]` to iterate over the input when one of them is a scalar, I get no errors, but I notice the output of the parent function returns an array of zeros, which implies this approach didn't work. After investigating, it seems that np.PyArray_MultiIter_DATA only accepts numpy arrays. I could write a function to handle all combinations of scalar/array/list/tuple, and create temporary arrays to store the input data, but that seems daunting and error prone. Is there a way I can achieve this and have scalar arguments passed to np.PyArray_MultiIter_DATA be converted to same-element arrays without writing my own code from scratch? Any suggestions are welcome. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion |
On Sun, 2021-01-10 at 09:59 -0700, zoj613 wrote:
> Hi all, > > I am looking for a way to use `np.PyArray_MultiIterNew2` in Cython to > broadcast parameters of a function. The requirement is that the two > arguments can be scalar and/or sequences. Using the usual > `np.broadcast` > function works well but is slow when iterating over the broadcasted > input in > a tight loop. I want to achieve the same using the C API. NB: There is also a different, newer, iterator (`NpyIter_New`). Depending on what you do, that might be a lot faster and better (although, I admit it tends to be more verbose too). That iterator also does broadcasting, and one nice property is that it can allocate a result array. Most importantly, it casts for you and allows you to take charge of the inner-loop (a one dimensional loop) for performance. (This is the core of how ufuncs work.) > Currently, if I used `(<double*>np.PyArray_MultiIter_DATA(bcast, > i))[0]` to > iterate over the input when one of them is a scalar, > I get no errors, but I notice the output of the parent function > returns an > array of zeros, which implies this approach didn't work. After > investigating, it seems that np.PyArray_MultiIter_DATA only accepts > numpy > arrays. I am honestly confused by this (did you mean a different command?). `PyArray_MultiIter_DATA` returns a pointer to the raw data, i.e. in cython you would probably do: cdef double *value_ptr = <double *>npc.PyArray_MultiIter_DATA(iter, 0) value_ptr[0] = 3.1416 Do want to reference the original data? You could reference the original data for scalars (read-only since scalars are immutable), but for lists/tuples there is no original data to begin with (unless you have Python objects inside, but it would seem strange if NumPy to attempted to transparently expose that). > I could write a function to handle all combinations of > scalar/array/list/tuple, and create temporary arrays to store the > input The typical pattern is to convert everything to array first, but `PyArray_MultiIter()` does that for you. Unless you want to optimize that conversion away? Cheers, Sebastian > data, but that seems daunting and error prone. Is there a way I can > achieve > this and have scalar arguments passed to np.PyArray_MultiIter_DATA be > converted to same-element arrays without writing my own code from > scratch? > > Any suggestions are welcome. > > > > -- > Sent from: http://numpy-discussion.10968.n7.nabble.com/ > _______________________________________________ > NumPy-Discussion mailing list > [hidden email] > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list [hidden email] https://mail.python.org/mailman/listinfo/numpy-discussion |
This post was updated on .
I think I wasn't clear enough in my original question. Here is what I have:
a python function with the signature: def pyfunc(a, b). This function calls another cython function internally such that def pyfunc(a, b): .... return cfunc(a, b) I want the pyfunc to support scalar and array_like input for both parameters, so that internally I can have: def pyfunc(a, b): .... # some part of the function for i, j in some_broadcasted_iterator: output[indx] = cfunc(i, j) Broadcasting using python level functions allows me to do this but then the iteration part is slow because it uses python python code, meaning I can't release the gil. What I was thinking of doing is using the C API to create the broadcasted iterator using PyArray_MultiIterNew2 then iterating over that. I kept getting a zero array as output when I tested it so I assumed that the function doesnt work with scalars. It turns out the bug was caused by me using integer scalars when the C function expects doubles (But I fixed that after creating this post). Now I'm left with figuring out how I can efficiently convert even lists/tuple inputs into a broadcasted iterator (without requiring the user to pass numpy arrays). When I pass a list for parameter a and a scalar for parameter b, the program hangs, so Im not sure if the PyArray_MultiIterNew2 function does it for me? Btw, do you have a link to the more recent NpyIter_New? I couldn't find it reading the C-API page in the docs. |
This post was updated on .
In reply to this post by zoj613
For what it's worth, I ended up defining a function like:
cdef np.broadcast broadcast(object a, object b): """ Broadcast the inputs into a multiIterator object. the input can be a scalar, list, tuple or numpy array or array_like object. """ cdef bint is_a_seq = is_sequence(a) cdef bint is_b_seq = is_sequence(b) a = <double>a if not is_a_seq else np.PyArray_FROM_OT(a, np.NPY_DOUBLE) b = <double>b if not is_b_seq else np.PyArray_FROM_OT(b, np.NPY_DOUBLE) return np.PyArray_MultiIterNew2(a, b) It seems to do exactly what I want. Not sure if there are any potential bugs that could result from this approach. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion |
On Sun, 2021-01-10 at 14:18 -0700, zoj613 wrote:
> For what it's worth, I ended up defining a function like: > cdef np.broadcast broadcast(object a, object b): > """ > Broadcast the inputs into a multiIterator object. > the input can be a scalar, list, tuple or numpy array or > array_like > object. > """ > cdef bint is_a_seq = is_sequence(h) > cdef bint is_a_seq = is_sequence(z) > > h = <double>h if not is_a_seq else np.PyArray_FROM_OT(a, > np.NPY_DOUBLE) > z = <double>z if not is_b_seq else np.PyArray_FROM_OT(b, > np.NPY_DOUBLE) > > return np.PyArray_MultiIterNew2(a, b) > > It seems to do exactly what I want. Not sure if there are any > potential bugs > that could result from this approach. > Maybe it would help if you give an example, because I am still not sure why `np.PyArray_MultiIterNew2(a, b)` is not what you want. Is the issue is that you need to ensure a `double` result, and `MultiIterNew()` will be equivalent to `np.asarray(a)` and not `np.asarray(a, dtype=np.double)`? If you want to ensure a double result, I think it is likely easiest if you just call: a = np.PyArray_FROM_OT(a, np.NPY_DOUBLE) unconditionally. That is equivalent to calling `np.asarray(a, dtype=np.double)` and will ensure that you get the double result that you expect (it will force cast though). You could still special case scalars of course (this is especially true if you expect Python scalars `isinstance(a, float):` is a very fast check. The new iterator requires you to convert to arrays (sorry for not copying the link, oddly enough my copy-paste was acting up), but it if they are of a different dtype, it would allow you to cast in a buffered way (much better if you expect large arrays, likely not for smaller ones): https://numpy.org/doc/stable/reference/c-api/iterator.html#simple-iteration-example Hope this helps. Cheers, Sebastian -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ _______________________________________________ NumPy-Discussion mailing list [hidden email] https://mail.python.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list [hidden email] https://mail.python.org/mailman/listinfo/numpy-discussion |
No, The function is what I ended up with, I just forgot to change (h, z) to
(a, b). It works fine. Yes, I want to ensure that I end up with a double type even if an integer value is passed. I didn't think of unconditionally using np.PyArray_From_OT() even for scalars. I guess that is probably a better approach instead of the if statements because ultimately for that code block I want a broacasted array as a result, not scalars. I will experiment with the new multi-iterator interface and see if it is better than the current approach. Thanks! -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ _______________________________________________ NumPy-Discussion mailing list [hidden email] https://mail.python.org/mailman/listinfo/numpy-discussion |
Free forum by Nabble | Edit this page |