subclassing ndarray and keeping same ufunc behavior

subclassing ndarray and keeping same ufunc behavior

Stuart Reynolds

I'm trying to subclass an ndarray so that I can add some additional fields. When I do this, however, I get new, odd behavior when my object is passed to a variety of numpy functions. For example, nanmin now returns an object of my new array class, whereas previously I'd get a float64. Why? Is this a bug in nanmin or in my class?

import numpy as np

class NDArrayWithColumns(np.ndarray):
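    def __new__(cls, obj, columns=None):
        # View the input data as this subclass (no copy is made).
        obj = np.asarray(obj).view(cls)
        # Guard the default: tuple(None) would raise a TypeError.
        obj.columns = tuple(columns) if columns is not None else None
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and copies: carry the column names over.
        if obj is None: return
        self.columns = getattr(obj, 'columns', None)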

NAN = float("nan")
r = np.array([1.,0.,1.,0.,1.,0.,1.,0.,NAN, 1., 1.])
print "MIN", np.nanmin(r), type(np.nanmin(r)) 

gives:

MIN 0.0 <type 'numpy.float64'>

but

>>> r = NDArrayWithColumns(r, ["a"])
>>> print "MIN", np.nanmin(r), type(np.nanmin(r))
MIN 0.0 <class '__main__.NDArrayWithColumns'>
>>> print r.shape   # ?!
(11,)

Note the change in type, and also that str(np.nanmin(r)) shows 1 field, not 11 as indicated by its shape. This seems wrong. Is there a way to get my subclass to behave more like an ndarray?

I realize from the docs that I can override __array_wrap__, but it's not clear to me how to use it to solve this issue, or whether it's the right tool.
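A minimal sketch of the `__array_wrap__` route, assuming the goal is only that full reductions (0-d results, such as `np.nanmin(r)` with no axis) come back as plain NumPy scalars, while array-valued results keep the subclass. The `ndim == 0` rule is an illustrative choice, not a NumPy convention. The `return_scalar` parameter is accepted because NumPy 2 passes it; it is ignored here.

```python
import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = np.asarray(obj).view(cls)
        obj.columns = tuple(columns) if columns is not None else None
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.columns = getattr(obj, "columns", None)

    def __array_wrap__(self, out_arr, context=None, return_scalar=False):
        # A 0-d result (e.g. a full reduction) is returned as a plain
        # NumPy scalar instead of a 0-d instance of this subclass.
        if out_arr.ndim == 0:
            return np.asarray(out_arr)[()]
        # Everything else keeps the default behaviour: wrap as the subclass.
        return super().__array_wrap__(out_arr, context)

r = NDArrayWithColumns([[1.0, np.nan], [0.0, 2.0]], ["a", "b"])
print(type(np.nanmin(r)))          # a NumPy scalar, not the subclass
print(type(np.nanmin(r, axis=0)))  # still NDArrayWithColumns
```

Elementwise ufuncs such as `np.sqrt(r)` also go through this method and, being array-valued, still return the subclass.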

In case you're interested, I'm subclassing because I'd like to track column names in matrices of a single type. This is a pretty common wish in scikit-learn pipelines. Structured arrays and record arrays allow fields of varying types. Pandas provides this functionality, but dealing with numpy arrays is easier (and more efficient) when writing Cython extensions. Also, I think structured arrays and record types are unlikely to play nicely with Cython because they're more freely typed; I want to deal exclusively with arrays of doubles.
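If the concern about structured arrays is mainly mixed field types, a structured array whose fields all share one dtype can be converted into a plain 2-D array with `numpy.lib.recfunctions.structured_to_unstructured` (available since NumPy 1.16). A sketch, assuming every field is float64; the sample data are made up:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Every field has the same dtype (float64), so nothing is mixed-type.
s = np.array([(1.0, 2.0), (np.nan, 0.0)], dtype=[("a", "f8"), ("b", "f8")])

print(s["b"])  # column access by name: [2. 0.]

# A plain (n_rows, n_fields) float64 array, e.g. to hand to a Cython routine.
plain = rfn.structured_to_unstructured(s)
print(plain.shape, plain.dtype)  # (2, 2) float64
print(s.dtype.names)             # ('a', 'b')
```

`rfn.unstructured_to_structured` goes the other way if the names need to be reattached afterwards.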

Any thoughts of how to subclass ndarray and keep original behavior in ufuncs?
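One possible answer, assuming NumPy 1.13 or later: override `__array_ufunc__` so that every ufunc call, including reductions such as `np.minimum.reduce` (which `np.nanmin` relies on for subclasses in recent NumPy versions), sees plain ndarrays and returns plain results. Slicing and views do not go through ufuncs, so `__array_finalize__` still carries `columns` there. This is a sketch, not a recommended pattern; it deliberately drops the subclass from all ufunc outputs.

```python
import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = np.asarray(obj).view(cls)
        obj.columns = tuple(columns) if columns is not None else None
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.columns = getattr(obj, "columns", None)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Strip the subclass from inputs and any out= arrays so the ufunc
        # behaves exactly as it would on plain ndarrays.
        def plain(x):
            return x.view(np.ndarray) if isinstance(x, NDArrayWithColumns) else x

        inputs = tuple(plain(x) for x in inputs)
        out = kwargs.get("out")
        if out is not None:
            kwargs["out"] = tuple(plain(o) for o in out)
        return getattr(ufunc, method)(*inputs, **kwargs)

r = NDArrayWithColumns([[1.0, np.nan], [0.0, 2.0]], ["a", "b"])
print(type(np.nanmin(r)))  # a NumPy scalar
print(type(np.sqrt(r)))    # plain numpy.ndarray
print(r[:1].columns)       # ('a', 'b'): slicing still keeps the names
```

Because ufunc results are plain ndarrays, `columns` is not carried through arithmetic; whether that is acceptable depends on how the names are used.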


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: subclassing ndarray and keeping same ufunc behavior

Marten van Kerkwijk
Hi Stuart,

It certainly seems correct behaviour to return the subclass you
created: after all, you might want to keep the information on
`columns` (e.g., consider doing nanmin along a given axis). Indeed, we
certainly want to keep the unit in astropy's Quantity (which also is a
subclass of ndarray).

On the shape: shouldn't that be print(np.nanmin(r).shape)??
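To make the shape point concrete: `r.shape` is the shape of the input, while the result of the full reduction is 0-d. The second half reduces a made-up 2-D example along `axis=0`, where returning the subclass is what allows per-column information to travel with the result. The class is reproduced from the original post with a guard for `columns=None`.

```python
import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = np.asarray(obj).view(cls)
        obj.columns = tuple(columns) if columns is not None else None
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.columns = getattr(obj, "columns", None)

r = NDArrayWithColumns([1., 0., 1., 0., 1., 0., 1., 0., np.nan, 1., 1.], ["a"])
m = np.nanmin(r)
print(r.shape)   # (11,)  the input
print(m.shape)   # ()     the 0-d result of the reduction
print(float(m))  # 0.0

# Reducing along axis=0 gives one value per column and keeps the subclass.
t = NDArrayWithColumns([[1.0, np.nan, 3.0], [0.0, 5.0, np.nan]], ["x", "y", "z"])
per_column = np.nanmin(t, axis=0)
print(type(per_column).__name__)        # NDArrayWithColumns
print(np.asarray(per_column))           # [0. 5. 3.]
print(getattr(per_column, "columns", None))  # names, if carried over by __array_finalize__
```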

Overall, I think it is worth considering very carefully what exactly
you are trying to accomplish; if different elements along a given axis have
different meanings, I'm not sure it makes all that much sense to treat
them as a single array (e.g., np.sin might be useful for one column,
but not another). Even if pandas is slower, the advantage in clarity
of what is happening might well be more important in the long run.

All the best,

Marten

p.s. nanmin is not a ufunc; you can find it in numpy/lib/nanfunctions.py
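A quick way to check the distinction: `np.nanmin` is an ordinary Python-level function, while `np.minimum` and `np.fmin` are ufuncs. The two ufuncs treat NaN differently, which is easy to see with `reduce` on a small made-up array:

```python
import numpy as np

print(isinstance(np.nanmin, np.ufunc))   # False: a regular function
print(isinstance(np.minimum, np.ufunc))  # True
print(isinstance(np.fmin, np.ufunc))     # True

x = np.array([1.0, np.nan, 0.0])
print(np.minimum.reduce(x))  # nan: minimum propagates NaN
print(np.fmin.reduce(x))     # 0.0: fmin prefers the non-NaN operand
print(np.nanmin(x))          # 0.0
```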

Re: subclassing ndarray and keeping same ufunc behavior

Nathan Goldbaum
You might also want to consider writing a wrapper object that contains an ndarray as a (possibly private) attribute and then presents different views or interpretations of that array.
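A minimal sketch of that composition approach, using a hypothetical `ColumnArray` class: the data live in a private, plain float64 ndarray (which is what a Cython routine would receive), and `__array__` lets NumPy functions such as `np.nanmin` operate on it and return ordinary ndarray or scalar results. The class and the `column` method are illustrative names only.

```python
import numpy as np

class ColumnArray:
    """Hypothetical wrapper: a plain 2-D float64 ndarray plus column names."""

    def __init__(self, data, columns):
        self._data = np.asarray(data, dtype=np.float64)
        if self._data.ndim != 2:
            raise ValueError("expected a 2-D array")
        self.columns = tuple(columns)
        if len(self.columns) != self._data.shape[1]:
            raise ValueError("need exactly one name per column")

    @property
    def shape(self):
        return self._data.shape

    def column(self, name):
        # Plain 1-D view of the named column.
        return self._data[:, self.columns.index(name)]

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray() and most NumPy functions see the plain ndarray.
        arr = self._data if dtype is None else self._data.astype(dtype, copy=False)
        return arr.copy() if copy else arr

w = ColumnArray([[1.0, 2.0], [np.nan, 0.0]], ["a", "b"])
print(np.nanmin(w))          # a plain NumPy scalar
print(np.nanmin(w, axis=0))  # a plain ndarray, one value per column
print(w.column("b"))
```

Because NumPy only ever sees the plain array, nothing is wrapped back into `ColumnArray`; any operation that should preserve the names has to be written as an explicit method.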

Subclassing ndarray is a pit of snakes, it's best to avoid it if you can (I say as the author and maintainer of an ndarray subclass).

On Tue, Nov 15, 2016 at 1:48 PM, Marten van Kerkwijk <[hidden email]> wrote:

Re: subclassing ndarray and keeping same ufunc behavior

Stuart Reynolds
In reply to this post by Marten van Kerkwijk
Doh! Thanks for that.

On Tue, Nov 15, 2016 at 10:48 AM, Marten van Kerkwijk <[hidden email]> wrote: