how to use masked arrays

how to use masked arrays

Christopher Burns
I'm finding it difficult to tell which methods/operations respect the
mask and which do not, in masked arrays.

mydata.filled returns a copy of the data (in a numpy array) with all
masked elements set to the fill_value.  So the mask is respected, but the
data comes back as a new object of a different type (a plain ndarray), when
what I wanted was to set all masked values in the array itself to the same value.

mydata.fill however modifies the data array in-place, modifies all
values regardless of the mask, and leaves the mask unchanged.

Assignment (mydata[:] = 10) sets all values in the slice and updates the mask.

Basic methods respect the mask, like mydata.mean(), but np.asarray
ignores the mask.

Example
------------
In [32]: mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])

In [34]: mydata
Out[34]:
masked_array(data = [-- 1 -- 3 -- 5],
      mask = [ True False  True False  True False],
      fill_value=999999)

In [35]: mydata.filled(np.nan)
Out[35]: array([0, 1, 0, 3, 0, 5])

In [36]: mydata.fill(np.nan)

In [37]: mydata
Out[37]:
masked_array(data = [-- 0 -- 0 -- 0],
      mask = [ True False  True False  True False],
      fill_value=999999)

In [38]: mydata.data
Out[38]: array([0, 0, 0, 0, 0, 0])

In [48]: mydata[:] = 456

In [49]: mydata
Out[49]:
masked_array(data = [456 456 456 456 456 456],
      mask = [False False False False False False],
      fill_value=999999)

In [53]: mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])

In [54]: mydata.mean()
Out[54]: 3.0

In [55]: np.asarray(mydata)
Out[55]: array([0, 1, 2, 3, 4, 5])


In summary, is there a tutorial that would show how to use masked
arrays?  Because at this point I'm confused and don't know how to use
them.

Google yields this out-of-date doc:
http://numpy.scipy.org/numpydoc/numpy-22.html

Thanks!

--
Christopher Burns
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: how to use masked arrays

Pierre GM-2
On Wednesday 14 May 2008 02:18:06 Christopher Burns wrote:
> I'm finding it difficult to tell which methods/operations respect the
> mask and which do not, in masked arrays.

Christopher,
Unfortunately, there's no tutorial yet. Perhaps you could get one started on
the scipy wiki? I'm afraid I won't have time to do it myself, but I'd be
more than happy to fill in the gaps.

To answer some of your questions:
>>>import numpy as np, numpy.ma as ma
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])

* If you want to access the underlying data directly, these two commands are
(almost) equivalent [1]:
>>>mydata._data
>>>mydata.view(np.ndarray)
Note that you lose the mask information, and that the values that were masked
can be bogus.

* If you want to get a copy of the underlying data with masked values set
to "myvalue", use .filled(myvalue).
>>>mydata.filled(-999)
array([-999,    1, -999,    3, -999,    5])

If you don't use any argument, ".filled" uses the "fill_value" attribute,
whose value depends on the dtype:
>>>mydata.fill_value
999999
>>>mydata.filled()
array([999999,      1, 999999,      3, 999999,      5])

Note that the argument of ".filled" is cast to the dtype of mydata.
>>>mydata.dtype
dtype('int64')
>>>mydata.filled(np.pi)
array([3, 1, 3, 3, 3, 5])
That can be a problem if you wanted to use NaNs as filling values (a bad idea
in itself):
>>>mydata.filled(np.nan)
array([0, 1, 0, 3, 0, 5])
Here, you don't have the NaNs you expected because NaNs are for floats, not
integers.
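
If NaN really is the marker you need, one way around the integer cast is to convert to a float dtype before filling. This is only a minimal sketch, assuming a reasonably recent numpy; the variable names as_float and with_nans are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

# Convert to float first; astype() carries the mask along.
as_float = mydata.astype(float)

# NaN is representable in a float array, so filled() can now
# place it in the masked slots. The result is a plain ndarray.
with_nans = as_float.filled(np.nan)
print(with_nans)
```

The original integer array is left untouched; both astype() and filled() return new objects.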

* Because masked arrays inherit from ndarrays, there's also a "fill" method
available: this one acts directly on the ._data part, but setting all the
values at once. The mask is preserved.
>>>mydata.fill(-999)
>>>print mydata
[-- -999 -- -999 -- -999]

You could achieve the same result with this command
>>>mydata.flat = -999
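
For the case in the original question (setting only the masked entries to one value, in place, while leaving the mask alone), something along these lines may work. It is a sketch rather than a recommended idiom: it writes through the public .data view and uses ma.getmaskarray to get a full boolean mask; masked_slots is just an illustrative name.

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

# getmaskarray() always returns a full boolean array,
# even when nothing is masked.
masked_slots = ma.getmaskarray(mydata)

# .data is a view onto the underlying buffer, so this writes in place
# and bypasses the masked-array assignment logic (the mask is untouched).
mydata.data[masked_slots] = -999

print(mydata.data)
print(mydata.mask)
```

Unlike .fill, this leaves the unmasked values as they were, and unlike .filled, it does not create a new array.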

* Assigning a value to a slice of mydata will modify the mask:
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])
>>>mydata[:2] = -999
>>>print mydata
[-999 -999 -- 3 -- 5]
>>>mydata[-2:] = ma.masked
>>>print mydata
[-999 -999 -- 3 -- --]

* If you want to make sure you don't unmask data by mistake with slice
assignments, set the ._hardmask attribute to True (it is set to False by
default)
>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0], hard_mask=True)
>>>mydata[:2] = -999
>>>print mydata
[-- -999 -- 3 -- 5]
You can change the value of ._hardmask either directly, or with the
soften_mask() and harden_mask() methods
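
To make the difference concrete, here is a small side-by-side sketch of the same slice assignment on a soft-masked and a hard-masked array. It assumes a recent numpy, where the hardmask state can be read through the public .hardmask property; the names soft and hard are illustrative.

```python
import numpy.ma as ma

# Soft mask (the default): assignment unmasks the targeted elements.
soft = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])
soft[:2] = -999

# Hard mask: masked elements are protected from assignment.
hard = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])
hard.harden_mask()   # intended to match passing hard_mask=True
hard[:2] = -999      # only the unmasked element 1 is overwritten

print(soft.mask)
print(hard.mask)
print(hard.hardmask)
```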

*
> Basic methods respect the mask, like mydata.mean(), but np.asarray
> ignores the mask.

Yes, np.asarray(x) is equivalent to np.array(x, copy=False, subok=False). If
you want to keep the mask, use np.asanyarray, which is equivalent to
np.array(x, copy=False, subok=True) [2]

>>>mydata = ma.array([0,1,2,3,4,5], mask=[1,0,1,0,1,0])
>>>print mydata.mean()
3.0
>>>print np.asarray(mydata).mean()
2.5
>>>print np.asanyarray(mydata).mean()
3.0
>>>print np.mean(mydata)
3.0
With the last command, np.mean(mydata) first tries to call the .mean method of
mydata; if mydata didn't have such a method, it would be equivalent to
np.asarray(mydata).mean()
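
A quick way to check which conversion keeps the mask is to look at the type that comes back. The sketch below assumes the numpy.ma implementation (see [2]); plain and kept are illustrative names.

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

plain = np.asarray(mydata)     # plain ndarray: the mask is dropped
kept = np.asanyarray(mydata)   # subclass passes through: still masked

print(type(plain).__name__, plain.mean())
print(type(kept).__name__, kept.mean())
```

A function that should accept masked input and honour the mask would therefore convert with np.asanyarray rather than np.asarray.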

Hope it helps, don't hesitate to ask for more details/explanations. Specific
examples are always easier.
I'm looking forward to your wiki page ;)

P.




[1] Almost: mydata._data is in fact a shortcut to
mydata.view(mydata._baseclass), where ._baseclass is the class of the
underlying data. For example
>>>mxdata=ma.array(np.matrix([[1,2,],[3,4,]]),mask=[[1,0],[0,0]])
>>>print mxdata._baseclass
<class 'numpy.core.defmatrix.matrix'>
>>>print type(mxdata._data)
<class 'numpy.core.defmatrix.matrix'>
>>>print type(mxdata.view(np.ndarray))
<type 'numpy.ndarray'>

[2] Note that np.asanyarray returns a masked array in numpy.ma only, not in
previous implementations.

Re: how to use masked arrays

Eric Firing
Pierre GM wrote:
[...]
>
> * If you want to access the underlying data directly, these two commands are
> (almost) equivalent [1]:
>>>> mydata._data
>>>> mydata.view(np.ndarray)

Shouldn't the former be discouraged, on the grounds that a leading
underscore, by Python convention, indicates an attribute that is not
part of the public API, but is instead part of the potentially
changeable implementation?

Eric

Re: how to use masked arrays

Pierre GM-2
On Wednesday 14 May 2008 13:19:55 Eric Firing wrote:
> Pierre GM wrote:
> > (almost) equivalent [1]:
> >>>> mydata._data
> >>>> mydata.view(np.ndarray)
>
> Shouldn't the former be discouraged, on the grounds that a leading
> underscore, by Python convention, indicates an attribute that is not
> part of the public API, but is instead part of the potentially
> changeable implementation?

Eric,
* Please keep the note [1] in mind: the two commands are NOT equivalent: the
former outputs a subclass of ndarray (when appropriate), the latter a regular
ndarray.
* You can use mydata.data to achieve the same result as mydata._data. In
practice, both _data and data are properties with no fset method and with
fget = lambda x: x.view(x._baseclass). I'm not very comfortable with
using .data myself: it looks a bit awkward (personal taste), and it may lead a
user to think that the read buffer object is being accessed (when in fact that is
mydata.data.data...)
* The syntax ._data is required for backwards compatibility (that was the data
portion of the old MaskedArray object). So is ._mask
* You can also use the getdata(mydata) function: it returns the ._data part of
a masked array, or the argument as an ndarray, depending on which is available.
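
As a rough illustration of the getdata route, which avoids the underscore attribute altogether, the sketch below calls it once on a masked array and once on a plain list. It assumes the function is reachable as ma.getdata; from_masked and from_list are illustrative names.

```python
import numpy as np
import numpy.ma as ma

mydata = ma.array([0, 1, 2, 3, 4, 5], mask=[1, 0, 1, 0, 1, 0])

# Masked input: the data part comes back, without the mask.
from_masked = ma.getdata(mydata)

# Non-masked input: the argument is simply converted to an ndarray.
from_list = ma.getdata([7, 8, 9])

print(type(from_masked).__name__, from_masked)
print(type(from_list).__name__, from_list)
```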
