numpy masked array oddity

Russell E. Owen
The object returned by maskedArray.compressed() appears to be a normal
numpy array (based on repr output), but in reality it has some
surprising differences:

import numpy
a = numpy.arange(10, dtype=int)
b = numpy.zeros(10)
b[1] = 1
b[3] = 1
ma = numpy.core.ma.array(a, mask=b, dtype=float)
print ma
# [0.0 -- 2.0 -- 4.0 5.0 6.0 7.0 8.0 9.0]
c = ma.compressed()
print repr(c)
# array([ 0.  2.  4.  5.  6.  7.  8.  9.])
c.sort()
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/numpy/core/ma.py", line 2132, in not_implemented
#    raise NotImplementedError, "not yet implemented for numpy.ma arrays"
#NotImplementedError: not yet implemented for numpy.ma arrays
d = numpy.array(c)
d.sort()
# this works fine, as expected

Why is "c" in the example above not just a regular numpy array? It is
not a "live" view (based on a quick test), which seems sensible to me.
I've worked around the problem by making a copy (d in the example
above), but it seems most unfortunate to have to copy the data twice.
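For comparison, here is a minimal sketch of the same steps against the current numpy.ma module. It assumes a recent NumPy on Python 3, where numpy.core.ma no longer exists. In that setting compressed() is expected to return a plain ndarray holding a copy of the unmasked values, so sort() works directly:

```python
import numpy as np

a = np.arange(10, dtype=int)
mask = np.zeros(10, dtype=bool)
mask[[1, 3]] = True                      # mask out elements 1 and 3
m = np.ma.array(a, mask=mask, dtype=float)

c = m.compressed()                       # flattened copy of the unmasked values
print(type(c))                           # <class 'numpy.ndarray'>
c.sort()                                 # plain ndarray, so in-place sort is allowed
print(c)
```

The masked array is named m here rather than ma so that it does not shadow the numpy.ma module.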

-- Russell

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: numpy masked array oddity

Robert Kern-2
On Mon, May 5, 2008 at 12:19 PM, Russell E. Owen <[hidden email]> wrote:

>  Why is "c" in the example above not just a regular numpy array? It is
>  not a "live" view (based on a quick test), which seems sensible to me.
>  I've worked around the problem by making a copy (d in the example
>  above), but it seems most unfortunate to have to copy the data twice.

I don't know the reason why it's not an ndarray, but you don't have to
copy the data again to get one:

  c = ma.compressed().view(numpy.ndarray)
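A small check of why the view avoids a second copy: view() reinterprets the existing buffer as a different array class. Because a recent NumPy's compressed() already returns an ndarray, this sketch assumes an unmasked MaskedArray as a stand-in for the old compressed() output:

```python
import numpy as np

# Stand-in for the old compressed() result: a MaskedArray with nothing masked.
masked_result = np.ma.array([0.0, 2.0, 4.0, 5.0])

plain = masked_result.view(np.ndarray)   # same buffer, plain ndarray class
print(type(plain))                       # <class 'numpy.ndarray'>
print(np.shares_memory(plain, masked_result))   # True: no data was copied

plain[0] = 99.0                          # a write through the view...
print(masked_result[0])                  # ...is visible in the original: 99.0
```

Note that view(np.ndarray) simply drops the mask, so it is only safe when nothing in the array is masked, which is the case for compressed() output.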

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

Re: numpy masked array oddity

Chris Barker - NOAA Federal
Robert Kern wrote:
> I don't know the reason why it's not an ndarray, but you don't have to
> copy the data again to get one:
>
>   c = ma.compressed().view(numpy.ndarray)

would:

c = numpy.asarray(ma.compressed())

work too?
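A rough check of the asarray variant. np.asarray() is expected to return a base-class ndarray without copying when its input is already an ndarray subclass with a compatible dtype. The sketch assumes an unmasked MaskedArray as a stand-in for the old compressed() output:

```python
import numpy as np

# Stand-in for the old compressed() result: a MaskedArray with nothing masked.
masked_result = np.ma.array([0.0, 2.0, 4.0, 5.0])

plain = np.asarray(masked_result)        # strips the subclass; dtype already matches
print(type(plain))                       # <class 'numpy.ndarray'>
print(np.shares_memory(plain, masked_result))   # True: a view, not a copy
```

Asking for a different dtype (for example np.asarray(x, dtype=np.float32) on float64 data) would force a copy, so asarray and view are only interchangeable when no conversion is needed.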

-CHB



--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]

Re: numpy masked array oddity

Pierre GM-2
On Monday 05 May 2008 13:19:40 Russell E. Owen wrote:
> The object returned by maskedArray.compressed() appears to be a normal
> numpy array (based on repr output), but in reality it has some
> surprising differences:

Russell:

* I assume you're not using the latest version of numpy, are you? If you
were, the .sort() method would work.

* Currently, the output of MaskedArray.compressed() is indeed a MaskedArray,
where the missing values are skipped. If you need a regular ndarray, just use
a view as Robert suggested. Christopher's suggestion is equivalent.

* An alternative would be to force the output of MaskedArray.compressed() to
type(MaskedArray._baseclass), where the _baseclass attribute is the class of
the underlying array: usually it's just ndarray, but it can be any subclass.
Changing this behavior would not break anything in TimeSeries.

* I need to fix a bug in compressed when the underlying array is a matrix: I
can take care of the alternative at the same time. What are people's opinions
on the matter?

Re: numpy masked array oddity

Eric Firing
Pierre GM wrote:
> On Monday 05 May 2008 13:19:40 Russell E. Owen wrote:
>> The object returned by maskedArray.compressed() appears to be a normal
>> numpy array (based on repr output), but in reality it has some
>> surprising differences:
>
> Russell:
>
> * I assume you're not using the latest version of numpy, are you ? If you
> were, the .sort() method would work.

He is clearly using the older version; it is accessed via numpy.core.ma.

>
> * Currently, the output of MaskedArray.compressed() is indeed a MaskedArray,
> where the missing values are skipped. If you need a regular ndarray, just a
> view as Robert suggested. Christopher's suggestion is equivalent.
>
> * An alternative would be to force the output of MaskedArray.compressed() to
> type(MaskedArray._baseclass), where the _baseclass attribute is the class of
> the underlying array: usually it's only ndarray, but it can be any subclass.
> Changing this behavior would not break anything in TimeSeries.

This alternative makes sense to me--I expect most use cases would be
most efficient with compressed yielding a plain ndarray.  I don't see
any gain in keeping it as a masked array, and having to manually convert
it.  (I don't see how the _baseclass conversion would work with the
baseclass as a matrix, though.)

Eric

>
> * I need to fix a bug in compressed when the underlying array is a matrix: I
> can take care of the alternative at the same time. What are the opinions on
> that matter ?

Re: numpy masked array oddity

Pierre GM-2
On Monday 05 May 2008 15:10:56 Eric Firing wrote:

> Pierre GM wrote:
> > * An alternative would be to force the output of MaskedArray.compressed()
> > to type(MaskedArray._baseclass), where the _baseclass attribute is the
> > class of the underlying array: usually it's only ndarray, but it can be
> > any subclass. Changing this behavior would not break anything in
> > TimeSeries.
>
> This alternative makes sense to me--I expect most use cases would be
> most efficient with compressed yielding a plain ndarray.  I don't see
> any gain in keeping it as a masked array, and having to manually convert
> it.  (I don't see how the _baseclass conversion would work with the
> baseclass as a matrix, though.)

In fact, it's straightforward:
- ravel the _data part to get a type(_baseclass) object
- use .compress on the _data part, using logical_not(mask) as the condition.
When you have a matrix as _baseclass, the result will be a ravelled version of
the initial matrix.

But yes, it indeed makes more sense not to output a MaskedArray.
SVN5126 should now work as discussed.
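The two steps described above can be sketched with public numpy.ma calls. compressed_sketch is a hypothetical helper written for illustration, not NumPy's own implementation. It assumes an ndarray base class, uses .data in place of the private _data attribute, and calls np.ma.getmaskarray() so that an array with no mask at all is handled too:

```python
import numpy as np

def compressed_sketch(marr):
    """Hypothetical helper: return the unmasked values of marr as a 1-D array."""
    data = marr.data.ravel()                          # step 1: flatten the data
    valid = np.logical_not(np.ma.getmaskarray(marr))  # True where not masked
    return data.compress(valid.ravel())               # step 2: keep valid entries

m = np.ma.array([[1, 2], [3, 4]], mask=[[0, 1], [0, 0]])
result = compressed_sketch(m)
print(result)                                # [1 3 4]
print(np.array_equal(result, m.compressed()))
```

Because compress() always allocates a new array, the result is a copy rather than a live view, which matches Russell's observation.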




Re: numpy masked array oddity

Eric Firing
Pierre GM wrote:

> On Monday 05 May 2008 15:10:56 Eric Firing wrote:
>> Pierre GM wrote:
>>> * An alternative would be to force the output of MaskedArray.compressed()
>>> to type(MaskedArray._baseclass), where the _baseclass attribute is the
>>> class of the underlying array: usually it's only ndarray, but it can be
>>> any subclass. Changing this behavior would not break anything in
>>> TimeSeries.
>> This alternative makes sense to me--I expect most use cases would be
>> most efficient with compressed yielding a plain ndarray.  I don't see
>> any gain in keeping it as a masked array, and having to manually convert
>> it.  (I don't see how the _baseclass conversion would work with the
>> baseclass as a matrix, though.)
>
> In fact, it's straightforward:
> - ravel the _data part to get a type(_baseclass) object
> - use .compress on the _data part, using logical_not(mask) as the condition.
> When you have a matrix as _baseclass, the result will be a ravelled version of
> the initial matrix.

What I meant was that I don't see that such a ravelled version of a
matrix would be likely to make sense in a linear algebra context, so
leaving it as a matrix is likely to cause confusion rather than
convenience.  Still, it would be consistent, so I am not objecting to it.

Eric

Re: numpy masked array oddity

Pierre GM-2
On Monday 05 May 2008 15:35:35 Eric Firing wrote:
> What I meant was that I don't see that such a ravelled version of a
> matrix would be likely to make sense in a linear algebra context, so
> leaving it as a matrix is likely to cause confusion rather than
> convenience.  Still, it would be consistent, so I am not objecting to it.

I understand and concur: ravelling a matrix always brings surprises. On a side
note, .compressed() isn't the recommended method for getting rid of missing
values in a 2D array: the compress_rows and compress_cols functions exist for
that. In any case, I doubt that regular matrix users combine their matrices
with missing data...
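A short illustration of the difference, assuming a recent NumPy where compress_rows and compress_cols are available as numpy.ma functions for 2-D masked arrays. compressed() flattens, while the other two drop every row or column that contains a masked value and keep the result 2-D:

```python
import numpy as np

m = np.ma.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]],
                mask=[[0, 0, 0],
                      [0, 1, 0],
                      [0, 0, 0]])    # only the centre element is masked

flat = m.compressed()              # 1-D, the 2-D shape is lost
rows = np.ma.compress_rows(m)      # drops row 1, keeps rows 0 and 2
cols = np.ma.compress_cols(m)      # drops column 1, keeps columns 0 and 2
print(flat)
print(rows)
print(cols)
```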