question on NumPy NaN


Vasileios Gkinis


-------- Original Message --------
Subject: question on NumPy NaN
Date: Tue, 20 May 2008 18:03:00 +0200
From: Vasileios Gkinis [hidden email]
To: [hidden email]


Hi all,

I have a question about NaN in NumPy.
Let's say I have an array of sample measurements,
a = array((2,4,nan))
In NumPy, calculating the mean of the elements of a looks like this:

>>> a = array((2,4,nan))
>>> a
array([  2.,   4.,  NaN])
>>> mean(a)
nan

What if I simply don't want the NaN to propagate, and want to get something like this instead:

>>> a = array((2,4,nan))
>>> a
array([  2.,   4.,  NaN])
>>> mean(a)
3.

Cheers

Vasilis
--
------------------------------------------------------------
Vasileios Gkinis, PhD Student
Centre for Ice and Climate
Niels Bohr Institute
Juliane Maries Vej 30, room 321
DK-2100 Copenhagen
Denmark
Office: +45 35325913
[hidden email]


_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
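For the plain case in the question, one minimal sketch (not taken from the thread) is to drop the NaN entries with a boolean mask before averaging. `np.isnan` is True at NaN positions, so `~np.isnan(a)` selects the valid measurements:

```python
import numpy as np

a = np.array((2, 4, np.nan))

# Boolean mask that is True wherever the measurement is not NaN
valid = ~np.isnan(a)

# Average only the valid entries: (2 + 4) / 2
m = a[valid].mean()
print(m)  # 3.0
```

If every entry is NaN, `a[valid]` is empty and `.mean()` returns nan with a RuntimeWarning.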

Re: question on NumPy NaN

Anne Archibald
2008/5/20 Vasileios Gkinis <[hidden email]>:

> I have a question about NaN in NumPy.
> Let's say I have an array of sample measurements,
> a = array((2,4,nan))
> In NumPy, calculating the mean of the elements of a looks like this:
>
>>>> a = array((2,4,nan))
>>>> a
> array([  2.,   4.,  NaN])
>>>> mean(a)
> nan
>
> What if I simply don't want the NaN to propagate, and want to get
> something like this instead:
>
>>>> a = array((2,4,nan))
>>>> a
> array([  2.,   4.,  NaN])
>>>> mean(a)
> 3.

For more elaborate handling of missing data, look into "masked
arrays", in numpy.ma. They are designed to deal with exactly this sort
of thing.

Anne
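A short sketch of the masked-array route Anne suggests, using the array from the question. `numpy.ma.masked_invalid` masks NaN (and also ±inf) entries; alternatively the mask can be passed explicitly so that only NaN is masked. Reductions such as `.mean()` then skip the masked entries:

```python
import numpy as np
import numpy.ma as ma

a = np.array((2, 4, np.nan))

# Mask every invalid entry (NaN or +/-inf); reductions ignore masked values
am = ma.masked_invalid(a)
m = am.mean()
print(m)           # 3.0
print(am.count())  # 2 unmasked entries

# Equivalent here: build the mask by hand so that only NaN is masked
am2 = ma.masked_array(a, mask=np.isnan(a))
print(am2.mean())  # 3.0
```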

Re: question on NumPy NaN

Keith Goodman
On Tue, May 20, 2008 at 9:11 AM, Anne Archibald
<[hidden email]> wrote:

> For more elaborate handling of missing data, look into "masked
> arrays", in numpy.ma. They are designed to deal with exactly this sort
> of thing.

Or

np.nansum(a) / np.isfinite(a).sum()

A nanmean would be nice to have in numpy.
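Keith's one-liner can be wrapped in a small function. In this sketch the name `nanmean_sketch` is hypothetical, and the count uses `~np.isnan` rather than `np.isfinite`: `np.nansum` only skips NaN, so counting with `isnan` keeps numerator and denominator consistent if the data also contain ±inf. For reference, NumPy 1.8 later added a built-in `np.nanmean`.

```python
import numpy as np

def nanmean_sketch(a):
    """Mean of the non-NaN entries (hypothetical helper, after Keith's one-liner)."""
    a = np.asarray(a, dtype=float)
    # np.nansum treats NaN as zero; divide by the number of non-NaN entries
    return np.nansum(a) / (~np.isnan(a)).sum()

a = np.array((2, 4, np.nan))
print(nanmean_sketch(a))  # 3.0
print(np.nanmean(a))      # 3.0, built into later NumPy releases
```

With an all-NaN input the sketch evaluates 0.0 / 0 and returns nan (with a RuntimeWarning).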

Re: question on NumPy NaN

cdavid
Keith Goodman wrote:
> Or
>
> np.nansum(a) / np.isfinite(a).sum()
>
> A nanmean would be nice to have in numpy.
>  

nanmean, nanstd and nanmedian are available in scipy, though.

cheers,

David
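The SciPy functions David mentions lived in `scipy.stats` at the time; later NumPy releases provide `np.nanmean`, `np.nanstd` and `np.nanmedian` directly. A short sketch on the array from the question, plus a 2-D case with an `axis` argument (the array `b` is made-up illustration data):

```python
import numpy as np

a = np.array((2, 4, np.nan))

# Each nan* reduction ignores NaN entries, so these act on [2., 4.]
print(np.nanmean(a))    # 3.0
print(np.nanstd(a))     # 1.0 (population std, ddof=0 by default)
print(np.nanmedian(a))  # 3.0

# The same functions accept an axis argument for n-d data
b = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 7.0]])
print(np.nanmedian(b, axis=1))  # [2. 6.]
```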

Re: question on NumPy NaN

Keith Goodman
On Tue, May 20, 2008 at 6:12 PM, David Cournapeau
<[hidden email]> wrote:
> Keith Goodman wrote:
>> Or
>>
>> np.nansum(a) / np.isfinite(a).sum()
>>
>> A nanmean would be nice to have in numpy.
>>
>
> nanmean, nanstd and nanmedian are available in scipy, though.

Thanks for pointing that out. Studying nanmedian, which is twice as
fast as my for-loop implementation, taught me about compress and
apply_along_axis.

>> import numpy.matlib as mp
>> from numpy.matlib import where
>> timeit x[0, where(x.A > 0.5)[1]]
10000 loops, best of 3: 60.8 µs per loop
>> timeit x.compress(x.A.ravel() > 0.5)
10000 loops, best of 3: 44.5 µs per loop
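The transcript relies on an `x` whose definition isn't shown (a 1×N `numpy.matlib` matrix, judging by `x.A` and `x[0, ...]`). A sketch with a plain 1-D ndarray, where `x` is hypothetical random data, checks that `where`-based indexing, `compress` and a boolean mask select the same elements; timings depend on machine and NumPy version, so none are claimed here:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)  # hypothetical data; the thread does not show how x was built

mask = x > 0.5

via_where = x[np.where(mask)[0]]  # integer indices of the True entries
via_compress = x.compress(mask)   # keep entries where mask is True
via_boolean = x[mask]             # boolean indexing

print(np.array_equal(via_where, via_compress))    # True
print(np.array_equal(via_compress, via_boolean))  # True
```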

Am I missing something obvious, or is the 'sort' unnecessary in _nanmedian?
Perhaps it is left over from a time when _nanmedian did not call
median.

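import numpy as np
from numpy import median  # assumption: `median` refers to NumPy's median here


def _nanmedian(arr1d):  # This only works on 1d arrays
    """Private function for rank 1 arrays. Compute the median ignoring NaN.

    :Parameters:
        arr1d : rank 1 ndarray
            input array

    :Results:
        m : float
            the median."""
    cond = ~np.isnan(arr1d)  # boolean mask: True where the entry is not NaN
    # Keep the non-NaN entries; this np.sort is the step in question,
    # since median() does its own sorting/partitioning internally
    x = np.sort(np.compress(cond, arr1d, axis=-1))
    if x.size == 0:  # every entry was NaN
        return np.nan
    return median(x)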
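On the sort question: `np.median` orders the data itself, so a pre-sort should not change the result. A minimal sketch of a sort-free variant (the name `nanmedian_1d` and the array `data` are hypothetical), applied row by row with `np.apply_along_axis`, which Keith's message suggests is how nanmedian handles n-d input:

```python
import numpy as np

def nanmedian_1d(arr1d):
    """Median of a 1-D array ignoring NaN, without an explicit pre-sort."""
    x = arr1d[~np.isnan(arr1d)]  # drop NaN entries
    if x.size == 0:
        return np.nan  # nothing left to take the median of
    return np.median(x)  # np.median sorts/partitions the data itself

data = np.array([[3.0, np.nan, 1.0, 2.0],
                 [np.nan, np.nan, np.nan, np.nan],
                 [10.0, 4.0, np.nan, 6.0]])

# Apply the 1-D helper to each row (axis=1); expected values 2.0, nan, 6.0
row_medians = np.apply_along_axis(nanmedian_1d, 1, data)
print(row_medians)

# Pre-sorting the non-NaN values gives the same median as not sorting
row = data[0]
same = nanmedian_1d(row) == np.median(np.sort(row[~np.isnan(row)]))
print(same)  # True
```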