

Is there an efficient way to represent bool arrays with null entries?
Tools like pandas push you very hard into 64-bit float representations: 64 bits where 3 will suffice is neither efficient for memory nor (consequently) speed.
What I’m hoping for is that there’s a structure that is ‘viewed’ as nan-able float data, but backed by a more efficient structure internally.
Thanks, Stu
_______________________________________________
NumPyDiscussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpydiscussion


Hi Stuart,
On Thu, 18 Apr 2019 09:12:31 -0700, Stuart Reynolds wrote:
> Is there an efficient way to represent bool arrays with null entries?
You can use the bool dtype:
In [5]: x = np.array([True, False, True])
In [6]: x
Out[6]: array([ True, False, True])
In [7]: x.dtype
Out[7]: dtype('bool')
You should note that this stores one True/False value per byte, so it is
not optimal in terms of memory use. There is no easy way to do
bitarrays with NumPy, because we use strides to determine how to move
from one memory location to the next.
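As a rough illustration of the one-byte-per-value point, the sketch below compares a plain bool array with a bit-packed copy made with np.packbits. The array contents are made up for the example, and the packed form is only a storage format: it has to be unpacked again before normal element-wise indexing works.

```python
import numpy as np

# Ten made-up flags, stored the usual way: one byte per element.
flags = np.array([True, False, True, True, False,
                  False, True, False, True, True])
print(flags.nbytes)    # size of the bool array in bytes

# Pack eight flags into each uint8; the last byte is zero-padded.
packed = np.packbits(flags)
print(packed.nbytes)   # size of the packed copy in bytes

# Unpack and trim the padding to recover the original flags.
restored = np.unpackbits(packed)[:flags.size].astype(bool)
print(np.array_equal(restored, flags))
```

Packing saves memory but gives up strided access to individual elements, which is the limitation described above.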
See also: https://www.reddit.com/r/Python/comments/5oatp5/one_bit_data_type_in_numpy/
> What I’m hoping for is that there’s a structure that is ‘viewed’ as
> nan-able float data, but backed by a more efficient structure
> internally.
There are good implementations of this idea, such as:
https://github.com/ilanschnell/bitarray
Those structures cannot typically utilize the NumPy machinery, though.
With the new array function interface, you should at least be able to
build something that has something close to the NumPy API.
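A minimal sketch of what such a wrapper might look like, assuming a NumPy version where the `__array_function__` protocol is enabled by default (1.17 or later). The class name NullableBoolArray, the int8 encoding (0 = False, 1 = True, -1 = null), and the choice of np.nansum and np.nanmean as the supported functions are all hypothetical, picked only for illustration.

```python
import numpy as np


class NullableBoolArray:
    """Hypothetical container: int8 codes, 0 = False, 1 = True, -1 = null."""

    _handlers = {}  # maps NumPy functions to our own implementations

    def __init__(self, codes):
        self.codes = np.asarray(codes, dtype=np.int8)

    def __array_function__(self, func, types, args, kwargs):
        handler = self._handlers.get(func)
        if handler is None:
            # Unsupported function: NumPy turns this into a TypeError.
            return NotImplemented
        return handler(*args, **kwargs)

    @classmethod
    def implements(cls, numpy_function):
        """Register a handler for one NumPy function."""
        def decorator(handler):
            cls._handlers[numpy_function] = handler
            return handler
        return decorator


@NullableBoolArray.implements(np.nansum)
def _nansum(arr):
    # Count the True entries; nulls are skipped, as NaN would be.
    return int(np.count_nonzero(arr.codes == 1))


@NullableBoolArray.implements(np.nanmean)
def _nanmean(arr):
    # Average over the non-null entries only.
    valid = arr.codes >= 0
    return float(arr.codes[valid].mean())


a = NullableBoolArray([1, 0, -1, 1])
print(np.nansum(a))    # dispatched to _nansum
print(np.nanmean(a))   # dispatched to _nanmean
```

Only the registered functions work; everything else raises TypeError, so a real implementation would need to cover far more of the API. Note that ufuncs such as np.isnan go through the separate `__array_ufunc__` protocol, which this sketch does not implement.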
Best regards,
Stéfan


Thanks. I’m aware of bool arrays. I think the tricky part of what I’m looking for is NULLability and interoperability with code that deals with nullable data (float arrays).
Currently the options seem to be float arrays, or custom operations that carry (unabstracted) categorical array data representations, such as:
0: false
1: true
2: NULL
... which wouldn’t be compatible with algorithms that use, say, np.isnan. Ideally, it would be nice to have a structure that was float-like in that it’s compatible with nan-aware operations, but its storage is just a single byte per cell (or less).
Is float8 a thing?
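To make the trade-off concrete, here is a small sketch of the categorical encoding described above (0: false, 1: true, 2: NULL) held in one byte per cell, with a lookup table that decodes it to nan-able floats only when nan-aware code needs them. The helper names as_nan_float and is_null are hypothetical.

```python
import numpy as np

# Encoding from the message above: 0 = false, 1 = true, 2 = NULL.
FALSE, TRUE, NULL = 0, 1, 2
codes = np.array([TRUE, FALSE, NULL, TRUE], dtype=np.uint8)

# Lookup table mapping each code to its float equivalent.
CODE_TO_FLOAT = np.array([0.0, 1.0, np.nan])


def as_nan_float(codes):
    """Decode to float64 with NaN for NULL (8 bytes per cell)."""
    return CODE_TO_FLOAT[codes]


def is_null(codes):
    """Stand-in for np.isnan that works on the compact codes directly."""
    return codes == NULL


decoded = as_nan_float(codes)
print(np.isnan(decoded))             # nan-aware code works on the decoded copy
print(is_null(codes))                # same answer without leaving uint8
print(codes.nbytes, decoded.nbytes)  # compact vs decoded size in bytes
```

The decoded copy is a temporary float64 array, so this only saves memory while the data is at rest, not while nan-aware code is running on it.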


One option here would be to use masked arrays:
arr = np.ma.zeros(3, dtype=bool)
arr[0] = True
arr[1] = False
arr[2] = np.ma.masked
giving
masked_array(data=[True, False, --],
             mask=[False, False,  True],
       fill_value=True)
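For code that expects nan-able floats, a masked bool array can be converted at the boundary. The following is a sketch of one way to do that, not the only one; the values are the same three as in the example above.

```python
import numpy as np

# Same contents as above: True, False, and one masked (null) entry.
arr = np.ma.array([True, False, True], mask=[False, False, True])

print(arr.count())  # number of unmasked entries
print(arr.sum())    # masked entries are skipped

# Convert to float and replace masked entries with NaN.
as_float = arr.astype(float).filled(np.nan)
print(as_float)
print(np.isnan(as_float))
```

The result of filled() is a plain float64 ndarray, so the compact storage applies only to the masked array itself.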


Looks like a good fit. Thanks.


On Thu, Apr 18, 2019 at 10:52 AM Stuart Reynolds <[hidden email]> wrote:
> Is float8 a thing?
No, but np.float16 is, so at least only twice as much memory as you need :)
array([ nan, inf, inf], dtype=float16)
I think masked arrays are going to be just as much, as they need to carry the mask.
CHB
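A quick sketch to check the memory figures mentioned above; the array length of 1000 is arbitrary, and the masked array is counted as its data buffer plus its mask buffer.

```python
import numpy as np

# float16 holds 0.0, 1.0 and NaN exactly, at two bytes per element.
x16 = np.array([1.0, 0.0, np.nan], dtype=np.float16)
print(np.isnan(x16))
print(x16.itemsize)

n = 1000  # arbitrary length for the comparison
as_float64 = np.full(n, np.nan, dtype=np.float64)
as_float16 = np.full(n, np.nan, dtype=np.float16)
as_masked = np.ma.masked_all(n, dtype=bool)

# A masked bool array carries one byte of data plus one byte of mask.
masked_bytes = as_masked.data.nbytes + np.ma.getmaskarray(as_masked).nbytes
print(as_float64.nbytes, as_float16.nbytes, masked_bytes)
```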
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
7600 Sand Point Way NE
Seattle, WA 98115
(206) 5266959 voice
(206) 5266329 fax
(206) 5266317 main reception
[hidden email]

