Splitting MaskedArray into a separate package


Splitting MaskedArray into a separate package

mattip
MaskedArray is a strange but useful creature. This NEP proposes to
distribute it as a separate package under the NumPy brand.

As I understand the process, a proposed NEP should be first discussed
here to gauge general acceptance, then after that the details should be
discussed on the pull request itself
https://github.com/numpy/numpy/pull/11146.

Here is the motivation section from the NEP:

> MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds
> masking capabilities, i.e. the ability to ignore or hide certain array
> values during computation.
>
> While historically convenient to distribute this class inside of NumPy,
> improved packaging has made it possible to distribute it separately
> without difficulty.
>
> Motivations for this move include:
>
>  * Focus: the NumPy package should strive to only include the
>    `ndarray` object, and the essential utilities needed to manipulate
>    such arrays.
>  * Complexity: the MaskedArray implementation is non-trivial, and imposes
>    a significant maintenance burden.
>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
>    often cause complications when being used with other packages.
>    Fixing these issues is outside the scope of NumPy development.
>
> This NEP proposes a deprecation pathway through which MaskedArrays
> would still be accessible to users, but no longer as part of the core
> package.
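
For readers less familiar with the class, here is a minimal sketch of what "ignoring certain values during computation" looks like, and of the kind of compatibility complication the NEP alludes to. It uses only `numpy.ma` as it currently ships inside NumPy; the readings and the `-999.0` sentinel are made up for illustration.

```python
import numpy as np

# Made-up sensor readings; -999.0 marks a bad sample (assumption for this sketch).
raw = np.array([1.0, 2.0, -999.0, 4.0])

# Mask the bad sample so that computations skip it.
readings = np.ma.masked_values(raw, -999.0)

mean_masked = readings.mean()   # averages only 1.0, 2.0 and 4.0
mean_raw = raw.mean()           # the sentinel value is included here

# Code that coerces its input with np.asarray gets a plain ndarray back:
# the mask is dropped and the sentinel value is visible again.
coerced = np.asarray(readings)
```

Any package that calls `np.asarray` on its inputs will therefore silently compute with the masked values, which is one way a subclass can cause surprises downstream.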

Any thoughts?

Matti and Stefan


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Splitting MaskedArray into a separate package

ralfgommers


On Wed, May 23, 2018 at 12:06 PM, Matti Picus <[hidden email]> wrote:
> MaskedArray is a strange but useful creature. This NEP proposes to
> distribute it as a separate package under the NumPy brand.
>
> As I understand the process, a proposed NEP should be first discussed
> here to gauge general acceptance, then after that the details should be
> discussed on the pull request itself
> https://github.com/numpy/numpy/pull/11146.
>
> Here is the motivation section from the NEP:
>
>> MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds
>> masking capabilities, i.e. the ability to ignore or hide certain array
>> values during computation.
>>
>> While historically convenient to distribute this class inside of NumPy,
>> improved packaging has made it possible to distribute it separately
>> without difficulty.
>>
>> Motivations for this move include:
>>
>>  * Focus: the NumPy package should strive to only include the
>>    `ndarray` object, and the essential utilities needed to manipulate
>>    such arrays.
>>  * Complexity: the MaskedArray implementation is non-trivial, and imposes
>>    a significant maintenance burden.
>>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
>>    often cause complications when being used with other packages.
>>    Fixing these issues is outside the scope of NumPy development.

Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of numpy.

>> This NEP proposes a deprecation pathway through which MaskedArrays
>> would still be accessible to users, but no longer as part of the core
>> package.
>
> Any thoughts?

You're missing an important step I think. You're proposing to deprecate MaskedArray completely (or not?). IIRC this has not been decided or seriously discussed before.

The complexity is not going away if you intend to keep MaskedArray alive long-term, only in a separate package. It gets worse actually, because now we would need cross-package regression testing.

Ralf




Re: Splitting MaskedArray into a separate package

Eric Firing
In reply to this post by mattip
On 2018/05/23 9:06 AM, Matti Picus wrote:

> MaskedArray is a strange but useful creature. This NEP proposes to
> distribute it as a separate package under the NumPy brand.
>
> As I understand the process, a proposed NEP should be first discussed
> here to gauge general acceptance, then after that the details should be
> discussed on the pull request itself
> https://github.com/numpy/numpy/pull/11146.
>
> Here is the motivation section from the NEP:
>
>> MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds
>> masking capabilities, i.e. the ability to ignore or hide certain array
>> values during computation.
>>
>> While historically convenient to distribute this class inside of NumPy,
>> improved packaging has made it possible to distribute it separately
>> without difficulty.
>>
>> Motivations for this move include:
>>
>>  * Focus: the NumPy package should strive to only include the
>>    `ndarray` object, and the essential utilities needed to manipulate
>>    such arrays.
>>  * Complexity: the MaskedArray implementation is non-trivial, and imposes
>>    a significant maintenance burden.
>>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
>>    often cause complications when being used with other packages.
>>    Fixing these issues is outside the scope of NumPy development.
>>
>> This NEP proposes a deprecation pathway through which MaskedArrays
>> would still be accessible to users, but no longer as part of the core
>> package.
>
> Any thoughts?
>
> Matti and Stefan

I understand at least some of the motivation and potential advantages,
but as it stands, I find this NEP highly alarming.  Masked arrays are
critical to my numpy usage, and I suspect they are critical for many
other use cases as well.  In fact, I would prefer that a high priority
for major numpy development be the more complete integration of masked
array capabilities into numpy, not their removal to a separate package.
I was unhappy to see the effort in that direction a few years ago being
killed.  I didn't agree with every design decision, but overall I
thought it was going in the right direction.

Bad or missing values (and situations where one wants to use a mask to
operate on a subset of an array) are found in many domains of real life;
do you really want python users in those domains to have to fall back on
Matlab-style reliance on nans and/or manual mask manipulations, as the
new maskedarray package is sidelined?
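
To make the three approaches named above concrete, here is a small sketch comparing NaN-based handling, manual mask manipulation, and a masked array on the same data. The counts and the position of the bad sample are invented for illustration; only existing NumPy calls are used.

```python
import numpy as np

# Made-up integer counts; index 1 is a bad sample (assumption for this sketch).
counts = np.array([3, 7, 5, 9])
bad = np.array([False, True, False, False])

# NaN approach: integers cannot hold NaN, so the data must become float first.
as_float = counts.astype(float)
as_float[bad] = np.nan
nan_total = np.nansum(as_float)

# Manual mask approach: the caller has to remember to apply ~bad everywhere.
manual_total = counts[~bad].sum()

# Masked array: the mask travels with the data and the dtype is preserved.
masked = np.ma.array(counts, mask=bad)
masked_total = masked.sum()
doubled = masked * 2   # the mask propagates through arithmetic
```

All three give the same total here; the difference is in who has to carry the bookkeeping and whether the original dtype survives.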

Or is there any realistic prospect for maintenance and improvement of
the package after it is separated out?  Or of mask/missing value
handling being integrated into numpy?  Is the latter option on the table
in any form, or is it DOA?

Side question: does your proposed purification of numpy include
elimination of linalg and random?  Based on the criteria in the NEP, I
would expect it does; so maybe you should have a more ambitious NEP, and
do the purification all in one step as a numpy version 2.0.  (Surely if
masked arrays are purged, the matrix class should be booted out at the
same time.)

Eric

Re: Splitting MaskedArray into a separate package

Stefan van der Walt
In reply to this post by ralfgommers
On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote:
> >>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
> >>    often cause complications when being used with other packages.
> >>    Fixing these issues is outside the scope of NumPy development.
> >
> Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of
> numpy.

That is currently the situation, yes.  I think this was meant more as
"we'd preferably not like to think about MaskedArrays any differently
than we do about other external packages, such as dask".  I.e., not
support specific hacks to make it work.

> You're missing an important step I think. You're proposing to deprecate
> MaskedArray completely (or not?). IIRC this has not been decided or
> seriously discussed before.

Good point, which certainly needs to be discussed.  My thought was to
move it out into a separate package that could be maintained more in the
spirit of a scikit by people who care deeply about its functionality.

Best regards,
Stéfan

Re: Splitting MaskedArray into a separate package

ralfgommers


On Wed, May 23, 2018 at 1:03 PM, Stefan van der Walt <[hidden email]> wrote:
> On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote:
> > >>  * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
> > >>    often cause complications when being used with other packages.
> > >>    Fixing these issues is outside the scope of NumPy development.
> > >
> > Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of
> > numpy.
>
> That is currently the situation, yes.  I think this was meant more as
> "we'd preferably not like to think about MaskedArrays any differently
> than we do about other external packages, such as dask".  I.e., not
> support specific hacks to make it work.
>
> > You're missing an important step I think. You're proposing to deprecate
> > MaskedArray completely (or not?). IIRC this has not been decided or
> > seriously discussed before.
>
> Good point, which certainly needs to be discussed.  My thought was to
> move it out into a separate package that could be maintained more in the
> spirit of a scikit by people who care deeply about its functionality.

That would be good in principle, but it's only possible that way once the specific hacks you refer to above are removed. As long as MaskedArray depends on implementation details of ndarray, evolving them in lock-step will be necessary. And that is much easier when they're in the same package.

Regarding whether a split-off package will actually be developed, I think that depends on having at least one champion for it stepping up. If we just move it over into github.com/numpy/maskedarray, I think it will get less rather than more attention.

Cheers,
Ralf

> Best regards,
> Stéfan

Re: Splitting MaskedArray into a separate package

Stefan van der Walt
In reply to this post by Eric Firing
Hi Eric,

On May 23, 2018 13:25:44 Eric Firing <[hidden email]> wrote:

> On 2018/05/23 9:06 AM, Matti Picus wrote:
> I understand at least some of the motivation and potential advantages,
> but as it stands, I find this NEP highly alarming.

I am not at my computer right now, so I will respond in more detail later.
But I wanted to address your statement above:

I see a NEP as an opportunity to discuss and flesh out an idea, and I
certainly hope that there's no reason for alarm.

I do not expect to know whether this is a good idea before discussions
conclude, so I appreciate your feedback. If we cannot find good support for
the idea, with very specific benefits, it should simply be dropped.

But, I think there's a lot to learn from the conversation in the meantime
w.r.t. exactly how streamlined people want NumPy to be, how core
functionality can perhaps be strengthened by becoming a customer of our own
API, how to optimally maintain sub-components, etc.

Best regards,
Stéfan



Re: Splitting MaskedArray into a separate package

Ilhan Polat
As far as I understand from the discussion above, I think the opposite would be a better strategy for the sanity of our scarce but brave maintainers. I would argue that if there is a maintenance burden, then the ballast seems to be linalg and random indeed. Similar pain points exist in SciPy too. There are a lot of issues that were already thought of years ago but never materialized (be it because of backwards compatibility, lack of champions and so on) because they are not the priority of the maintaining team. It is very common that a discussion ends with "yes, we should probably make it a ufunc" and then fades away. I feel that if there were fewer things to worry about, more people would step up and "do it".

I would also argue that the highest expectation of NumPy would be having a really sound data structure basis with more ufuncs, more array manipulation tricks and so on. Masked arrays, imho, fall into that category. Hence, if the codebase gets more refined in that respect, with less stuff to maintain and fewer moving parts, I think there would be a more coherent overall picture and a more focused action plan. Now the attention of maintainers seems to be divided among a lot of orthogonal issues, which is not a bad thing per se but tedious at times. Currently NumPy has a lot of code that it really doesn't need to bother with and can delegate to higher level packages like SciPy or any other subpackage. It sounds like NumPy 2.0 but is actually more of a gradual thinning out.





Re: Splitting MaskedArray into a separate package

Matthew Brett
In reply to this post by Stefan van der Walt
Hi,


On Wed, May 23, 2018 at 9:51 PM, Stefan van der Walt
<[hidden email]> wrote:

> Hi Eric,
>
> On May 23, 2018 13:25:44 Eric Firing <[hidden email]> wrote:
>
>> On 2018/05/23 9:06 AM, Matti Picus wrote:
>> I understand at least some of the motivation and potential advantages,
>> but as it stands, I find this NEP highly alarming.
>
>
> I am not at my computer right now, so I will respond in more detail later.
> But I wanted to address your statement above:
>
> I see a NEP as an opportunity to discuss and flesh out an idea, and I
> certainly hope that there's no reason for alarm.
>
> I do not expect to know whether this is a good idea before discussions
> conclude, so I appreciate your feedback. If we cannot find good support for
> the idea, with very specific benefits, it should simply be dropped.
>
> But, I think there's a lot to learn from the conversation in the meantime
> w.r.t. exactly how streamlined people want NumPy to be, how core
> functionality can perhaps be strengthened by becoming a customer of our own
> API, how to optimally maintain sub-components, etc.

Can I ask what the plans are for supporting missing values, inside or
outside numpy?  Is there a successor to MaskedArray - and is this
part of the succession plan?

Cheers,

Matthew

Re: Splitting MaskedArray into a separate package

Allan Haldane
In reply to this post by Eric Firing
On 05/23/2018 04:02 PM, Eric Firing wrote:
> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life;
> do you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the
> new maskedarray package is sidelined?

I also think that missing value support is important to include inside
numpy, just as it is included in other numerical packages like R and Julia.

The time is ripe to write a new and better MaskedArray, because
__array_ufunc__ exists now. With some other numpy devs a few months ago
we also played with rewriting MA using __array_ufunc__ and fixing up all
the bugs and inconsistencies we have discovered over time (eg, getting
rid of the Masked constant). Both Eric and I started working on some
code changes, but never submitted PRs. See a little bit of discussion
here (there was some more elsewhere I can't find now):

https://github.com/numpy/numpy/pull/9792#issuecomment-333346420
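
As a rough illustration of why `__array_ufunc__` makes such a rewrite attractive, here is a toy container that is not an `ndarray` subclass but still participates in ufunc calls. The class name `SimpleMasked` and its mask-propagation rule (an output element is masked if any input element was) are assumptions made for this sketch; it is not the code from the pull request discussion and not how `numpy.ma` is implemented.

```python
import functools

import numpy as np


class SimpleMasked:
    """Toy masked container (hypothetical; not numpy.ma)."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch only handles plain, single-output, element-wise calls.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        datas = [x.data if isinstance(x, SimpleMasked) else x for x in inputs]
        masks = [x.mask for x in inputs if isinstance(x, SimpleMasked)]
        # Apply the ufunc to the underlying data, masked slots included.
        result = ufunc(*datas)
        # An output element is masked if any corresponding input was masked.
        mask = functools.reduce(np.logical_or, masks)
        return SimpleMasked(result, np.broadcast_to(mask, np.shape(result)))


a = SimpleMasked([1.0, 2.0, 3.0], [False, True, False])
b = SimpleMasked([10.0, 20.0, 30.0], [False, False, True])

total = np.add(a, b)           # dispatched to SimpleMasked.__array_ufunc__
scaled = np.multiply(a, 2.0)   # mixing with a plain scalar also works
```

A real implementation would also need reductions, `out=` handling, operator overloads and a policy for what to store in masked slots; the point here is only that the dispatch goes through a public protocol rather than through subclass internals.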

As I say there, numpy's current MA support is pretty poor compared to R
- Wes McKinney partly justified his desire to move pandas away from
numpy because of it. We have a lot to gain by implementing it nicely.

We already have an NEP discussing possible ways forward:
https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html

I was pretty excited by the discussion above, and still am. I want to get
back to it after I finish more immediate priorities - finishing
printing/loading/saving fixes and structured array fixes.

But Masked-Array-2 is on my list of desired long-term enhancements for
numpy.

Allan



Re: Splitting MaskedArray into a separate package

Stefan van der Walt
In reply to this post by Matthew Brett
On May 23, 2018 14:28:05 Matthew Brett <[hidden email]> wrote:
>
> Can I ask what the plans are for supporting missing values, inside or
> outside numpy?  Is there a successor to MaskedArray - and is this
> part of the succession plan?

I am not aware of any concrete plans, maybe others can chime in?

It's a bit strange, the words that are used in this thread: "succession",
"purification", "elimination", and "purge". I don't have my knife out for
MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect
there may be a better and more supporting home/project configuration for
it, perhaps still under the NumPy umbrella.


Best regards,
Stéfan



Re: Splitting MaskedArray into a separate package

Sebastian Berg
In reply to this post by Allan Haldane
On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote:

> On 05/23/2018 04:02 PM, Eric Firing wrote:
> > Bad or missing values (and situations where one wants to use a mask
> > to
> > operate on a subset of an array) are found in many domains of real
> > life;
> > do you really want python users in those domains to have to fall
> > back on
> > Matlab-style reliance on nans and/or manual mask manipulations, as
> > the
> > new maskedarray package is sidelined?
>
> I also think that missing value support is important to include
> inside
> numpy, just as it is included in other numerical packages like R and
> Julia.
>
> The time is ripe to write a new and better MaskedArray, because
> __array_ufunc__ exists now. With some other numpy devs a few months
> ago
> we also played with rewriting MA using __array_ufunc__ and fixing up
> all
> the bugs and inconsistencies we have discovered over time (eg,
> getting
> rid of the Masked constant). Both Eric and I started working on some
> code changes, but never submitted PRs. See a little bit of discussion
> here (there was some more elsewhere I can't find now):
>
> https://github.com/numpy/numpy/pull/9792#issuecomment-333346420
>
> As I say there, numpy's current MA support is pretty poor compared to
> R
> - Wes McKinney partly justified his desire to move pandas away from
> numpy because of it. We have a lot to gain by implementing it nicely.
>
> We already have an NEP discussing possible ways forward:
> https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html
>
> I was pretty excited by discussion above, and still am. I want to get
> back to it after I finish more immediate priorities - finishing
> printing/loading/saving fixes and structured array fixes.
>
> But Masked-Array-2 is on my list of desired long-term enhancements
> for
> numpy.
Well, if we plan to replace it within numpy, I think we should wait
until then for any move on deprecation (after which it seems like the
obviously right choice)?

If we do not plan to replace it within numpy, we need to discuss a bit
how it might affect infrastructure (multiple implementations....).

There is also the other discussion about how to replace it: by opening
up/creating new masked dtypes or similar (cool, but unclear how
complex/long term), or with an `__array_ufunc__`-based implementation
(relatively simple, will get rid of the nastier hacks that are
currently needed).

Or even both, just on different time scales?

My first gut feeling about the proposal is: I love the idea of getting
rid of it... but let's not do it; it feels like it leaves too much
infrastructure unclear.

- Sebastian



Re: Splitting MaskedArray into a separate package

Matthew Brett
In reply to this post by Stefan van der Walt
Hi,

On Wed, May 23, 2018 at 10:42 PM, Stefan van der Walt
<[hidden email]> wrote:

> On May 23, 2018 14:28:05 Matthew Brett <[hidden email]> wrote:
>>
>>
>> Can I ask what the plans are for supporting missing values, inside or
>> outside numpy?  Is there a successor to MaskedArray - and is this
>> part of the succession plan?
>
>
> I am not aware of any concrete plans, maybe others can chime in?
>
> It's a bit strange, the words that are used in this thread: "succession",
> "purification", "elimination", and "purge". I don't have my knife out for
> MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect there
> may be a better and more supporting home/project configuration for it,
> perhaps still under the NumPy umbrella.

The NEP notes that MaskedArray imposes a significant maintenance
burden, as a motivation for removing it.  I'm sure you'd predict that
the Numpy developers are likely to spend less time on it, if it moves
to its own package.  I guess the hope would be that others would take
over, but is that likely?  What if they don't?

Would it be reasonable to develop an alternative plan for missing
arrays in concert with this NEP, maybe along the lines that Allan
mentioned, above?

Cheers,

Matthew

Re: Splitting MaskedArray into a separate package

Stefan van der Walt
In reply to this post by ralfgommers
On Wed, 23 May 2018 13:30:49 -0700, Ralf Gommers wrote:
> > Good point, which certainly needs to be discussed.  My thought was to
> > move it out into a separate package that could be maintained more in the
> > spirit of a scikit by people who care deeply about its functionality.
> >
> That would be good in principle, but it's only possible that way once the
> specific hacks you refer to above are removed. As long as MaskedArray
> depends on implementation details of ndarray, evolving them in lock-step
> will be necessary. And that is much easier when they're in the same
> package.

Yes, I agree: no special hacks should exist inside of NumPy for
MaskedArrays.  We should, in this instance, become consumers of our
public facing API, and refactor that API as necessary to support it.

> Regarding whether a split-off package will actually be developed, I think
> that depends on having at least one champion for it stepping up. If we just
> move it over into github.com/numpy/maskedarray, I think it will get less
> rather than more attention.

Wouldn't this be a good test of whether MaskedArrays are as valuable as
is being argued?  If so, a community will form around it, and if not it
may fade into obscurity.

Perhaps there is a fear that, in the transition period (i.e., before
potential contributors realize that the NumPy core team is no longer
doing active maintenance) the project may flounder.  But I suspect that
is unlikely to happen, as long as we keep an eye on its test suite from
the NumPy side (perhaps execute its test suite as part of NumPy CI).

Why is the scikit model successful, considering packages could just as
well be part of SciPy?  I would guess: a strong sense of ownership, the
ability to rapidly evolve, better focus, and a lower barrier to entry
(fewer moving pieces) may all play a role.  When you own a small
package, you know no-one else will take care of problems, so you pay
careful attention.

Best regards,
Stéfan

Re: Splitting MaskedArray into a separate package

Matthew Rocklin
In reply to this post by Matthew Brett
Hi All,

Disclaimer: I don't spend any hours actually maintaining Numpy, so please don't take my comments here with much weight.

My gut reaction here is that if removing masked array allows Numpy to evolve more quickly then this excites me. 

It could be that a plan goes something like the following:

  1. Remove masked array to a separate package, pin it to current versions of Numpy.
  2. Evolve Numpy to the point where making new array types becomes attractive.
  3. Make a new masked array with that new functionality that doesn't have the problems of the current implementation.

Of course this is a simplistic view of the world, and it could also be that this triggers a forking event.  Hopefully, though, it gets a general theme across: there is value in allowing Numpy to move quickly, and it might make sense for some feature-sets to miss out on that evolution for a time for the greater good of the ecosystem's evolution.

-matt


Re: Splitting MaskedArray into a separate package

Stefan van der Walt
In reply to this post by Eric Firing
Hi Eric,

On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
> Masked arrays are critical to my numpy usage, and I suspect they are
> critical for many other use cases as well.

That's good to know; and the goal of this NEP should be to improve your
situation, not make it worse.

> In fact, I would prefer that a high priority for major numpy
> development be the more complete integration of masked array capabilities
> into numpy, not their removal to a separate package.
>
> I was unhappy to see
> the effort in that direction a few years ago being killed.  I didn't agree
> with every design decision, but overall I thought it was going in the right
> direction.

I see this and the NEP as orthogonal issues.  MaskedArray, one
particular version of the masked value solution, has never truly been a
first-class citizen.

If we could instead implement masked arrays such that they simply sit on
top of existing NumPy functionality (using, e.g., special dtypes or
bitmasks), re-using all the standard machinery, that would be a natural
fit in the core of NumPy, and would negate the need for MaskedArrays.
But we haven't reached that point yet, and I am not aware of any current
proposal to do so.
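
To give a feel for the "bitmask on top of existing functionality" idea, here is a rough sketch that stores validity flags as packed bits next to an ordinary array and feeds them to a standard reduction. The data values are made up, the layout (one validity bit per element) is an assumption for illustration, and the `where` argument of `np.sum` needs a reasonably recent NumPy; this is not an existing or proposed NumPy design.

```python
import numpy as np

# Made-up data and validity flags (True means "use this element").
data = np.array([4.0, 8.0, 15.0, 16.0, 23.0, 42.0])
valid = np.array([True, True, False, True, False, True])

# Pack the flags into bits: one byte covers eight elements.
bitmask = np.packbits(valid)

# Recover the flags; packbits pads to a whole byte, so trim the padding.
flags = np.unpackbits(bitmask)[: data.size].astype(bool)

# An ordinary reduction can then be restricted to the valid elements.
total = np.sum(data, where=flags)
mean = total / flags.sum()
```

Nothing here needs a subclass: the mask is plain data, and the computation goes through the same reductions every other array uses.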

> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life; do
> you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the new
> maskedarray package is sidelined?

This is not too far from the current status quo, I would argue.  The
functionality exists, but it is "bolted on" rather than "built in".  And
my guess is that the component will benefit from some extra attention
that it is not getting as part of the current package.

> Or is there any realistic prospect for maintenance and improvement of the
> package after it is separated out?

In order to prevent the package from being "sidelined", we would have to
strengthen this part of the story.

> Side question: does your proposed purification of numpy include elimination
> of linalg and random?  Based on the criteria in the NEP, I would expect it
> does; so maybe you should have a more ambitious NEP, and do the purification
> all in one step as a numpy version 2.0.  (Surely if masked arrays are
> purged, the matrix class should be booted out at the same time.)

That's an interesting question, and one I have wondered about.  Would it
make sense to ship just the core ndarray object?  I don't know.  It
probably depends a lot on whether we can define clear API boundaries,
whether this kind of split is desired from the average user's
perspective, and whether it could benefit the development of the
subcomponents.

W.r.t. matrices, I think you're setting a trap for me here, but I'm
going to step into it anyway ;)

https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html

It is, then, not the first time I have argued in favor of moving certain
components out of NumPy into their own packages.  I would probably have
written that NEP this time around, had it not been for the many strings
attached via SciPy sparse (and therefore sklearn etc.).  Before matrix
deprecation can be discussed further, therefore, we need to implement
sparse *arrays* for SciPy (and some efforts are slowly underway).

See also:

https://mail.python.org/pipermail/numpy-discussion/2017-January/076290.html
http://numpy-discussion.10968.n7.nabble.com/Deprecate-matrices-in-1-15-and-remove-in-1-17-tp44968.html

Best regards,
Stéfan

Re: Splitting MaskedArray into a separate package

Benjamin Root
Users of a package do not equate to maintainers of a package. Scikits are successful because scientists who specialize in a field can contribute code and support the packages using their domain knowledge. How many people here are specialists in masked/missing value computation?

Would I like to see better missing value support in numpy? Sure, but until then, MaskedArrays are what we have and it is still better than just using NaNs all over the place.
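
To make the NaN comparison concrete, here is a small sketch using only existing `numpy` and `numpy.ma` calls. The readings and the `-999` bad-sample marker are assumptions made up for the example.

```python
import numpy as np

# Hypothetical integer readings; -999 marks a bad sample (assumed convention).
raw = np.array([12, -999, 15, 14, -999, 13])

# NaN approach: integer dtypes have no NaN, so a float copy is required,
# and a NaN-aware function has to be chosen explicitly.
as_float = raw.astype(float)
as_float[raw == -999] = np.nan
nan_mean = np.nanmean(as_float)

# MaskedArray approach: the integer dtype is kept and the mask travels
# with the data.
masked = np.ma.masked_equal(raw, -999)
ma_mean = masked.mean()      # masked entries are skipped

# The mask propagates through ordinary arithmetic.
doubled = masked * 2
```

Both means agree, but only the masked version preserves the original dtype and carries the "bad value" information through later operations without extra bookkeeping.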

Cheers!
Ben Root

On Wed, May 23, 2018 at 7:38 PM, Stefan van der Walt <[hidden email]> wrote:
<snip>

Re: Splitting MaskedArray into a separate package

Benjamin Root
As a further example of a widely used package that is often considered "critical" to an ecosystem yet gets negligible support, look no further than Basemap. It went almost two years without any commits before I took it up (and then only because my employer needed a couple of fixes).

I worry that a masked array package would turn into Basemap.

Ben Root


On Wed, May 23, 2018 at 10:52 PM, Benjamin Root <[hidden email]> wrote:
<snip>

Re: Splitting MaskedArray into a separate package

Sebastian Berg
In reply to this post by Sebastian Berg
On Wed, 2018-05-23 at 23:48 +0200, Sebastian Berg wrote:
> On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote:

<snip>

>
> If we do not plan to replace it within numpy, we need to discuss a bit
> how it might affect infrastructure (multiple implementations....).
>
> There is the other discussion about how to replace it. By opening
> up/creating new masked dtypes or similar (cool but unclear how
> complex/long term) or `__array_ufunc__` based (relatively simple, will
> get rid of the nastier hacks that are currently needed).
>
> Or even both, just on different time scales?
>
I also somewhat like the idea of taking it out (once we have a first
replacement) in the case that we have a plan to do a better/lower level
replacement at a later point within numpy.
Removal generally has its merits, but if a (mid term) replacement will
come in any case, it would be nice to get those started first if
possible.
Otherwise downstream might end up having to fix up things twice.

- Sebastian


> My first gut feeling about the proposal is: I love the idea to get rid
> of it... but let's not do it, it does feel like it makes too much
> infrastructure unclear.
>
> - Sebastian

Re: Splitting MaskedArray into a separate package

einstein.edison
> I also somewhat like the idea of taking it out (once we have a first
> replacement) in the case that we have a plan to do a better/lower level
> replacement at a later point within numpy.
> Removal generally has its merits, but if a (mid term) replacement will
> come in any case, it would be nice to get those started first if
> possible.
> Otherwise downstream might end up having to fix up things twice.
>
> - Sebastian

I also like the idea of designing a replacement first (using modern array protocols, perhaps in a separate repository) and then deprecating MaskedArray second. Deprecating an entire class in NumPy seems counterproductive, although I will admit I’ve never found a use for it. From this thread, it’s clear that others have, though.


Re: Splitting MaskedArray into a separate package

Allan Haldane
In reply to this post by Sebastian Berg
On 05/24/2018 11:31 AM, Sebastian Berg wrote:

> I also somewhat like the idea of taking it out (once we have a first
> replacement) in the case that we have a plan to do a better/lower level
> replacement at a later point within numpy.
> Removal generally has its merits, but if a (mid term) replacement will
> come in any case, it would be nice to get those started first if
> possible.
> Otherwise downstream might end up having to fix up things twice.
>
> - Sebastian

Yes, I think the way forward is to start working on a new masked array
while keeping the old one in place.

Once it has progressed a little and we can step back and look at it, we
can consider how to switch over. I imagine we would have both present in
numpy under different names for a while.

Also, I think it would be nice to work on it soon because it is a chance
for us to eat our own dogfood in the __array_ufunc__ interface, which is
not yet set in stone so we can fix any problems we discover with it.
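
For readers unfamiliar with the protocol, the following is a toy sketch of what an `__array_ufunc__` based masked container could look like. It is not the design discussed here: `SimpleMasked` is a hypothetical class, it handles only plain element-wise calls of single-output ufuncs, and it masks a result wherever any input is masked.

```python
import numpy as np

class SimpleMasked:
    """Toy masked container (hypothetical, not part of NumPy)."""

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch declines reductions, keyword arguments such as
        # out=, and multi-output ufuncs.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        datas, mask = [], False
        for x in inputs:
            if isinstance(x, SimpleMasked):
                datas.append(x.data)
                mask = mask | x.mask   # masked if any input is masked
            else:
                datas.append(x)
        result = ufunc(*datas)         # compute on the plain ndarrays
        mask = np.broadcast_to(mask, np.shape(result))
        return SimpleMasked(result, mask)

a = SimpleMasked([1.0, 2.0, 3.0], [False, True, False])
b = np.add(a, np.array([10.0, 10.0, 10.0]))
c = np.multiply(a, SimpleMasked([2.0, 2.0, 2.0], [False, False, True]))
```

Because NumPy dispatches `np.add` and `np.multiply` to the container, no `ndarray` subclassing is needed, which is the kind of simplification the protocol was expected to allow.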

Allan