Hi all,
At the prodding [1] of Sebastian, I’m starting a discussion on the decision to deprecate np.{bool,float,int}. This deprecation broke our prerelease testing in scikit-image (which, hooray for rcs!), and resulted in a large amount of code churn to fix [2].

To be honest, I do think *some* sort of deprecation is needed, because for the longest time I thought that np.float was what np.float_ actually is. I think it would be worthwhile to move to *that*, though it’s an even more invasive deprecation than the currently proposed one. Writing `x = np.zeros(5, dtype=int)` is somewhat magical, because someone with a strict typing mindset (there’s an increasing number!) might expect that this is an array of pointers to Python ints. This is why I’ve always preferred to write `dtype=np.int`, resulting in the current code churn.

I don’t know what the best answer is, just sparking the discussion Sebastian wants to see. ;) For skimage we’ve already merged a fix (even if it is one of dubious quality, as Stéfan points out [3] ;), so I don’t have too much stake in the outcome.

Juan.

[1]: https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739334463
[2]: https://github.com/scikit-image/scikit-image/pull/5103
[3]: https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739368765

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
On Sun, Dec 6, 2020 at 12:31 AM Juan Nunez-Iglesias <[hidden email]> wrote:
> To be honest, I do think *some* sort of deprecation is needed, because for the longest time I thought that np.float was what np.float_ actually is.

Hi Juan,

Let me start with a disclaimer that I'm an end user, and as such it's very easy for me to be bold when it comes to deprecations :) But I experienced the same thing that you describe in https://github.com/scikit-image/scikit-image/pull/5103#issuecomment-739429373 :

> [I]t was very surprising to me when I found out that np.float is float. For the longest time I thought that np.float was equivalent to "whatever the default float value is on my platform", and considered it best practice to use that instead of plain float. 😅

I think that is a common misconception. And I'm pretty sure the vast majority of end users face this. The proper np.float32 and other types are intuitive enough that people don't go out of their way to read the documentation in detail, and it's highly unexpected that some `np.*` types are mere aliases.

Now, this should probably not be a problem as long as people only stick these aliases into `dtype` keyword arguments, because that works as expected (based on the wrong premise). But once you extrapolate from the `dtype=np.int` behaviour to "`np.int` must be my native numpy int type" you can easily get subtle bugs. For instance, you might expect `isinstance(this_type, np.int)` to give you True if `this_type` is the type of an item of an array with `dtype=np.int`.

To be fair I'm not sure that I've ever been bitten by this personally... but once you're aware of the pitfall it seems really ominous.

I guess one helpful question is this: among all the code churn needed to fix the breakage did you find any bugs that were revealed by the deprecation? If that's the case (in scikit-image or any other large downstream library) then that would be a good argument for going forward with the deprecation.

Cheers,

András
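The pitfall described above can be checked directly. The following is a minimal sketch, assuming NumPy is installed and Python 3 is used. Because `np.int` was the builtin `int` (as the thread states), the check `isinstance(item, np.int)` amounted to `isinstance(item, int)`, so the sketch uses the builtin and avoids the deprecated alias altogether. The variable names are illustrative only.

```python
import numpy as np

arr = np.zeros(3, dtype=int)  # the builtin is mapped to NumPy's default integer
item = arr[0]

# The element is a NumPy scalar, not a Python int (on Python 3).
is_python_int = isinstance(item, int)

# Checks that do hold for elements of an integer array:
is_numpy_default_int = isinstance(item, np.int_)
is_any_numpy_int = isinstance(item, np.integer)

print(type(item).__name__, is_python_int, is_numpy_default_int, is_any_numpy_int)
```

The `np.integer` check is the more general one, since it also covers arrays created with an explicit width such as `np.int32`.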
In reply to this post by Juan Nunez-Iglesias-2
On Sat, Dec 5, 2020 at 4:31 PM Juan Nunez-Iglesias <[hidden email]> wrote:

I checked pandas and astropy and both have several uses of the deprecated types but should be easy to fix. I suppose the question is if we want to make them fix things *right now* :)

Chuck
On Sat, Dec 5, 2020 at 10:14 PM Charles R Harris <[hidden email]> wrote:

I guess if the answer is to stop people from [...], there is a good fix for that which doesn’t involve deprecating [...].

If the answer is to deprecate [...], then one can add a warning to the [...].

It just doesn’t seem worthwhile to stop people from using “I want this to be a numpy integer, not necessarily a python integer”.
On Sat, Dec 5, 2020 at 9:24 PM Mark Harfouche <[hidden email]> wrote:
The problem is that there is assuredly code that inadvertently relies upon this (mis)feature. If we change the behavior of np.int() to create np.int64() objects instead of int() objects, it is likely to result in breaking some user code. Even with a prior warning, this breakage may be surprising and very hard to track down. In contrast, it's much safer to simply remove np.int entirely, because if users ignore the deprecation they end up with an error. This is a general feature for deprecations: it's much safer to remove functionality than it is to change behavior. So on the whole, I think this is the right call.
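The "remove rather than change behavior" argument can be illustrated with a small sketch. It is not NumPy's actual implementation: `FakeNamespace`, its `stage` parameter, and `lookup` are hypothetical names invented for this example. The point it shows is that a removed name fails loudly at the point of use, while a deprecated alias keeps returning the old object with a warning.

```python
import warnings


class FakeNamespace:
    """Hypothetical stand-in for a library namespace; not NumPy's real code."""

    def __init__(self, stage):
        self.stage = stage  # "deprecated" or "removed"

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "int":
            if self.stage == "deprecated":
                warnings.warn(
                    "`int` alias is deprecated; use the builtin `int` instead",
                    DeprecationWarning,
                    stacklevel=2,
                )
                return int
            # After removal, stale code fails loudly at the point of use.
            raise AttributeError("namespace has no attribute 'int'")
        raise AttributeError(name)


def lookup(ns):
    """Return what user code would observe when accessing `ns.int`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            value = ns.int
        except AttributeError as exc:
            return "error", str(exc)
        return "ok", value, [w.category.__name__ for w in caught]


deprecated_result = lookup(FakeNamespace("deprecated"))
removed_result = lookup(FakeNamespace("removed"))
print(deprecated_result)
print(removed_result)
```

A third stage that silently returned a different type would produce neither a warning nor an error, which is the hard-to-track-down case the post warns about.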
On Sun, Dec 6, 2020 at 12:52 AM Stephan Hoyer <[hidden email]> wrote:
FWIW (and IIRC), this was the original misfeature. `np.int`, `np.bool`, and `np.float` were aliases for their corresponding default scalar types in the first numpy releases. However, too many people were doing `from numpy import *` and covering up the builtins. We renamed these aliases with trailing underscores to avoid that problem, but too many people (even in those early days) still had uses of `dtype=np.int`. Making `np.int is int` was the backwards-compatibility hack.

Robert Kern
In reply to this post by Charles R Harris
On Sat, 2020-12-05 at 20:12 -0700, Charles R Harris wrote:
> I checked pandas and astropy and both have several uses of the deprecated types but should be easy to fix. I suppose the question is if we want to make them fix things *right now* :)

The reason why I thought it might be good to bring this up again is that I am not sure how painful the deprecation is, which should be weighed against the benefit. And the benefit here is only moderate.

Thus, with the things now in and a few more people exposed to it, if anyone thinks it's a bad idea or that we should delay, I am all ears.

Cheers,

Sebastian
On Sun, Dec 6, 2020 at 4:23 PM Sebastian Berg <[hidden email]> wrote:

It will be painful as in "lots of churn", but the fixes are straightforward. And it's clear many knowledgeable users didn't know they were aliases, so there is something to gain here.

Whether or not we revert the deprecation, I'd be in favor of improving the docs to answer the most common questions and pitfalls, like:

- What happens when I use Python builtin types with the dtype keyword?
- How do I check if something is an integer array? Or a NumPy or Python integer?
- What are default integer, float and complex precisions on all platforms?
- How do I iterate over all floating point dtypes when writing tests?
- Which of the many equivalent dtypes should I prefer? --> use float64, not float_ or double
- warning: float128 and float96 do not exist on all platforms

Related: it's still easy to have things leak into the namespace unintentionally - `np.sys` and `np.os` exist too. I think we can probably clean those up without a deprecation, but we should write some more public API tests that prevent this kind of thing.

Cheers,

Ralf
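A few of the questions in the list above can be answered with a short sketch. This assumes NumPy is installed. The helper names `is_integer_array` and `is_integer_scalar` are made up for illustration and are not NumPy functions, and the explicit list of floating-point dtypes is just one possible choice for a test suite.

```python
import numpy as np

# Builtin types passed as dtype are mapped to NumPy's default dtypes.
float_matches = np.dtype(float) == np.dtype(np.float64)
bool_matches = np.dtype(bool) == np.dtype(np.bool_)
int_matches = np.dtype(int) == np.dtype(np.int_)  # width is platform dependent


def is_integer_array(a):
    """True if `a` (array or array-like) has any NumPy integer dtype."""
    return np.issubdtype(np.asarray(a).dtype, np.integer)


def is_integer_scalar(x):
    """True for Python ints and NumPy integer scalars, but not booleans."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, np.integer)) and not isinstance(x, bool)


# An explicit list keeps tests independent of platform-specific extended types.
float_dtypes = [np.float16, np.float32, np.float64]
all_floating = all(
    np.issubdtype(np.ones(2, dtype=dt).dtype, np.floating) for dt in float_dtypes
)
```

Spelling out `np.float64` instead of relying on aliases follows the preference stated in the list.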
On Mon, 7 Dec 2020 at 11:39, Ralf Gommers <[hidden email]> wrote:

If the CI noise in downstream libraries is particularly painful, we could switch to `PendingDeprecationWarning` instead of `DeprecationWarning` to make it easier to add the warnings to an ignore list. I think this might make the warning less visible to end users though, who are the users that this deprecation was really aimed at.

Eric
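To make the "ignore list" idea concrete, here is a minimal sketch using only the standard library `warnings` module. `old_alias` is a hypothetical function standing in for a deprecated name, and the "strict CI" setup (turning every warning into an error) is an assumption about how a downstream test suite might be configured.

```python
import warnings


def old_alias():
    """Hypothetical deprecated helper standing in for a deprecated alias."""
    warnings.warn("old_alias is going away", PendingDeprecationWarning, stacklevel=2)
    return 42


def call_strict():
    """Strict CI: every warning is an error, so the call raises."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            old_alias()
        except PendingDeprecationWarning:
            return "raised"
    return "silent"


def call_with_ignore_list():
    """Same strict setup, but with the category placed on an ignore list."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("error")
        # Inserted in front of the "error" filter, so it takes precedence.
        warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
        value = old_alias()
    return value, len(caught)
```

The visibility concern in the post also matches Python's defaults: the default warning filters ignore `PendingDeprecationWarning`, so end users would not see it unless they enable it.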
In reply to this post by Juan Nunez-Iglesias-2
Regarding np.bool specifically, if you want to deprecate this, you might want to discuss this with us at the array API standard https://github.com/data-apis/array-api (which is currently in RFC stage). The spec uses bool as the name for the boolean dtype.

Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.

Aaron Meurer
On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> Would it make sense for NumPy to change np.bool to just be the boolean dtype object? Unlike int and float, there is no ambiguity with bool, and NumPy clearly doesn't have any issues with shadowing builtin names in its namespace.

We could keep the Python alias around (which for `dtype=` is the same as `np.bool_`).

I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]

OTOH, if someone wants to entertain switching... It could be interesting to see how (unfixed) downstream projects react to it.

One approach would be:

* Go ahead for now (deprecate)
* Add a FutureWarning at some point that we _will_ start to export `np.bool` again (but `from numpy import *` is a problem?)
* Aim to make `np.bool is np.bool_` at some point in the (far) future.

It is multi-step (and I recall opinions that multi-step is bad). Although, I think the main argument against it was to not force users to modify code more than once. And I do not think that happens here.

Of course we could use the `FutureWarning` right away, but I don't mind taking it slow.

Cheers,

Sebastian

[1] I admit, probably almost nobody would notice. And usually using a Python `bool` is better...
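A short sketch of how `np.bool_` and the builtin `bool` differ as scalar types while still being interchangeable as `dtype=` arguments. It assumes NumPy is installed and uses `np.bool_` only, so it does not depend on whether `np.bool` exists in the installed version.

```python
import numpy as np

# As dtype specifiers, the builtin and the NumPy scalar type agree.
same_dtype = np.dtype(bool) == np.dtype(np.bool_)

# As scalar types, they are unrelated classes.
is_subclass = issubclass(np.bool_, bool)

# Arithmetic differs: Python bools behave as ints, NumPy booleans stay boolean.
py_sum = True + True                      # an int
np_sum = np.bool_(True) + np.bool_(True)  # addition acts as logical OR

print(same_dtype, is_subclass, py_sum, bool(np_sum))
```

Code that only ever passes `bool` as a dtype would not notice a switch, which fits the footnote above; code that relies on `isinstance` checks or on integer arithmetic with booleans could.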
On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg <[hidden email]> wrote:
> I am not sure I like the idea of immediately shadowing the builtin. That is a switch we can avoid flipping (without warning); `np.bool_` and `bool` are fairly different beasts? [1]

NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.

Aaron Meurer
On Wed, Dec 9, 2020 at 4:08 PM Aaron Meurer <[hidden email]> wrote:

Sometimes, we had the function first before Python added them to the builtins (e.g. sum(), any(), all(), IIRC). I think max() and min() are the main ones that we added after Python did, and we explicitly exclude them from __all__ to avoid clobbering the builtins.

Shadowing the types (bool, int, float) historically tended to be more problematic than those functions. The first releases of numpy _did_ have those as the scalar types. That empirically turned out to cause more problems for people than sum() or any(), so we renamed the scalar types to have the trailing underscore. We only left the shadowed names as aliases for the builtins because enough people still had `dtype=np.float` in their code that we didn't want to break.

All that said, "from numpy import *" is less common these days. We have been pretty successful at getting people on board with the np campaign.

Robert Kern
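The `__all__` mechanism mentioned above can be demonstrated without NumPy. This sketch builds a throwaway module named `fake_arraylib` (a hypothetical name, as are its two functions) and shows that a star import only pulls in the names listed in `__all__`, leaving the builtin `max` untouched.

```python
import sys
import types

# Build a hypothetical module that defines both `sum` and `max`.
fake = types.ModuleType("fake_arraylib")
exec(
    "def sum(values):\n"
    "    return 'fake sum'\n"
    "def max(values):\n"
    "    return 'fake max'\n"
    "__all__ = ['sum']  # `max` is deliberately left out\n",
    fake.__dict__,
)
sys.modules["fake_arraylib"] = fake

# Simulate a user script doing a star import into its own namespace.
user_ns = {}
exec("from fake_arraylib import *", user_ns)

imported = sorted(name for name in user_ns if not name.startswith("__"))
sum_result = user_ns["sum"]([1, 2])       # shadowed by the module's version
max_result = eval("max([1, 2])", user_ns)  # still the builtin
print(imported, sum_result, max_result)
```

The excluded function stays reachable as an attribute (`fake.max`), which mirrors how a name can exist in a namespace without being exported by a star import.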
In reply to this post by Aaron Meurer
On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <[hidden email]> wrote:
> NumPy already shadows a lot of builtins, in many cases, in ways that are incompatible with existing ones. It's not something I would have done personally, but it's been this way for a long time.

It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.

np.int_ and np.float_ have fixed precision, which makes them somewhat different from the builtin types. NumPy has a whole bunch of different precisions for integer and floats, so this distinction matters.

In contrast, there is only one boolean dtype in NumPy, which matches Python's bool. So we wouldn't have to worry, for example, about whether a user has requested a specific precision explicitly. This comes up in issues like type-promotion where libraries like JAX and PyTorch have special case logic for most Python types vs NumPy dtypes (but booleans are the same for both):
https://jax.readthedocs.io/en/latest/type_promotion.html
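The fixed-precision point can be seen in a small sketch, assuming NumPy is installed. It compares an arbitrary-precision Python `int` with a 64-bit NumPy integer array; the specific values are chosen only so that the product lands exactly on 2**64.

```python
import numpy as np

# Python int: arbitrary precision, the product is exactly 2**64.
py_value = 2**62 * 4

# NumPy int64: fixed 64-bit width, so the array product wraps around.
arr = np.array([2**62], dtype=np.int64)
np_value = (arr * 4)[0]

int64_max = np.iinfo(np.int64).max
print(py_value > int64_max, int(np_value))
```

Booleans have no comparable width question, which is the asymmetry the post relies on.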
On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> It may be defensible to keep np.bool as an alias for Python's bool even when we remove the other aliases.

[...] all compatible to a Python integer, but rather the "default" integer (which happens to be the same as C `long` currently). So we could focus on `np.int`, `np.long`. I am a bit unsure whether you would prefer that or are mainly pointing out the possibility?

Right now, my main take-away from the discussion is that it would be good to clarify the release notes a bit more.

Using `float` for a dtype seems fine to me, but I prefer mentioning `np.float64` over `np.float_`. For integers, I wonder if we should also suggest `np.int64`, even if (or because) the default integer on many systems is currently `np.int_`?

Cheers,

Sebastian
On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <[hidden email]> wrote: On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote: I'd agree with that.
Not sure what you mean with focus, focus on describing in the release notes? Deprecating `np.int` seems like the most beneficial part of this whole exercise. Right now, my main take-away from the discussion is that it would be I agree. I think we should recommend sane, descriptive names that do the right thing. So ideally we'd have people spell their dtype specifiers as dtype=bool # or np.bool dtype=np.float64 dtype=np.int64 dtype=np.complex128 The names with underscores at the end make little sense from a UX perspective. And the C equivalents (single/double/etc) made sense 15 years ago, but with the user base of today - the majority of whom will not know C fluently or at all - also don't make too much sense. The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64 bits is likely to be a pitfall much more often than it is what the user actually needs, so shouldn't be recommended and probably deserves a warning in the docs. Cheers, Ralf
On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> On Thu, Dec 10, 2020 at 7:25 PM Sebastian Berg <[hidden email]> wrote:
>
> > On Wed, 2020-12-09 at 13:37 -0800, Stephan Hoyer wrote:
> > > On Wed, Dec 9, 2020 at 1:07 PM Aaron Meurer <[hidden email]> wrote:
> > >
> > > > On Wed, Dec 9, 2020 at 9:41 AM Sebastian Berg <[hidden email]> wrote:
> > > > >
> > > > > On Mon, 2020-12-07 at 14:18 -0700, Aaron Meurer wrote:
> > > > > > Regarding np.bool specifically, if you want to deprecate this,
> > > > > > you might want to discuss this with us at the array API standard
> > > > > > https://github.com/data-apis/array-api (which is currently in RFC
> > > > > > stage). The spec uses bool as the name for the boolean dtype.
> > > > > >
> > > > > > Would it make sense for NumPy to change np.bool to just be the
> > > > > > boolean dtype object? Unlike int and float, there is no ambiguity
> > > > > > with bool, and NumPy clearly doesn't have any issues with
> > > > > > shadowing builtin names in its namespace.
> > > > >
> > > > > We could keep the Python alias around (which for `dtype=` is the
> > > > > same as `np.bool_`).
> > > > >
> > > > > I am not sure I like the idea of immediately shadowing the builtin.
> > > > > That is a switch we can avoid flipping (without warning);
> > > > > `np.bool_` and `bool` are fairly different beasts? [1]
> > > >
> > > > NumPy already shadows a lot of builtins, in many cases, in ways that
> > > > are incompatible with existing ones. It's not something I would have
> > > > done personally, but it's been this way for a long time.
> > >
> > > It may be defensible to keep np.bool as an alias for Python's bool
> > > even when we remove the other aliases.
> >
> > I'd agree with that.
> >
> > That is true, `int` is probably the most confusing, since it is not at
> > all compatible to a Python integer, but rather the "default" integer
> > (which happens to be the same as C `long` currently).
> >
> > So we could focus on `np.int`, `np.long`. I am a bit unsure whether
> > you would prefer that or are mainly pointing out the possibility?
>
> Not sure what you mean with focus, focus on describing in the release
> notes? Deprecating `np.int` seems like the most beneficial part of this
> whole exercise.

[...] and a "carefully chosen" set.

To be honest, I don't mind either way, so any stronger opinion will tip the scale for me personally (my default currently is to update the release notes to recommend the more descriptive names).

There are probably more doc updates that would be nice, I will suggest updating a separate issue for that.

> > Right now, my main take-away from the discussion is that it would be
> > good to clarify the release notes a bit more.
> >
> > Using `float` for a dtype seems fine to me, but I prefer mentioning
> > `np.float64` over `np.float_`.
> > For integers, I wonder if we should also suggest `np.int64`, even – or
> > because – if the default integer on many systems is currently
> > `np.int_`?
>
> I agree. I think we should recommend sane, descriptive names that do the
> right thing. So ideally we'd have people spell their dtype specifiers as
>
>     dtype=bool  # or np.bool
>     dtype=np.float64
>     dtype=np.int64
>     dtype=np.complex128
>
> The names with underscores at the end make little sense from a UX
> perspective. And the C equivalents (single/double/etc) made sense 15
> years ago, but with the user base of today - the majority of whom will
> not know C fluently or at all - also don't make too much sense.
>
> The `dtype=int` or `dtype=np.int_` behaviour flopping between 32 and 64
> bits is likely to be a pitfall much more often than it is what the user
> actually needs, so shouldn't be recommended and probably deserves a
> warning in the docs.

[...] integer dtype to use, because it is the integer that NumPy uses for all things related to indexing and array sizes. (I would be happy to dig out my PR making `np.intp` the default NumPy integer.)

Cheers,

Sebastian

> Cheers,
> Ralf
>
> > > np.int_ and np.float_ have fixed precision, which makes them somewhat
> > > different from the builtin types. NumPy has a whole bunch of different
> > > precisions for integer and floats, so this distinction matters.
> > >
> > > In contrast, there is only one boolean dtype in NumPy, which matches
> > > Python's bool. So we wouldn't have to worry, for example, about
> > > whether a user has requested a specific precision explicitly. This
> > > comes up in issues like type-promotion where libraries like JAX and
> > > PyTorch have special case logic for most Python types vs NumPy dtypes
> > > (but booleans are the same for both):
> > > https://jax.readthedocs.io/en/latest/type_promotion.html
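Sebastian's remark that `np.intp` is the integer NumPy uses for indexing and array sizes can be checked with a short sketch. The array contents and variable names below are made up for illustration; the only assumption is that index-returning functions such as `np.argsort` and `np.nonzero` hand back `np.intp` arrays, which is what current NumPy does:

```python
import numpy as np

values = np.array([30, 10, 20])

# Functions that return indices produce arrays of np.intp.
order = np.argsort(values)
hits = np.nonzero(values > 15)[0]
print(order, order.dtype == np.intp)
print(hits, hits.dtype == np.intp)

# The default integer (np.int_) is a separate notion; whether it has the
# same width as np.intp depends on the platform and NumPy version.
print(np.dtype(np.intp).itemsize, np.dtype(np.int_).itemsize)
```

Where the two widths differ, index arrays and arrays created with `dtype=int` silently end up with different dtypes, which is the kind of pitfall being discussed in this thread.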
> you might want to discuss this with us at the array API standard
> https://github.com/data-apis/array-api (which is currently in RFC
> stage). The spec uses bool as the name for the boolean dtype.

I don't fully understand this argument - `np.bool` is already not the boolean dtype. Either:

* The spec is suggesting that `pkg.bool` be some arbitrary object that can be passed into a dtype argument and will produce a boolean array. If this is the case, the spec could also just require that `dtype=builtins.bool` have this behavior.
* The spec is suggesting that `pkg.bool` is some rich dtype object. Ignoring the question of whether this should be `np.bool_` or `np.dtype(np.bool_)`, it's currently neither, and changing it will break users relying on `np.bool(True) is True`.

That's not to say this isn't a sensible thing for the specification to have, it's just something that numpy can't conform to without breaking code.

While it would be great if `np.bool_` could be spelt `np.bool`, I really don't think we can make that change without a long deprecation first (if at all).

Eric

On Thu, 10 Dec 2020 at 20:00, Sebastian Berg <[hidden email]> wrote:
> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> > [...]
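The compatibility concern Eric raises can be illustrated with a small sketch. It deliberately uses the builtin `bool` and `np.bool_` directly rather than the `np.bool` alias under discussion, so it does not depend on how any particular NumPy version defines that alias; the variable names are illustrative only:

```python
import numpy as np

# The builtin returns the singleton True itself, so identity checks pass.
print(bool(True) is True)

# NumPy's boolean scalar is a different type: equal in value, but not the
# same object as True and not an instance of the builtin bool.
np_true = np.bool_(True)
print(np_true is True)
print(isinstance(np_true, bool))
print(bool(np_true))

# As dtype specifiers the two are interchangeable.
a = np.zeros(3, dtype=bool)
b = np.zeros(3, dtype=np.bool_)
print(a.dtype == b.dtype)
```

Code written against an `np.bool` that is the builtin (identity checks, `isinstance(x, bool)`) would therefore change behaviour if the name were rebound to `np.bool_`, while code that only passes it as `dtype=` would not notice.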
On Fri, Dec 11, 2020 at 9:47 AM Eric Wieser <[hidden email]> wrote:
Yes, this.
It can have richer behaviour; there are no constraints there, but it's not necessary.
Given that the standard API would live in a new namespace (for backwards compatibility we can't possibly introduce it in the main namespace), `bool` there can be the numpy boolean dtype (if desired). The key point is that `bool_` is a terrible name, and keeping an `np.bool` that you can use as a dtype specifier is desirable.

Cheers,
Ralf
In reply to this post by Sebastian Berg
On Thu, Dec 10, 2020 at 9:00 PM Sebastian Berg <[hidden email]> wrote:
> On Thu, 2020-12-10 at 20:38 +0100, Ralf Gommers wrote:
> > [...]

Just deprecating `np.int` may make sense. That will already raise awareness, and leaving `np.float` as-is may prevent a lot of churn. And we could then still deprecate `np.float` later. I also don't feel strongly about `float` either way though.

I'm not sure why you'd specifically touch `long`, it's not really relevant and it's not a builtin.

Cheers,
Ralf

> To be honest, I don't mind either way, so any stronger opinion will tip [...]