Hi all,
I am curious about the correct approach to "normalizing" and forwarding arguments in `__array_function__` and `__array_ufunc__`. I have listed below the three rules as I understand the "right" way to be. Mainly this is just to write down the rules that I think we should aim for in case it comes up.

I admit, I do have "hidden agendas" where it may come up:

* pytorch breaks rule 2 in their copy of `__array_function__`, because doing so makes it easy to optimize away some overheads.
* `__array_ufunc__` breaks rule 1 (it allows too much) and I think we should be OK to change that [1].
* A few `__array_function__`s break rule 3. That is completely harmless, but it might be nice to just be clear whether we consider it technically wrong. [2]

Rules
-----

1. If an argument is invalid in NumPy it is considered an error. For example:

       np.log(arr, my_weird_argument=True)

   is always an error even if the `__array_function__` implementation of `arr` would support it. NEP 18 explicitly says that allowing forwarding could be done, but will not be done at this time.

2. Arguments must only be forwarded if they are passed in:

       np.mean(cupy_array)

   ends up as `cupy.mean(cupy_array)` and not:

       cupy.mean(cupy_array, axis=None, dtype=None, out=None, keepdims=False, where=True)

   meaning that CuPy does not need to implement all of those kwargs and NumPy can add new ones without breaking anyone's code.

3. NumPy should not check the *validity* of the arguments. For example: `np.add.reduce(xarray, axis="long")` should probably work in xarray. (`xarray.DataArray` does not actually implement the above.) But a string cannot be used as an axis in NumPy.

Cheers,

Sebastian

[1] I think `dask` breaks this rule by using an `output_dtypes` keyword. I would just consider this a valid exception and keep allowing it. In fact, `output_dtypes` may very well be a useful argument for NumPy itself. `dtype` looks like it serves that purpose, but it does not have quite the same meaning. This has also been discussed here: https://github.com/numpy/numpy/issues/8892

[2] This is done for performance reasons, although it is entirely avoidable. However, avoiding it might just add a bunch of annoying code unless it is part of a larger maintenance effort.

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
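Rules 1 and 2 above can be illustrated with a small pure-Python sketch. It is not NumPy's actual dispatch code: `reference_mean`, `backend_mean` and `forward` are hypothetical names invented for this illustration, and the reference signature is a simplified stand-in for the NumPy one.

```python
import inspect


def reference_mean(a, axis=None, dtype=None, out=None, keepdims=False):
    """Stand-in for the NumPy-side signature (hypothetical, simplified)."""
    return sum(a) / len(a)


def backend_mean(a):
    """A partial third-party implementation that supports no keywords."""
    return ("backend", sum(a) / len(a))


def forward(reference, implementation, *args, **kwargs):
    # Rule 1: binding against the reference signature raises TypeError
    # for any argument the reference function does not know about.
    inspect.signature(reference).bind(*args, **kwargs)
    # Rule 2: forward exactly what the caller passed, no defaults filled in.
    return implementation(*args, **kwargs)


# Works although backend_mean implements none of the optional keywords.
result = forward(reference_mean, backend_mean, [1.0, 2.0, 3.0])

# An argument unknown to the reference signature is rejected up front.
try:
    forward(reference_mean, backend_mean, [1.0], my_weird_argument=True)
except TypeError as exc:
    rule1_error = str(exc)
```

Because only the passed-in arguments are forwarded, the partial backend keeps working even if the reference signature later gains a keyword.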
numpy 1.19.3 installs fine.
numpy 1.19.4 appears to install but does not work.
(Details below. The supplied tinyurl appears relevant.)

Alan Isaac

PS test> python38 -m pip install -U numpy
Collecting numpy
  Using cached numpy-1.19.4-cp38-cp38-win_amd64.whl (13.0 MB)
Installing collected packages: numpy
Successfully installed numpy-1.19.4
PS test> python38
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
 ** On entry to DGEBAL parameter number 3 had an illegal value
 ** On entry to DGEHRD parameter number 2 had an illegal value
 ** On entry to DORGHR DORGQR parameter number 2 had an illegal value
 ** On entry to DHSEQR parameter number 4 had an illegal value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python38\lib\site-packages\numpy\__init__.py", line 305, in <module>
    _win_os_check()
  File "C:\Program Files\Python38\lib\site-packages\numpy\__init__.py", line 302, in _win_os_check
    raise RuntimeError(msg.format(__file__)) from None
RuntimeError: The current Numpy installation ('C:\\Program Files\\Python38\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86
>>>
Yes, this is known and we are waiting for MS to roll out a solution for this. Here are more details:
https://developercommunity2.visualstudio.com/t/fmod-after-an-update-to-windows-2004-is-causing-a/1207405

On Thu, Dec 3, 2020 at 12:57 AM Alan G. Isaac <[hidden email]> wrote:
> numpy 1.19.3 installs fine.
On Thu, 2020-12-03 at 01:13 +0100, Ilhan Polat wrote:
> Yes this is known and we are waiting for MS to roll out a solution for this.
> Here are more details
> https://developercommunity2.visualstudio.com/t/fmod-after-an-update-to-windows-2004-is-causing-a/1207405

I think one workaround was `pip install numpy==1.19.3`, which uses a different OpenBLAS that avoids the worst of the Windows bug.

- Sebastian
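A small check along these lines could tell whether the workaround is relevant on a given machine. It is only a sketch: `affected_by_windows_bug` and `installed_numpy_version` are hypothetical helpers, and the rule they encode (1.19.4 on Windows) is simply the one combination reported broken in this thread, not a complete list of affected builds. The version is read from package metadata, so NumPy itself is never imported.

```python
import sys
from importlib import metadata


def affected_by_windows_bug(numpy_version, platform):
    """Return True for the one combination reported broken in this thread."""
    return platform.startswith("win") and numpy_version == "1.19.4"


def installed_numpy_version():
    """Look up the installed version from package metadata, or None."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


version = installed_numpy_version()
if version is not None and affected_by_windows_bug(version, sys.platform):
    # The pin suggested above; adjust the interpreter name as needed.
    print("Consider: python -m pip install numpy==1.19.3")
```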
In reply to this post by Sebastian Berg
Hi Sebastian,

Looking at these three rules, they all seem to stem from one simple question: do we want a single code snippet to be runnable on multiple array implementations?

On Wed, Dec 2, 2020, at 15:34, Sebastian Berg wrote:
> 1. If an argument is invalid in NumPy it is considered an error.

Relaxing this rule will mean that code working for one array implementation (which has this keyword) may not work for another.

> 2. Arguments must only be forwarded if they are passed in:

This may ultimately make it harder for array implementors (they will only see errors once someone tries to pass in an argument that they forgot to implement). Perhaps better to pass all so they know what they're dealing with?

> 3. NumPy should not check the *validity* of the arguments.

Getting back to the original question: if this code is to be run on multiple implementations, we should ensure that no strange values pass through.

Personally, I like the idea of a single API that works on multiple backends. As such, I would 1) not pass through unknown arguments, 2) always pass through all arguments, and 3) validate inputs to each call.

Best regards,
Stéfan
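The alternative described in this reply (reject unknown arguments, always pass everything, validate values) might look roughly like the following. This is a sketch under assumptions: `reference_mean`, `normalize_call` and `incomplete_backend_mean` are hypothetical names, and only `axis` is validated to keep it short.

```python
import inspect


def reference_mean(a, axis=None, dtype=None, out=None, keepdims=False):
    """Stand-in for the NumPy-side signature (hypothetical, simplified)."""


def normalize_call(reference, *args, **kwargs):
    # 1) Unknown arguments fail here with TypeError.
    bound = inspect.signature(reference).bind(*args, **kwargs)
    # 2) Fill in every default so the backend sees the full argument set.
    bound.apply_defaults()
    full = dict(bound.arguments)
    # 3) Validate values before forwarding (only `axis` is checked here).
    axis = full["axis"]
    if axis is not None and not isinstance(axis, int):
        raise TypeError(f"axis must be an int or None, got {axis!r}")
    return full


def incomplete_backend_mean(a, axis=None):
    """A backend that forgot `dtype`, `out` and `keepdims`."""
    return sum(a) / len(a)


full_call = normalize_call(reference_mean, [1.0, 3.0])

# With everything passed, the missing keywords surface on the first call.
try:
    incomplete_backend_mean(**full_call)
except TypeError:
    backend_failed_early = True
else:
    backend_failed_early = False
```

The trade-off is visible here: the incomplete backend fails on the very first call, which helps implementors notice gaps, but it also means a new keyword in the reference signature breaks every backend until it is updated.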
In reply to this post by Ilhan Polat
"current expectation is that this [fix] will be able to be released near the end of January 2021"
!

On 12/2/2020 7:13 PM, Ilhan Polat wrote:
> Yes this is known and we are waiting MS to roll out a solution for this. Here are more details
> https://developercommunity2.visualstudio.com/t/fmod-after-an-update-to-windows-2004-is-causing-a/1207405
The final message in the numpy issue suggests a possible fix:
https://github.com/numpy/numpy/issues/16744#issuecomment-727098973

"""
My preference would be to have a 1.19.5 that uses OpenBLAS 0.3.9 on Linux and 0.3.12 on Windows, but I don't know if this is possible.
"""

I don't know if that would fix the problem or how hard that would be to implement, but I am seeing an increasing flow of problem reports in different fora about this issue and I'm surprised that the plan is to wait for MS to (possibly) release a fix in two months' time.

--
Oscar

On Thu, 3 Dec 2020 at 21:14, Alan G. Isaac <[hidden email]> wrote:
> "current expectation is that this [fix] will be able to be released near the end of January 2021"
> !
I believe the next release of NumPy (1.19.x or 1.20.0, whichever is first) should have OpenBLAS 0.3.12 built using a smaller buffer size, which will make this import error stop appearing on Windows, irrespective of the MS-provided fix.
In reply to this post by Stefan van der Walt
On Wed, 2020-12-02 at 21:07 -0800, Stefan van der Walt wrote:
> Hi Sebastian,
>
> Looking at these three rules, they all seem to stem from one simple question: do we desire for a single code snippet to be runnable on multiple array implementations?
>
> On Wed, Dec 2, 2020, at 15:34, Sebastian Berg wrote:
> > 1. If an argument is invalid in NumPy it is considered an error.
> >    For example:
> >
> >        np.log(arr, my_weird_argument=True)
> >
> >    is always an error even if the `__array_function__` implementation of `arr` would support it.
> >    NEP 18 explicitly says that allowing forwarding could be done, but will not be done at this time.
>
> Relaxing this rule will mean that code working for one array implementation (which has this keyword) may not work for another.

Indeed, while NEP 18 mentions it, I personally don't see why we should relax it. (The NEP 13 implementation does so, but this is an unintentional, and not optimal, implementation detail.)

> > 2. Arguments must only be forwarded if they are passed in:
> >
> >        np.mean(cupy_array)
> >
> >    ends up as `cupy.mean(cupy_array)` and not:
> >
> >        cupy.mean(cupy_array, axis=None, dtype=None, out=None, keepdims=False, where=True)
> >
> >    meaning that CuPy does not need to implement all of those kwargs and NumPy can add new ones without breaking anyone's code.
>
> This may ultimately make it harder for array implementors (they will only see errors once someone tries to pass in an argument that they forgot to implement). Perhaps better to pass all so they know what they're dealing with?

[...] `obj.mean()`, but compared to protocols which explicitly ask for NumPy compatibility, those method forwards are not as clearly defined.

So maybe we should actually pass on everything (including the default value?); that is actually safer if we ever update the default. The downside would remain that a newer NumPy is likely to cause a break until the project updates (e.g. if we add a keyword argument).

If we were open to this (plus an insignificant change in subclass handling), it would be easy to at least halve the overhead of Python `__array_function__` dispatching. That is because it would allow us to inline (in Python):

    def function(arg1, arg2, kwarg1=None):
        dispatched = dispatch((arg1,), arg1, arg2, kwarg1=kwarg1)
        if dispatched is not NotImplemented:
            return dispatched
        # normal code here (some argument validation could come first)

This may look strange, but it only has to go through 1-2 function calls where currently we go through 4. The other change would also allow us to remove *all* overhead for functions defined in C.

> > 3. NumPy should not check the *validity* of the arguments. For example:
> >    `np.add.reduce(xarray, axis="long")` should probably work in xarray.
> >    (`xarray.DataArray` does not actually implement the above.)
> >    But a string cannot be used as an axis in NumPy.
>
> Getting back to the original question: if this code is to be run on multiple implementations, we should ensure that no strange values pass through.
>
> Personally, I like the idea of a single API that works on multiple backends. As such, I would 1) not pass through unknown arguments, 2) always pass through all arguments, and 3) validate inputs to each call.

Thanks for the input! I think point 2) is in a sense the most interesting, because the approach `pytorch` takes to remove the overhead of array-function gets very complicated without it.

In the end, checking validity should maybe be considered an implementation detail... I.e. if there is a good reason why validating is a problem, we can stop doing it, and otherwise there is no need to worry about it. (Although for ufuncs, I would go the non-validating route for now personally.)

Cheers,

Sebastian
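The inlined pattern from this message can be made runnable as a toy. This is a sketch, not NumPy's dispatcher: `dispatch`, `total` and `Tagged` are hypothetical, and the fallback behaviour is simplified (real NumPy raises a TypeError when every override returns NotImplemented, while this toy falls through to the default code). Only the `__array_function__(self, func, types, args, kwargs)` signature is taken from NEP 18.

```python
def dispatch(public_api, relevant_args, *args, **kwargs):
    """Return an override's result, or NotImplemented if nobody overrides."""
    types = tuple(type(arg) for arg in relevant_args)
    for arg in relevant_args:
        hook = getattr(type(arg), "__array_function__", None)
        if hook is not None:
            result = hook(arg, public_api, types, args, kwargs)
            if result is not NotImplemented:
                return result
    return NotImplemented


def total(values, start=0):
    # Inlined dispatch: note that `start` is always forwarded, even when
    # the caller left it at its default.
    dispatched = dispatch(total, (values,), values, start=start)
    if dispatched is not NotImplemented:
        return dispatched
    # normal code here (some argument validation could come first)
    return sum(values, start)


class Tagged:
    """Toy array-like that overrides `total` through the protocol."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is total:
            return ("tagged", sum(self.data, kwargs["start"]))
        return NotImplemented
```

Here `Tagged.__array_function__` can read `kwargs["start"]` unconditionally because the inlined form forwards the default too, which is exactly the point where this optimization collides with rule 2.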
In reply to this post by Sebastian Berg
On Wed, Dec 2, 2020 at 3:39 PM Sebastian Berg <[hidden email]> wrote:
> 1. If an argument is invalid in NumPy it is considered an error.

From my perspective, this is a good thing: it ensures that NumPy's API is only used for features that exist in NumPy. Otherwise I can imagine it causing considerable confusion.

If you want to use my_weird_argument, you can call my_weird_library.log() instead.

> 2. Arguments must only be forwarded if they are passed in:

My reasoning here was two-fold:
1. To avoid the unfortunate situation for functions like np.mean(), where NumPy jumps through considerable hoops to avoid passing extra arguments in an ad-hoc way to preserve backwards compatibility.
2. To make it easy for a library to implement "incomplete" versions of NumPy's API, by simply omitting arguments.

The idea was that NumPy's API is open to partial implementations, but not extension.

> 3. NumPy should not check the *validity* of the arguments. For example:

I don't think libraries should be encouraged to abuse NumPy's API to mean something else. Certainly I would not use this in xarray :).

If we could check the validity of arguments cheaply, that would be fine by me. But I didn't think it was worth adding overhead to every function call. Perhaps type annotations could be relied on for these sorts of checks? I am pretty happy considering not checking the validity of arguments to be an implementation detail for now.
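The "hoops" mentioned for functions like np.mean() are typically some variant of a sentinel default, so that a method forward passes along only what the caller supplied. A minimal sketch of that idea follows; `_NO_VALUE`, `MinimalArray` and this `mean` are hypothetical and are not NumPy's actual implementation.

```python
_NO_VALUE = object()  # hypothetical sentinel meaning "caller did not pass this"


class MinimalArray:
    """Toy container whose mean() accepts no keyword arguments at all."""

    def __init__(self, data):
        self.data = list(data)

    def mean(self):
        return sum(self.data) / len(self.data)


def mean(a, axis=_NO_VALUE, keepdims=_NO_VALUE):
    # Collect only the keywords the caller actually supplied.
    candidates = (("axis", axis), ("keepdims", keepdims))
    passed = {name: value for name, value in candidates if value is not _NO_VALUE}
    return a.mean(**passed)


# The incomplete implementation works as long as no extra keyword is used.
plain = mean(MinimalArray([2, 4]))

# Passing a keyword it lacks fails inside the container's own method.
try:
    mean(MinimalArray([2, 4]), keepdims=True)
except TypeError:
    rejected = True
else:
    rejected = False
```

This matches the "open to partial implementations, but not extension" idea: an implementation may omit keywords, and users only see an error when they actually request one of them.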
On Sat, 2020-12-05 at 22:05 -0800, Stephan Hoyer wrote:
> On Wed, Dec 2, 2020 at 3:39 PM Sebastian Berg <[hidden email]> wrote:
>
> > 1. If an argument is invalid in NumPy it is considered an error.
> >    For example:
> >
> >        np.log(arr, my_weird_argument=True)
> >
> >    is always an error even if the `__array_function__` implementation of `arr` would support it.
> >    NEP 18 explicitly says that allowing forwarding could be done, but will not be done at this time.
>
> From my perspective, this is a good thing: it ensures that NumPy's API is only used for features that exist in NumPy. Otherwise I can imagine causing considerable confusion.
>
> If you want to use my_weird_argument, you can call my_weird_library.log() instead.
>
> > 2. Arguments must only be forwarded if they are passed in:
> >
> >        np.mean(cupy_array)
> >
> >    ends up as `cupy.mean(cupy_array)` and not:
> >
> >        cupy.mean(cupy_array, axis=None, dtype=None, out=None, keepdims=False, where=True)
> >
> >    meaning that CuPy does not need to implement all of those kwargs and NumPy can add new ones without breaking anyone's code.
>
> My reasoning here was two-fold:
> 1. To avoid the unfortunate situation for functions like np.mean(), where NumPy jumps through considerable hoops to avoid passing extra arguments in an ad-hoc way to preserve backwards compatibility
> 2. To make it easy for a library to implement "incomplete" versions of NumPy's API, by simply omitting arguments.
>
> The idea was that NumPy's API is open to partial implementations, but not extension.

[...] don't need to use `**kwargs` to be able to know which arguments were not passed). I guess the alternative is to force everyone to keep up, but you are of course allowed to raise a NotImplementedError (which is actually nicer for users, probably).

> > 3. NumPy should not check the *validity* of the arguments. For example:
> >    `np.add.reduce(xarray, axis="long")` should probably work in xarray.
> >    (`xarray.DataArray` does not actually implement the above.)
> >    But a string cannot be used as an axis in NumPy.
>
> I don't think libraries should be encouraged to abuse NumPy's API to mean something else. Certainly I would not use this in xarray :).
>
> If we could check the validity of arguments cheaply, that would be fine by me. But I didn't think it was worth adding overhead to every function call. Perhaps type annotations could be relied on for these sorts of checks? I am pretty happy considering not checking the validity of arguments to be an implementation detail for now.

[...] because it assumes the call graph is:

    array_function_implementation(...):
        impl = find_implementation()  # impl may be the default
        return impl()

Unlike for __array_ufunc__ where it is:

    ufunc.__call__(*args, **kwargs):
        if needs_to_defer(args, kwargs):
            return deferred_result()

If you assume that NumPy is the main consumer, especially on the C-side, validating the arguments (e.g. integer axis, NumPy dtypes) can make things more comfortable.

"Inlining" the dispatching as in the second case makes things quite a bit more comfortable, but is not necessary. However, it requires a small change to the default __array_function__ (i.e. you have to change the meaning so that the "default __array_function__" is the same as no array function).

Cheers,

Sebastian