New package to speed up ufunc inner loops

mattip
Hi. On behalf of Quansight and RTOSHoldings, I would like to introduce
"pnumpy", a package to speed up NumPy.

https://quansight.github.io/numpy-threading-extensions/stable/index.html


What is in it?

- Use "PyUFunc_ReplaceLoopBySignature" to hook all the UFunc inner loops

- When the inner loop is called with a large enough array, chunk the
data and perform the iteration via a thread pool

- Add a different memory allocator for "ndarray" data (will require an
appropriate API from NumPy)

- Allow using optimized loops above and beyond what NumPy provides

- Allow logging inner loop calls and parameters to learn about the
current process and perhaps tune the performance accordingly
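
The hook-and-chunk idea in the first two items can be sketched in pure Python. This is only an illustration of the control flow, not pnumpy's implementation: the real hooks are installed in C via "PyUFunc_ReplaceLoopBySignature", and the names below (`add_inner_loop`, `chunk_bounds`, `hooked_loop`, `THRESHOLD`, `N_WORKERS`) are hypothetical. Pure-Python threads will not speed this loop up because of the GIL; the point is the size threshold and the split into chunks.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tuning knobs for illustration; not pnumpy's actual defaults.
THRESHOLD = 50_000  # arrays shorter than this run the loop directly
N_WORKERS = 4       # number of chunks handed to the pool


def add_inner_loop(a, b, out, start, stop):
    """Stand-in for a ufunc inner loop: out[i] = a[i] + b[i] on [start, stop)."""
    for i in range(start, stop):
        out[i] = a[i] + b[i]


def chunk_bounds(n, n_chunks):
    """Split range(n) into at most n_chunks contiguous, near-equal slices."""
    step = max(1, -(-n // n_chunks))  # ceiling division, never zero
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def hooked_loop(inner_loop, a, b, out, pool,
                threshold=THRESHOLD, n_workers=N_WORKERS):
    """Call inner_loop directly for small inputs, else chunk it over the pool."""
    n = len(out)
    if n < threshold:
        inner_loop(a, b, out, 0, n)
        return
    futures = [pool.submit(inner_loop, a, b, out, lo, hi)
               for lo, hi in chunk_bounds(n, n_workers)]
    for future in futures:
        future.result()  # wait for the chunk and re-raise any exception


if __name__ == "__main__":
    n = 100_000
    a = array("d", range(n))
    b = array("d", [1.0] * n)
    out = array("d", [0.0] * n)
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        hooked_loop(add_inner_loop, a, b, out, pool)
    print(out[0], out[n - 1])
```

Each chunk writes to a disjoint slice of `out`, so the workers need no locking. The threshold exists because dispatching to a pool has a fixed cost that small arrays cannot amortize.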


The first release contains the hooking mechanism and the thread pool,
the rest has been prototyped but is not ready for release. The idea
behind the package is that a third-party package can try things out and
iterate much faster than NumPy. If some of the ideas bear fruit, and do
not add an undue maintenance burden to NumPy, the code can be ported to
NumPy. I am not sure NumPy wishes to take upon itself the burden of
managing threads, but a third-party package may be able to.


I am writing to the mailing list both to announce the pre-release under
the wrong name, and, in accordance with the fair play rules[1], to
request use of the "numpy" name in the package. We considered many
options and in the end would like to propose "pnumpy" (the p is either
"parallel" or "performant" or "preliminary", whatever you desire).


Matti


[1] https://numpy.org/neps/nep-0036-fair-play.html#fair-play-rules

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: New package to speed up ufunc inner loops

ralfgommers


Thanks Matti!

Obviously, as another Quansight employee I have a conflict of interest here, so let me just say that I wasn't involved in choosing the `pnumpy` name, but I did already comment internally that using "numpy" as part of the package name would probably be fine, given that Matti is the main author and the intent is to migrate the useful parts into NumPy itself.

Hopefully someone else can comment, maybe Stéfan as the "fair play" NEP author?

Cheers,
Ralf





Re: New package to speed up ufunc inner loops

Aaron Meurer
I hope this isn't too off topic, but this "fair play" NEP reads like
it is a set of additional restrictions on the NumPy license, which if
it is, would make NumPy no longer open source by the OSI definition. I
think the NEP should be much clearer that these are requests but not
requirements.

Aaron Meurer


Re: New package to speed up ufunc inner loops

Robert Kern-2
On Wed, Nov 4, 2020 at 4:49 PM Aaron Meurer <[hidden email]> wrote:
> I hope this isn't too off topic, but this "fair play" NEP reads like
> it is a set of additional restrictions on the NumPy license, which if
> it is, would make NumPy no longer open source by the OSI definition. I
> think the NEP should be much clearer that these are requests but not
> requirements.

FWIW, I don't read the NEP like that. Aside from the trademark on the name "NumPy", which _are_ enforceable requirements but are orthogonal to the copyright license, I see enough "request-like" language on everything else.

--
Robert Kern


Re: New package to speed up ufunc inner loops

Sebastian Berg
In reply to this post by mattip
On Tue, 2020-11-03 at 17:54 +0200, Matti Picus wrote:
> Hi. On behalf of Quansight and RTOSHoldings, I would like to
> introduce
> "pnumpy", a package to speed up NumPy.
>
> https://quansight.github.io/numpy-threading-extensions/stable/index.html
>

Nice to see these efforts, especially with the intention of possibly
upstreaming them.  I hope we can improve the NumPy infrastructure to make
such experiments much easier and more powerful in the future! (And as I
mentioned, I had such things in mind with NEP 43, albeit as a possible
later extension, not an explicit goal.)

I am a bit curious about the actual performance improvements even
without allowing more flexibility on the NumPy side. My gut feeling
would be fairly large variations, with sometimes big improvements due to
parallelization but often only added overhead due to NumPy not giving
you deep enough control?


As to the name, I don't have an issue with using `pnumpy`, although I
was never hugely concerned about it.

Initially I thought a longer name might be nicer, but the old(?)
accelerated-numpy or fast_numpy_loops doesn't seem that much clearer to
me.  I guess in the end, I think it's just important to be clear that
this type of project patches/modifies NumPy and is not associated with
it directly.

It seems `pnumpy` is currently taken on PyPI, though with a small number
of downloads: https://pypistats.org/packages/pnumpy
(Although I wonder how many are actual users.)

Cheers,

Sebastian



Re: New package to speed up ufunc inner loops

Stefan van der Walt
In reply to this post by Aaron Meurer
On Wed, Nov 4, 2020, at 13:47, Aaron Meurer wrote:
> I hope this isn't too off topic, but this "fair play" NEP reads like
> it is a set of additional restrictions on the NumPy license, which if
> it is, would make NumPy no longer open source by the OSI definition. I
> think the NEP should be much clearer that these are requests but not
> requirements.

Specifically, the NEP is worded as follows:

"""
This document aims to define a minimal set of rules that, when followed, will be considered good-faith efforts in line with the expectations of the NumPy developers.

...

When in doubt, please talk to us first. We may suggest an alternative; at minimum, we’ll be prepared.
"""

There is no language of forced restriction.

The heading in question is "Do not reuse the NumPy name for projects not developed by the NumPy community".  Matti is a member of our community, and while the project may be sponsored by others, he is doing exactly what the NEP recommends: discussing the issue with the community.

Community members should weigh in if they see an issue with the naming.  I don't think this is a particularly good name for a package (not easy to pronounce, does not indicate functionality of the package), but I don't personally have an issue with it either.

Best regards,
Stéfan

Re: New package to speed up ufunc inner loops

Aaron Meurer
In reply to this post by Robert Kern-2
On Wed, Nov 4, 2020 at 3:02 PM Robert Kern <[hidden email]> wrote:

>
> On Wed, Nov 4, 2020 at 4:49 PM Aaron Meurer <[hidden email]> wrote:
>>
>> I hope this isn't too off topic, but this "fair play" NEP reads like
>> it is a set of additional restrictions on the NumPy license, which if
>> it is, would make NumPy no longer open source by the OSI definition. I
>> think the NEP should be much clearer that these are requests but not
>> requirements.
>
>
> FWIW, I don't read the NEP like that. Aside from the trademark on the name "NumPy", which _are_ enforceable requirements but are orthogonal to the copyright license, I see enough "request-like" language on everything else.

To be clear, I don't read it like that either. But I also implicitly
understand that this is the intention of the document, because I know
that NumPy wouldn't actually place restrictions like these on its
license. My point is just that the document ought to be clearer about
this, as I can easily see someone misinterpreting it, especially if
they aren't close enough to the community that they would implicitly
understand that it is only a set of guidelines.

> There is no language of forced restriction.

The language you quoted reads ambiguously to me. It isn't forced, but
it also isn't obviously nonforced. "Please talk to us first" is the
sort of language I would expect to see for software that is
commercially licensed and can only be used with permission. All the
bullet points say "do not", which sounds forced to me. And the
trademark thing makes it even more confusing because even if you read
the rest as "only guidelines", it isn't clear if this is somehow an
exception.

Again, *I* understand the purpose of this document, but I think the
way it is currently written it could easily be misinterpreted by
someone else.

Aaron Meurer


Re: New package to speed up ufunc inner loops

Stefan van der Walt
On Wed, Nov 4, 2020, at 14:54, Aaron Meurer wrote:
> Again, *I* understand the purpose of this document, but I think the
> way it is currently written it could easily be misinterpreted by
> someone else.

Misinterpreted in what way? That they would think we have an ability to enforce the guidelines? We *are* trying to encourage certain behavior here. If they read it and, out of abundant caution, reach out to us, that's a fine outcome.

What negative outcomes do you foresee?

Stéfan

Re: New package to speed up ufunc inner loops

Robert Kern-2
In reply to this post by Aaron Meurer
On Wed, Nov 4, 2020 at 5:55 PM Aaron Meurer <[hidden email]> wrote:
> On Wed, Nov 4, 2020 at 3:02 PM Robert Kern <[hidden email]> wrote:
> >
> > On Wed, Nov 4, 2020 at 4:49 PM Aaron Meurer <[hidden email]> wrote:
> >>
> >> I hope this isn't too off topic, but this "fair play" NEP reads like
> >> it is a set of additional restrictions on the NumPy license, which if
> >> it is, would make NumPy no longer open source by the OSI definition. I
> >> think the NEP should be much clearer that these are requests but not
> >> requirements.
> >
> >
> > FWIW, I don't read the NEP like that. Aside from the trademark on the name "NumPy", which _are_ enforceable requirements but are orthogonal to the copyright license, I see enough "request-like" language on everything else.
>
> To be clear, I don't read it like that either. But I also implicitly
> understand that this is the intention of the document, because I know
> that NumPy wouldn't actually place restrictions like these on its
> license. My point is just that the document ought to be clearer about
> this, as I can easily see someone misinterpreting it, especially if
> they aren't close enough to the community that they would implicitly
> understand that it is only a set of guidelines.
>
> > There is no language of forced restriction.
>
> The language you quoted reads ambiguously to me. It isn't forced, but
> it also isn't obviously nonforced. "Please talk to us first" is the
> sort of language I would expect to see for software that is
> commercially licensed and can only be used with permission. All the
> bullet points say "do not", which sounds forced to me. And the
> trademark thing makes it even more confusing because even if you read
> the rest as "only guidelines", it isn't clear if this is somehow an
> exception.

If you pick out an individual sentence and consider it in isolation, sure. But there's a significant amount of context in the Abstract, Motivation, and Scope sections that preface the rules. And the discussion of many of the rules explicitly discusses ways to "break" the rules if you have to. We use "rule" language in many contexts besides legally-enforceable contracts and licenses.

> Again, *I* understand the purpose of this document, but I think the
> way it is currently written it could easily be misinterpreted by
> someone else.

I'm willing to wait for someone to actually misinterpret it.

That's not to say that there isn't clearer language that could be drafted. The NEP is still in Draft stage. But if you think it could be clearer, please propose specific edits to the draft. Like with unclear documentation, it's the person who finds the current docs insufficient/confusing/unclear that is in the best position to recommend the language that would have helped them. Collaboration helps.
 
--
Robert Kern


Re: New package to speed up ufunc inner loops

Aaron Meurer
> Misinterpreted in what way? That they would think we have an ability to enforce the guidelines? We *are* trying to encourage certain behavior here. If they read it and, out of abundant caution, reach out to us, that's a fine outcome.
> What negative outcomes do you foresee?

That it is a legal requirement, as part of the license to use NumPy.
The negative outcome is that someone reads the document and believes
NumPy to not actually be open source software.

> That's not to say that there isn't clearer language that could be drafted. The NEP is still in Draft stage. But if you think it could be clearer, please propose specific edits to the draft. Like with unclear documentation, it's the person who finds the current docs insufficient/confusing/unclear that is in the best position to recommend the language that would have helped them. Collaboration helps.

I disagree. The best person to write documentation is the person who
actually understands the package. I already noted that I don't
actually understand the actual situation with the trademark, for
instance.

I don't really understand why there is pushback against making the NEP
clearer. Also "like with unclear documentation", if someone says that
documentation is unclear, you should take their word for it that it
actually is, and improve it, rather than somehow trying to argue that
they actually aren't confused.

But as I noted, this is already off topic for the original discussion
here, and since there's apparently no interest in improving the NEP
wording, I'll drop it.

Aaron Meurer

Re: New package to speed up ufunc inner loops

Stefan van der Walt
On Wed, Nov 4, 2020, at 16:21, Aaron Meurer wrote:
> But as I noted, this is already off topic for the original discussion
> here, and since there's apparently no interest in improving the NEP
> wording, I'll drop it.

I was trying to understand where, specifically, the language falls short, and what to do about improving it.

Perhaps a sentence making it clear that this is not a licensing issue will assuage your concerns? If not, please help me understand where statements are overly strong, unclear, or insufficient in coverage.

Best regards,
Stéfan

Re: New package to speed up ufunc inner loops

Robert Kern-2
In reply to this post by Aaron Meurer
On Wed, Nov 4, 2020 at 7:22 PM Aaron Meurer <[hidden email]> wrote:

> > That's not to say that there isn't clearer language that could be drafted. The NEP is still in Draft stage. But if you think it could be clearer, please propose specific edits to the draft. Like with unclear documentation, it's the person who finds the current docs insufficient/confusing/unclear that is in the best position to recommend the language that would have helped them. Collaboration helps.
>
> I disagree. The best person to write documentation is the person who
> actually understands the package. I already noted that I don't
> actually understand the actual situation with the trademark, for
> instance.

Rather, I meant that the best person to fix confusing language is the person who was confused, after consulting with the authors/experts to come to a consensus about what was intended.
 
> I don't really understand why there is pushback against making the NEP
> clearer. Also "like with unclear documentation", if someone says that
> documentation is unclear, you should take their word for it that it
> actually is, and improve it, rather than somehow trying to argue that
> they actually aren't confused.

I'm not. I'm saying that I don't know how to make it more clear to those people because I'm not experiencing it like they are. The things I could think to add are the same kinds of things that were already stated explicitly in the Abstract, Motivation, and Scope. It seems like Stefan is in the same boat. Authors need editors, but the editor can't just say "rewrite!" I don't know what kind of assumptions and context this hypothetical reader is bringing to this reading that are leading to confusion. Sometimes it's clear, but not for me, here (and more relevantly, Stefan).

Do you think this needs a complete revamp? Or just an additional sentence to explicitly state that this does not add additional legal restrictions to the copyright license?

--
Robert Kern
