Addition of new distributions: Polya-gamma


Addition of new distributions: Polya-gamma

zoj613
Hi All,

I would like to know whether NumPy accepts new distributions now that the Generator interface has been implemented. If so, what are the criteria for a particular distribution to be accepted? I'm asking because I would like to propose adding the Polya-gamma distribution to NumPy, for the following reasons:

1) Polya-gamma random variables are commonly used as auxiliary variables for data augmentation in Bayesian sampling algorithms, which are widely used in statistics and, more recently, machine learning.
2) Since this distribution is mostly useful for random sampling, it seems appropriate to have it in NumPy rather than in projects like SciPy [1].
3) The only Python/C++ implementation of the sampler available is licensed under GPLv3, which I believe limits copying into packages that use a different license [2].
4) NumPy's random API makes adding the distribution painless.

I have done preliminary work on this by implementing the distribution sampler as described in [3]; see: https://github.com/numpy/numpy/compare/master...zoj613:polyagamma .
A more efficient sampling algorithm is described in a later paper [4], but I chose not to start with that one until I know it is worth investing time in.

I would appreciate your thoughts on this proposal.

Regards,
Zolisa


Refs:
[1] https://github.com/scipy/scipy/issues/11009
[2] https://github.com/slinderman/pypolyagamma
[3] https://arxiv.org/pdf/1205.0310v1.pdf
[4] https://arxiv.org/pdf/1405.0506.pdf

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Addition of new distributions: Polya-gamma

Tom Swirly
I'm just a lurker, but I spent a minute or two looking at that commit, which looks to be high quality. While I personally have not used this distribution, people I know use it all the time (for ML).


A quibble:

#define NPY_PI 3.141592653589793238462643383279502884 /* pi */

and the defines that follow it in numpy/random/src/distributions/random_polyagamma.c are already defined in numpy/core/include/numpy/npy_math.h

It would probably be better to include that file instead, if it isn't already included.


DISCLAIMER: I checked none of the math other than passing my eyes over it.




Re: Addition of new distributions: Polya-gamma

Stephan Hoyer-2
Thanks for putting together this clean implementation!

My concern is that Polya-Gamma is not popular enough to warrant inclusion in NumPy, which tries very hard to limit scope these days. For example, Polya-Gamma isn't implemented in scipy.stats and doesn't have a Wikipedia page, both of which are generally much more inclusive than NumPy.


Re: Addition of new distributions: Polya-gamma

Robert Kern-2
In reply to this post by zoj613
My view is that we will not add more methods for non-uniform distributions (i.e. "named" statistical probability distributions like Polya-Gamma) to `Generator`. I think that we might add a couple more methods to handle some more fundamental issues that help in writing randomized algorithms (like sampling from the unit interval with control over whether each boundary is open or closed, or maybe one more variation on shuffling). Now that we have the C and Cython APIs, which allow one to implement non-uniform distributions in other packages, we strongly encourage that.

As I commented on the linked PR, `scipy.stats` would be a reasonable place for a Polya-Gamma sampling function, even if it's not feasible to implement an `rv_continuous` class for it. You have convinced me that the nature of the Polya-Gamma distribution warrants this. The only issue is that scipy still depends on a pre-`Generator` version of numpy. So I recommend implementing this function in your own package with an eye towards contributing it to scipy later.
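
To illustrate the point that a sampler can live outside NumPy and rely only on the public `Generator` interface, here is a minimal Python sketch. It is not the method from the linked branch or from [3]; it is a naive approximation that truncates the infinite sum-of-gammas representation of PG(b, c), namely (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)) with g_k ~ Gamma(b, 1). The function name `sample_polya_gamma_truncated` and the `n_terms` parameter are hypothetical, chosen for illustration only.

```python
import numpy as np


def sample_polya_gamma_truncated(b, c, size, rng, n_terms=200):
    """Approximate PG(b, c) draws by truncating the sum-of-gammas series.

    Hypothetical helper for illustration; truncation biases the draws
    slightly downward, so this is not an exact sampler.
    """
    k = np.arange(1, n_terms + 1)
    # Denominators (k - 1/2)^2 + c^2 / (4 * pi^2) of the series terms.
    denom = (k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2
    # One row of independent Gamma(b, 1) variates per requested draw.
    g = rng.gamma(shape=b, scale=1.0, size=(size, n_terms))
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


rng = np.random.default_rng(12345)
draws = sample_polya_gamma_truncated(b=1.0, c=2.0, size=10000, rng=rng)

# Known mean of PG(b, c): b / (2c) * tanh(c / 2).
expected_mean = 1.0 / (2.0 * 2.0) * np.tanh(2.0 / 2.0)
print(draws.mean(), expected_mean)
```

The sample mean should land close to the theoretical mean, which gives a quick sanity check. A package aiming for speed and exactness would instead use the C or Cython API mentioned above together with one of the algorithms in the cited papers.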


Re: Addition of new distributions: Polya-gamma

zoj613
Thanks for the insightful comments. I totally understand the concerns about
maintenance. After some thought, I agree that it would be best to have it as
a separate package and hopefully get it added to SciPy in the future.
I will continue development at https://github.com/zoj613/polya-gamma for
those interested in using the code or contributing.




Re: Addition of new distributions: Polya-gamma

zoj613
As a follow-up to my last post here and the comments from Robert Kern, I have
just released version 1.0.0 of the sampler on PyPI. The work is pretty much
done (I hope). Feedback would be very much appreciated.


