8 Real Distributions


Michael Lance
TL;DR:
I think this could be a useful contribution to NumPy, but I'd like feedback on where it should go (in NumPy or elsewhere).
I have functions, built on numpy.random, that generate data from the 8 "real" distributions estimated by Ted Micceri in 1989. These can be useful in Monte Carlo simulations.

Background info:

Parametric inferential statistics generally assume normally distributed data (though non-normal kurtosis is less of an issue than skew). However, in "nature", distributions are often not normal. Ted Micceri's 1989 study (http://psycnet.apa.org/record/1989-14214-001) of real data sets produced estimates of 8 "real" distributions. Using these distributions in simulations helps produce more realistic estimates of Type I and Type II error rates and power, particularly for small samples.
A similar module, realpops, is currently available in Fortran.
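To make the point about error rates concrete, here is a minimal Monte Carlo sketch using only NumPy. It estimates how often a two-sided one-sample t-test at alpha = 0.05 rejects a true null hypothesis with n = 10. Micceri's tables are not reproduced here, so a lognormal population is an assumed stand-in for a skewed distribution; the helper name `type1_error_rate` is hypothetical.

```python
import numpy as np

N = 10                # sample size (fixed, because the critical value below is for df = 9)
T_CRIT_DF9 = 2.262    # two-sided alpha = 0.05 critical value of Student's t with 9 df


def type1_error_rate(draw, true_mean, reps=100_000, seed=0):
    """Hypothetical helper: fraction of simulated samples in which a one-sample
    t-test of H0: mean == true_mean rejects, although H0 is true by construction."""
    rng = np.random.default_rng(seed)
    samples = draw(rng, (reps, N))              # reps independent samples of size N
    means = samples.mean(axis=1)
    sds = samples.std(axis=1, ddof=1)           # sample standard deviations
    t = (means - true_mean) / (sds / np.sqrt(N))
    return np.mean(np.abs(t) > T_CRIT_DF9)      # proportion of false rejections


# Normal population: the estimate should sit near the nominal 0.05.
normal_rate = type1_error_rate(lambda rng, size: rng.normal(0.0, 1.0, size), 0.0)
# Skewed stand-in population (lognormal, sigma = 1); its true mean is exp(0.5).
skewed_rate = type1_error_rate(lambda rng, size: rng.lognormal(0.0, 1.0, size), np.exp(0.5))
print(normal_rate, skewed_rate)
```

Swapping in a sampler for one of the real-data populations would show how far the actual rejection rate drifts from the nominal level under realistic conditions.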


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: 8 Real Distributions

ralfgommers


On Fri, Jan 17, 2020 at 3:46 AM Michael Lance <[hidden email]> wrote:
TLDR;
I think this could be a useful contribution to NumPy, but I want to get feedback on where it should go (either in NumPy or elsewhere).
I have functions using numpy.random which invoke the 8 "Real" data sets as estimated by Ted Micceri in 1989. These can be useful in Monte Carlo simulations. 

Thanks for the suggestion Michael. This seems too specialized for NumPy. Also, it's not 100% clear whether you want to add functions or data sets; NumPy doesn't want to ship any data sets. It sounds to me like these would be best in their own package.

Cheers,
Ralf


Re: 8 Real Distributions

Michael Lance
Hi Ralf,

These are functions that generate data sets when invoked, as NumPy already does for mathematical distributions.

On Sat, Jan 18, 2020, 12:32 PM Ralf Gommers <[hidden email]> wrote:

Re: 8 Real Distributions

Robert Kern-2
On Tue, Jan 21, 2020 at 12:37 AM Michael Lance <[hidden email]> wrote:
Hi Ralf,

These are functions that generate data sets when invoked, like Numpy already does with mathematical distributions.

It's something of a judgement call. These distribution functions are defined by tables. The most efficient way to implement them in Python would be to just make them static arrays in a module that one uses `Generator.choice()` on instead of a function that generates the arrays, just to pick out one value and throw away the array again.

Regardless of what you want to call them or how you want to implement them, a third-party module is the right place for them.
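A minimal sketch of the approach described above, assuming a population is stored as a static table of possible values and their frequencies, with `Generator.choice()` doing the sampling. The table is invented for illustration and is not one of Micceri's populations; `sample_population` is a hypothetical name.

```python
import numpy as np

# HYPOTHETICAL population table: invented values and counts, NOT Micceri's data.
# Defined once at module level, so it is not rebuilt on every draw.
SCORES = np.arange(11)                                     # possible scores 0..10
COUNTS = np.array([2, 5, 9, 14, 20, 24, 18, 12, 7, 4, 1])  # frequency of each score
PROBS = COUNTS / COUNTS.sum()                              # relative frequencies, sum to 1


def sample_population(rng, size=None):
    """Hypothetical helper: draw scores with replacement from the static table."""
    return rng.choice(SCORES, size=size, replace=True, p=PROBS)


rng = np.random.default_rng(2020)
samples = sample_population(rng, (1000, 20))  # 1000 simulated samples of n = 20
print(samples.shape, samples.mean())
```

Because the table lives in the module, each call only pays for the draw itself, and calling with the default `size=None` returns a single value.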

--
Robert Kern


Re: 8 Real Distributions

Michael Lance
Okay, thanks! 3rd party it is.


On Mon, Jan 20, 2020, 10:54 AM Robert Kern <[hidden email]> wrote: