

TL;DR: I think this could be a useful contribution to NumPy, but I'd like feedback on where it should go (either in NumPy or elsewhere). I have functions built on numpy.random that generate samples from the 8 "Real" data sets as estimated by Ted Micceri in 1989. These can be useful in Monte Carlo simulations.
Background info:
Parametric inferential statistics generally assume normal distributions (though kurtosis presents less of an issue than skew). However, in "nature", distributions are often not normal. In 1989, Ted Micceri's study (http://psycnet.apa.org/record/198914214001) on real data sets resulted in the estimation of 8 "Real" distributions. Using these distributions in simulations helps to produce more realistic Type I and Type II error rates and power estimates, particularly for smaller samples. A similar module, called realpops, is currently available in Fortran.
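To make the Monte Carlo use case concrete, here is a minimal sketch of estimating the Type I error rate of a one-sample t-test at n = 10, once for a normal population and once for a skewed discrete one. The skewed table below is an invented placeholder, not one of Micceri's distributions, and the function name `type_i_error_rate` is hypothetical. The critical value 2.262 is the two-sided 5% cutoff of Student's t with 9 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Hypothetical skewed discrete population (placeholder, NOT Micceri's data).
scores = np.arange(0, 11)
weights = np.array([30, 22, 15, 10, 7, 5, 4, 3, 2, 1, 1], dtype=float)
probs = weights / weights.sum()
skewed_mean = float(np.sum(scores * probs))


def type_i_error_rate(draw, mu, n=10, reps=20_000, t_crit=2.262):
    """Fraction of one-sample t-tests that reject a true H0: mean == mu.

    `draw(size)` must return an array of samples with the given shape.
    `t_crit` is the two-sided 5% critical value for n - 1 = 9 degrees
    of freedom, so it only matches the default n = 10.
    """
    samples = draw((reps, n))
    means = samples.mean(axis=1)
    sds = samples.std(axis=1, ddof=1)
    # A discrete population can yield a constant sample (sd == 0);
    # the resulting inf counts as a rejection and nan does not.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = (means - mu) / (sds / np.sqrt(n))
    return float(np.mean(np.abs(t) > t_crit))


normal_rate = type_i_error_rate(
    lambda size: rng.normal(0.0, 1.0, size=size), mu=0.0
)
skewed_rate = type_i_error_rate(
    lambda size: rng.choice(scores, size=size, p=probs), mu=skewed_mean
)

print(f"normal population: {normal_rate:.3f}")
print(f"skewed population: {skewed_rate:.3f}")
```

The normal case should land close to the nominal 0.05; how far the skewed case drifts from it depends entirely on the table used, which is the gap that empirically derived populations are meant to expose.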
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion


Thanks for the suggestion Michael. This seems too specialized for NumPy. Also, it's not 100% clear whether you want to add functions or data sets; NumPy doesn't want to ship any data sets. It sounds to me like these would be best in their own package.
Cheers,
Ralf


Hi Ralf,
These are functions that generate data sets when invoked, like NumPy already does with mathematical distributions.


On Tue, Jan 21, 2020 at 12:37 AM Michael Lance <[hidden email]> wrote:
> Hi Ralf,
> These are functions that generate data sets when invoked, like NumPy already does with mathematical distributions.
It's something of a judgement call. These distribution functions are defined by tables. The most efficient way to implement them in Python would be to store the tables as static arrays in a module and call `Generator.choice()` on them, rather than writing a function that generates the arrays just to pick out one value and throw the array away again.
Regardless of what you want to call them or how you want to implement them, a third-party module is the right place for them.
 Robert Kern
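
The static-array approach described above might look roughly like the following sketch. The table values are invented placeholders rather than Micceri's published frequencies, and the names `SCORES`, `FREQS`, `PROBS` and `sample_population` are hypothetical. The only NumPy calls used are `np.random.default_rng` and `Generator.choice`.

```python
import numpy as np

# Module-level static table, built once at import time.
# Placeholder values for illustration only (NOT Micceri's data).
SCORES = np.array([0, 1, 2, 3, 4, 5])
FREQS = np.array([40, 25, 15, 10, 6, 4], dtype=float)
PROBS = FREQS / FREQS.sum()  # normalize frequencies to probabilities


def sample_population(rng, size=None):
    """Draw from the tabulated distribution using the caller's Generator."""
    return rng.choice(SCORES, size=size, p=PROBS)


rng = np.random.default_rng(0)

one_value = sample_population(rng)                 # a single draw
batch = sample_population(rng, size=(1000, 20))    # 1000 samples of n = 20
print(one_value, batch.shape)
```

Passing the `Generator` in, instead of creating one inside the module, leaves seeding and reproducibility under the caller's control.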


Okay, thanks! 3rd party it is.

