NEP: Random Number Generator Policy


NEP: Random Number Generator Policy

Robert Kern-2
As promised distressingly many months ago, I have written up a NEP about relaxing the stream-compatibility policy that we currently have.


I particularly invite comment on the two lists of methods that we still would make strict compatibility guarantees for.

---

==============================
Random Number Generator Policy
==============================

:Author: Robert Kern <[hidden email]>
:Status: Draft
:Type: Standards Track
:Created: 2018-05-24


Abstract
--------

For the past decade, NumPy has had a strict backwards compatibility policy for
the number stream of all of its random number distributions.  Unlike other
numerical components in ``numpy``, which are usually allowed to return
different results when they are modified, provided the results remain correct, we have
obligated the random number distributions to always produce the exact same
numbers in every version.  The objective of our stream-compatibility guarantee
was to provide exact reproducibility for simulations across numpy versions in
order to promote reproducible research.  However, this policy has made it very
difficult to enhance any of the distributions with faster or more accurate
algorithms.  After a decade of experience and improvements in the surrounding
ecosystem of scientific software, we believe that there are now better ways to
achieve these objectives.  We propose relaxing our strict stream-compatibility
policy to remove the obstacles that are in the way of accepting contributions
to our random number generation capabilities.


The Status Quo
--------------

Our current policy, in full:

    A fixed seed and a fixed series of calls to ``RandomState`` methods using the
    same parameters will always produce the same results up to roundoff error
    except when the values were incorrect.  Incorrect values will be fixed and
    the NumPy version in which the fix was made will be noted in the relevant
    docstring.  Extension of existing parameter ranges and the addition of new
    parameters is allowed as long as the previous behavior remains unchanged.
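
For instance, this is the sort of invariant the policy promises (a minimal
sketch; the exact values drawn are beside the point)::

    import numpy as np

    rs = np.random.RandomState(seed=12345)
    first = rs.uniform(size=3)
    rs.seed(12345)
    # Re-seeding restarts the stream.  The policy promises that these draws
    # match the first ones exactly, in this and every future numpy version.
    assert (rs.uniform(size=3) == first).all()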

This policy was first instated in Nov 2008 (in essence; the full set of weasel
words grew over time) in response to a user wanting to be sure that the
simulations that formed the basis of their scientific publication could be
reproduced years later, exactly, with whatever version of ``numpy`` that was
current at the time.  We were keen to support reproducible research, and it was
still early in the life of ``numpy.random``.  We had not seen much cause to
change the distribution methods all that much.

We also had not thought very thoroughly about the limits of what we really
could promise (and by “we” in this section, we really mean Robert Kern, let’s
be honest).  Despite all of the weasel words, our policy overpromises
compatibility.  The same version of ``numpy`` built on different platforms, or
just in a different way, can cause changes in the stream, with varying degrees
of rarity.  The biggest issue is that the ``.multivariate_normal()`` method relies on
``numpy.linalg`` functions.  Even on the same platform, if one links ``numpy``
with a different LAPACK, ``.multivariate_normal()`` may well return completely
different results.  More rarely, building on a different OS or CPU can cause
differences in the stream.  We use C ``long`` integers internally for the integer
distributions (it seemed like a good idea at the time), and those can vary in
size depending on the platform.  Distribution methods can overflow their
internal C ``longs`` at different breakpoints depending on the platform and
cause all of the random variate draws that follow to be different.
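
The ``long`` dependence is easy to observe, since ``numpy.int_`` corresponds
to the platform's C ``long``::

    import numpy as np

    # 8 on most 64-bit Linux and macOS builds, but 4 on 64-bit Windows,
    # where C long is 32 bits; the integer distributions therefore overflow
    # at different breakpoints on different platforms.
    print(np.dtype(np.int_).itemsize)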

And even if all of that is controlled, our policy still does not provide exact
guarantees across versions.  We still do apply bug fixes when correctness is at
stake.  And even if we didn’t do that, any nontrivial program does more than
just draw random numbers.  It performs computations on those numbers and
transforms them with numerical algorithms from the rest of ``numpy``, which is
not subject to so strict a policy.  Trying to maintain stream-compatibility for our
random number distributions does not help reproducible research for these
reasons.

The standard practice now for bit-for-bit reproducible research is to pin all
of the versions of your software stack, possibly down to the OS itself.
The landscape for accomplishing this is much easier today than it was in 2008.
We now have ``pip``.  We now have virtual machines.  Those who need to
reproduce simulations exactly now can (and ought to) do so by using the exact
same version of ``numpy``.  We do not need to maintain stream-compatibility
across ``numpy`` versions to help them.
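
A minimal sketch of that practice (the pinned version string here is just an
illustrative placeholder)::

    import numpy as np

    # A published simulation script can verify its pinned environment
    # explicitly before drawing any random numbers.
    PINNED = "1.14.3"
    assert np.__version__ == PINNED, (np.__version__, PINNED)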

Our stream-compatibility guarantee has hindered our ability to make
improvements to ``numpy.random``.  Several first-time contributors have
submitted PRs to improve the distributions, usually by implementing a faster
or more accurate algorithm than the one that is currently there.
Unfortunately, most of them would have required breaking the stream to do so.
Blocked by our policy, and our inability to work around that policy, many of
those contributors simply walked away.


Implementation
--------------

We propose first freezing ``RandomState`` as it is and developing a new RNG
subsystem alongside it.  This allows anyone who has been relying on our old
stream-compatibility guarantee to have plenty of time to migrate.
``RandomState`` will be considered deprecated, but with a long deprecation
cycle, at least a few years.  Deprecation warnings will start silent but become
increasingly noisy over time.  Bugs in the current state of the code will *not*
be fixed if fixing them would impact the stream.  However, if changes in the
rest of ``numpy`` would break something in the ``RandomState`` code, we will
fix ``RandomState`` to continue working (for example, some change in the
C API).  No new features will be added to ``RandomState``.  Users should
migrate to the new subsystem as they are able to.
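
Python's warning machinery supports exactly this quiet-then-noisy
progression, since ``DeprecationWarning`` is ignored by default.  A sketch,
not the planned implementation::

    import warnings

    # Silent phase: DeprecationWarning is not shown to end users by default.
    warnings.warn("RandomState is frozen; please migrate",
                  DeprecationWarning, stacklevel=2)

    # A later, noisier phase could switch to FutureWarning, which is always
    # shown by default.
    warnings.warn("RandomState is deprecated; please migrate",
                  FutureWarning, stacklevel=2)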

Work on a proposed `new PRNG subsystem
<https://github.com/bashtage/randomgen>`_ is already underway.  The specifics
of the new design are out of scope for this NEP and up for much discussion, but
we will discuss general policies that will guide the evolution of whatever code
is adopted.

First, we will maintain API source compatibility just as we do with the rest of
``numpy``.  If we *must* make a breaking change, we will only do so with an
appropriate deprecation period and warnings.

Second, breaking stream-compatibility in order to introduce new features or
improve performance will be *allowed* with *caution*.  Such changes will be
considered features, and as such will be no faster than the standard release
cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``).  Slowness is
not a bug.  Correctness bug fixes that break stream-compatibility can happen on
bugfix releases, per usual, but developers should consider if they can wait
until the next feature release.  We encourage developers to weigh users' pain
from the break in stream-compatibility heavily against the improvements.
One example of a worthwhile improvement would be to change algorithms for
a significant increase in performance, for example, moving from the Box-Muller
method of Gaussian variate generation to the faster Ziggurat algorithm.  An
example of an unworthy improvement would be tweaking the Ziggurat tables just
a little bit.
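
For concreteness, a minimal sketch of the Box-Muller transform is below; the
real implementations live in C and differ in detail::

    import numpy as np

    def box_muller(n, rng):
        # Turn pairs of uniform draws into standard normal draws.
        u1 = rng.random_sample(n)
        u2 = rng.random_sample(n)
        r = np.sqrt(-2.0 * np.log1p(-u1))  # log(1 - u1) avoids log(0)
        return r * np.cos(2.0 * np.pi * u2)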

Any new design for the RNG subsystem will provide a choice of different core
uniform PRNG algorithms.  We will be more strict about a select subset of
methods on these core PRNG objects.  They MUST guarantee stream-compatibility
for a minimal, specified set of methods which are chosen to make it easier to
compose them to build other distributions.  Namely,

    * ``.bytes()``
    * ``.random_uintegers()``
    * ``.random_sample()``
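
For example, here is a sketch of how an exponential distribution could be
composed on top of the guaranteed-stable uniform stream (``rng`` stands for
any core PRNG object providing the methods above)::

    import numpy as np

    def standard_exponential(n, rng):
        # Inverse CDF of Exp(1); building only on .random_sample() keeps
        # the result as stable as the underlying stream.
        u = rng.random_sample(n)
        return -np.log1p(-u)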

Furthermore, the new design should also provide one generator class (we shall
call it ``StableRandom`` for discussion purposes) that provides a slightly
broader subset of distribution methods for which stream-compatibility is
*guaranteed*.  The point of ``StableRandom`` is to provide something that can
be used in unit tests so projects that currently have tests which rely on the
precise stream can be migrated off of ``RandomState``.  For the best
transition, ``StableRandom`` should use as its core uniform PRNG the current
MT19937 algorithm.  As best as possible, the API for the distribution methods
that are provided on ``StableRandom`` should match their counterparts on
``RandomState``.  They should provide the same stream that the current version
of ``RandomState`` does.  Because their intended use is for unit tests, we do
not need the performance improvements from the new algorithms that will be
introduced by the new subsystem.

The list of ``StableRandom`` methods should be chosen to support unit tests:

    * ``.randint()``
    * ``.uniform()``
    * ``.normal()``
    * ``.standard_normal()``
    * ``.choice()``
    * ``.shuffle()``
    * ``.permutation()``
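
A sketch of the intended use in a downstream test suite follows.  Since
``StableRandom`` is only the placeholder name, the sketch uses today's
``RandomState``, whose stream ``StableRandom`` would be required to
reproduce; the hard-coded digits are from the MT19937 stream, truncated for
display::

    import numpy as np

    def test_seeded_stream():
        rng = np.random.RandomState(12345)
        x = rng.standard_normal(3)
        # Tests like this keep passing as long as the stream is preserved.
        np.testing.assert_allclose(
            x, [-0.204708, 0.478943, -0.519439], rtol=1e-5)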


Not Versioning
--------------

For a long time, we considered that the way to allow algorithmic improvements
while maintaining the stream was to apply some form of versioning.  That is,
every time we make a stream change in one of the distributions, we increment
some version number somewhere.  ``numpy.random`` would keep all past versions
of the code, and there would be a way to get the old versions.  Proposals of
how to do this exactly varied widely, but we will not exhaustively list them
here.  We spent years going back and forth on these designs and were not able
to find one that sufficed.  Let that time lost, and more importantly, the
contributors that we lost while we dithered, serve as evidence against the
notion.

Concretely, adding in versioning makes maintenance of ``numpy.random``
difficult.  Necessarily, we would be keeping lots of versions of the same code
around.  Adding a new algorithm safely would still be quite hard.

But most importantly, versioning is fundamentally difficult to *use* correctly.
We want to make it easy and straightforward to get the latest, fastest, best
versions of the distribution algorithms; otherwise, what's the point?  The way
to make that easy is to make the latest the default.  But the default will
necessarily change from release to release, so the user’s code would need to be
altered anyway to specify the specific version that one wants to replicate.

Adding in versioning to maintain stream-compatibility would still only provide
the same level of stream-compatibility that we currently do, with all of the
limitations described earlier.  Given that the standard practice for such needs
is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` alone
is superfluous.


Discussion
----------



Copyright
---------

This document has been placed in the public domain.


--
Robert Kern


Re: NEP: Random Number Generator Policy

Warren Weckesser-2


On Sat, Jun 2, 2018 at 3:04 PM, Robert Kern <[hidden email]> wrote:
As promised distressingly many months ago, I have written up a NEP about relaxing the stream-compatibility policy that we currently have.


I particularly invite comment on the two lists of methods that we still would make strict compatibility guarantees for.

Thanks, Robert.   It looks like you are neatly cutting the Gordian Knot of API versioning in numpy.random!  I don't have any specific comments, except that it will be great to have *something* other than the status quo, so we can start improving the existing numpy.random functions.

Warren




Re: NEP: Random Number Generator Policy

Eric Wieser

You make a bunch of good points refuting reproducible research as an argument for not changing the random number streams.

However, there’s a second use-case you don’t address - unit tests. For better or worse, downstream, or even our own, unit tests use a seeded random number generator as a shorthand to produce some arbitrary array, and then hard-code the expected output in their tests. Breaking stream compatibility will break these tests.

I don’t think writing tests in this way is a particularly good idea, but unfortunately they do still exist.

It would be good to address this use case in the NEP, even if the conclusion is just “changing the stream will break tests of this form”.

Eric

On Sat, 2 Jun 2018 at 12:05 Robert Kern robert.kern@... wrote:

As promised distressingly many months ago, I have written up a NEP about relaxing the stream-compatibility policy that we currently have.


I particularly invite comment on the two lists of methods that we still would make strict compatibility guarantees for.


Re: NEP: Random Number Generator Policy

Robert Kern-2
On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser <[hidden email]> wrote:

You make a bunch of good points refuting reproducible research as an argument for not changing the random number streams.

However, there’s a second use-case you don’t address - unit tests. For better or worse, downstream, or even our own, unit tests use a seeded random number generator as a shorthand to produce some arbitrary array, and then hard-code the expected output in their tests. Breaking stream compatibility will break these tests.

I don’t think writing tests in this way is a particularly good idea, but unfortunately they do still exist.

It would be good to address this use case in the NEP, even if the conclusion is just “changing the stream will break tests of this form”.


I do! Search for "unit test" or "StableRandom". :-) 

--
Robert Kern


Re: NEP: Random Number Generator Policy

Stephan Hoyer-2
On Sat, Jun 2, 2018 at 12:06 PM Robert Kern <[hidden email]> wrote:
We propose first freezing ``RandomState`` as it is and developing a new RNG
subsystem alongside it.  This allows anyone who has been relying on our old
stream-compatibility guarantee to have plenty of time to migrate.
``RandomState`` will be considered deprecated, but with a long deprecation
cycle, at least a few years.  Deprecation warnings will start silent but become
increasingly noisy over time.  Bugs in the current state of the code will *not*
be fixed if fixing them would impact the stream.  However, if changes in the
rest of ``numpy`` would break something in the ``RandomState`` code, we will
fix ``RandomState`` to continue working (for example, some change in the
C API).  No new features will be added to ``RandomState``.  Users should
migrate to the new subsystem as they are able to.

Robert, thanks for this proposal. I think it makes a lot of sense and will help maintain the long-term viability of numpy.random.

The main clarification I would like to see addressed is what "freezing RandomState" means for top level functions in numpy.random. I think we could safely swap out the underlying implementation if numpy.random.seed() is not explicitly called, but how would we handle cases where a seed is explicitly set?

You and I both agree that this is an anti-pattern for numpy.random, but certainly there is plenty of code that relies on the stability of random numbers when seeds are set by np.random.seed(). Similar to the case for RandomState, we would presumably need to start issuing warnings when seed() is explicitly called, which begs the question of what (if anything) we propose to replace seed() with. I suppose this will be your next NEP :).
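
For reference, a minimal sketch of the two styles:

    import numpy as np

    np.random.seed(0)                # global, hidden state: the anti-pattern
    a = np.random.normal(size=3)

    rng = np.random.RandomState(0)   # explicit generator: the preferred style
    b = rng.normal(size=3)

    # Both produce the same stream today; only the second form avoids the
    # module-level global state that makes freezing or replacement awkward.
    assert (a == b).all()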


Re: NEP: Random Number Generator Policy

Robert Kern-2
Moving some of the Github PR comments here:

Implementation
--------------

We propose first freezing ``RandomState`` as it is and developing a new RNG
subsystem alongside it.  This allows anyone who has been relying on our old
stream-compatibility guarantee to have plenty of time to migrate.
``RandomState`` will be considered deprecated, but with a long deprecation
cycle, at least a few years.

> RandomState could pretty easily be spun out into a stand-alone package, if useful. It is effectively a stand-alone submodule already.

Indeed. That would be a graceful forever-home for the code for anyone who needs it. However, I'd still only make that switch after at least a few years of deprecation inside numpy. And maybe a 2.0.0 release.
 
Any new design for the RNG subsystem will provide a choice of different core
uniform PRNG algorithms.  We will be more strict about a select subset of
methods on these core PRNG objects.  They MUST guarantee stream-compatibility
for a minimal, specified set of methods which are chosen to make it easier to
compose them to build other distributions.  Namely,

    * ``.bytes()``
    * ``.random_uintegers()``
    * ``.random_sample()``

BTW, `random_uintegers()` is a new method in Kevin Sheppard's `randomgen`, and I am referring to its semantics here.

> One of these (bytes, uintegers) seems redundant. uintegers should probably be 64 bit.

Because different core generators have different "native" outputs (MT19937 and PCG32 output `uint32`s; PCG64 outputs `uint64`s; and some that I hope we never implement natively output doubles), there are some simple, but non-trivial choices to make to support each of these. I would like the core generator's author to make those choices and maintain them. They're not hard, but they are the kind of thing that ought to be decided once and consistently.

I am of the opinion that `uintegers` should support at least `uint32` and `uint64` as those are the most common native outputs among core generators. There should be a maintained way to get that native format (and yes, I'd rather have the user be explicit about it than have `random_native_uint()` in addition to `random_uint64()`).
 
This argument extends to `.bytes()`, too, now that I think about it. A stream of bytes is a native format for some generators, too, for example if we decide to hook up /dev/urandom or another file-backed interface.

Hmm, what do you think about adding `random_interval()` to this list? And raising that up to the Python API level (a la what Python 3 did with exposing `secrets.randbelow()` as a primitive)?

Many, many uses of this method would be with numbers much less than 1<<32 (e.g. Fisher-Yates shuffle), and for the 32-bit native PRNGs could mean using half as many core PRNG draws if `random_interval()` is implemented along with the core PRNG to make use of that fact.
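
A sketch of where the savings come from, assuming a hypothetical core PRNG
that exposes one 32-bit draw at a time:

    def randbelow_32(bound, next_uint32):
        # Mask-and-reject: for bounds below 1<<32, each candidate costs a
        # single 32-bit draw instead of two draws assembled into a 64-bit
        # word.  `next_uint32` is a hypothetical callable returning one
        # core PRNG output.
        mask = (1 << int(bound - 1).bit_length()) - 1
        while True:
            r = next_uint32() & mask
            if r < bound:
                return r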

The list of ``StableRandom`` methods should be chosen to support unit tests:

    * ``.randint()``
    * ``.uniform()``
    * ``.normal()``
    * ``.standard_normal()``
    * ``.choice()``
    * ``.shuffle()``
    * ``.permutation()``

@bashtage writes:
> standard_gamma and standard_exponential are important enough to be included here IMO.

"Importance" was not my criterion, only whether they are used in unit test suites. This list was just off the top of my head for methods that I think were actually used in test suites, so I'd be happy to be shown live tests that use other methods. I'd like to be a *little* conservative about what methods we stick in here, but we don't have to be *too* conservative, since we are explicitly never going to be modifying these.

--
Robert Kern


Re: NEP: Random Number Generator Policy

Robert Kern-2
On Sun, Jun 3, 2018 at 4:35 PM Eric Wieser <[hidden email]> wrote:

You make a bunch of good points refuting reproducible research as an argument for not changing the random number streams.

However, there’s a second use-case you don’t address - unit tests. For better or worse, downstream, or even our own, unit tests use a seeded random number generator as a shorthand to produce some arbitrary array, and then hard-code the expected output in their tests. Breaking stream compatibility will break these tests.

By the way, the reason that I didn't mention this use case as a motivation in the Status Quo section is that, as I reviewed my mail archive, it wasn't actually a motivating use case for the policy. It's certainly a use case that developed once we did make these (*cough*extravagant*cough*) guarantees, though, as people started to rely on it, and I hope that my StableRandom proposal addresses it to your satisfaction. I could add some more details about that history if you think it would be useful.

--
Robert Kern


Re: NEP: Random Number Generator Policy

Robert Kern-2
On Sun, Jun 3, 2018 at 5:23 PM Stephan Hoyer <[hidden email]> wrote:
On Sat, Jun 2, 2018 at 12:06 PM Robert Kern <[hidden email]> wrote:
We propose first freezing ``RandomState`` as it is and developing a new RNG
subsystem alongside it.  This allows anyone who has been relying on our old
stream-compatibility guarantee to have plenty of time to migrate.
``RandomState`` will be considered deprecated, but with a long deprecation
cycle, at least a few years.  Deprecation warnings will start silent but become
increasingly noisy over time.  Bugs in the current state of the code will *not*
be fixed if fixing them would impact the stream.  However, if changes in the
rest of ``numpy`` would break something in the ``RandomState`` code, we will
fix ``RandomState`` to continue working (for example, some change in the
C API).  No new features will be added to ``RandomState``.  Users should
migrate to the new subsystem as they are able to.

Robert, thanks for this proposal. I think it makes a lot of sense and will help maintain the long-term viability of numpy.random.

The main clarification I would like to see addressed is what "freezing RandomState" means for top level functions in numpy.random. I think we could safely swap out the underlying implementation if numpy.random.seed() is not explicitly called, but how would we handle cases where a seed is explicitly set?

You and I both agree that this is an anti-pattern for numpy.random, but certainly there is plenty of code that relies on the stability of random numbers when seeds are set by np.random.seed(). Similar to the case for RandomState, we would presumably need to start issuing warnings when seed() is explicitly called, which begs the question of what (if anything) we propose to replace seed() with.

Well, *I* propose `AttributeError`, myself…
 
I suppose this will be your next NEP :).

I deliberately left it out of this one as it may, depending on our choices, impinge upon the design of the new PRNG subsystem, which I declared out of scope for this NEP. I have ideas (besides the glib "Let them eat AttributeErrors!"), and now that I think more about it, that does seem like it might be in scope just like the discussion of freezing RandomState and StableRandom are. But I think I'd like to hold that thought a little bit and get a little more screaming^Wfeedback on the core proposal first. I'll return to this in a few days if not sooner.

--
Robert Kern


Re: NEP: Random Number Generator Policy

josef.pktd


On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <[hidden email]> wrote:
The list of ``StableRandom`` methods should be chosen to support unit tests:

    * ``.randint()``
    * ``.uniform()``
    * ``.normal()``
    * ``.standard_normal()``
    * ``.choice()``
    * ``.shuffle()``
    * ``.permutation()``

@bashtage writes:
> standard_gamma and standard_exponential are important enough to be included here IMO.

"Importance" was not my criterion, only whether they are used in unit test suites. This list was just off the top of my head for methods that I think were actually used in test suites, so I'd be happy to be shown live tests that use other methods. I'd like to be a *little* conservative about what methods we stick in here, but we don't have to be *too* conservative, since we are explicitly never going to be modifying these.

That's one area where I thought the selection is too narrow.
We should be able to get a stable stream from the uniform for some distributions.

However, according to the Wikipedia description Poisson doesn't look easy. I just wrote a unit test for statsmodels using Poisson random numbers with hard-coded numbers for the regression tests.
I'm not sure which other distributions are common enough and not easily reproducible by transformation. E.g. negative binomial can be reproduced by a gamma-Poisson mixture.

On the other hand normal can be easily recreated from standard_normal.
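
For example (a sketch of both transformations; the parameter values are
arbitrary, and the mixture assumes gamma and poisson streams are themselves
available):

    import numpy as np

    rng = np.random.RandomState(12345)

    # Normal(mu, sigma) recreated from the standard_normal stream.
    mu, sigma = 1.5, 2.0
    x = mu + sigma * rng.standard_normal(1000)

    # Negative binomial NB(n, p) as a gamma-Poisson mixture: draw the
    # Poisson rate from Gamma(shape=n, scale=(1 - p) / p).
    n, p = 5.0, 0.3
    lam = rng.gamma(shape=n, scale=(1 - p) / p, size=1000)
    nb = rng.poisson(lam)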

Would it be difficult to keep this list large, given that it should be frozen, low-maintenance code?


Josef


 


Re: NEP: Random Number Generator Policy

josef.pktd


On Sun, Jun 3, 2018 at 8:36 PM, Robert Kern <[hidden email]> wrote:
By the way, the reason that I didn't mention this use case as a motivation in the Status Quo section is that, as I reviewed my mail archive, it wasn't actually a motivating use case for the policy. It's certainly a use case that developed once we did make these (*cough*extravagant*cough*) guarantees, though, as people started to rely on it, and I hope that my StableRandom proposal addresses it to your satisfaction. I could add some more details about that history if you think it would be useful.

I don't think that's accurate.
The unit tests for stable random numbers were added when Enthought silently changed the normal random numbers and we got messages from users that the unit tests failed and they could not reproduce our results.

6/12/10
[SciPy-Dev] seeded randn gets different values on osx

(I don't find an online copy, this is from my own mail archive)

AFAIR

Josef

 


Re: NEP: Random Number Generator Policy

Robert Kern-2
On Sun, Jun 3, 2018 at 6:01 PM <[hidden email]> wrote:


I don't think that's accurate.
The unit tests for stable random numbers were added when Enthought silently changed the normal random numbers and we got messages from users that the unit tests failed and they could not reproduce our results.

6/12/10
[SciPy-Dev] seeded randn gets different values on osx

(I don't find an online copy, this is from my own mail archive)

The policy was in place Nov 2008.

--
Robert Kern


Re: NEP: Random Number Generator Policy

Robert Kern-2
On Sun, Jun 3, 2018 at 5:46 PM <[hidden email]> wrote:



That's one area where I thought the selection is too narrow.
We should be able to get a stable stream from the uniform for some distributions.

However, according to the Wikipedia description Poisson doesn't look easy. I just wrote a unit test for statsmodels using Poisson random numbers with hard-coded numbers for the regression tests.

I'd really rather people do this than use StableRandom; this is best practice, as I see it, if your tests involve making precise comparisons to expected results.

StableRandom is intended as a crutch so that the pain of moving existing unit tests away from the deprecated RandomState is less onerous. I'd really rather people write better unit tests!

In particular, I do not want to add any of the integer-domain distributions (aside from shuffle/permutation/choice) as these are the ones that have the platform-dependency issues with respect to 32/64-bit `long` integers. They'd be unreliable for unit tests even if we kept them stable over time.
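
A sketch of the practice I mean: draw the inputs once, then freeze them into
the test file, so the test no longer depends on any stream at all (the values
here are hypothetical):

    import numpy as np

    # Drawn once from Poisson(2.0) when the test was written, then frozen.
    POISSON_DRAWS = np.array([1, 0, 2, 4, 1, 3, 2, 0])

    def test_mean_estimate():
        est = POISSON_DRAWS.mean()
        np.testing.assert_allclose(est, 1.625)  # hard-coded expected result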
 
I'm not sure which other distributions are common enough and not easily reproducible by transformation. E.g. negative binomial can be reproduced by a gamma-Poisson mixture.

On the other hand normal can be easily recreated from standard_normal.

I was mostly motivated by making it a bit easier to mechanically replace uses of randn(), which is probably even more common than normal() and standard_normal() in unit tests.
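To illustrate the transformations mentioned above (a sketch; the recipes are distributionally equivalent, but the draws need not match the built-in methods' streams):

    import numpy as np

    rs = np.random.RandomState(12345)

    # randn(3, 4) is mechanically standard_normal with a shape argument:
    y = rs.standard_normal((3, 4))

    # normal(loc=3, scale=2) recreated from standard_normal:
    x = 3.0 + 2.0 * rs.standard_normal(100)

    # negative binomial as a gamma-Poisson mixture: if lam ~ Gamma(n, (1-p)/p)
    # and X|lam ~ Poisson(lam), then X ~ NegativeBinomial(n, p).
    n, p = 5, 0.3
    lam = rs.gamma(shape=n, scale=(1 - p) / p, size=100)
    nb = rs.poisson(lam)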
 
Would it be difficult to keep this list large, given that it should be frozen, low-maintenance code?

I admit that I had in mind non-statistical unit tests. That is, tests that didn't depend on the precise distribution of the inputs.
 
--
Robert Kern


Re: NEP: Random Number Generator Policy

josef.pktd
In reply to this post by Robert Kern-2


On Sat, Jun 2, 2018 at 3:04 PM, Robert Kern <[hidden email]> wrote:
As promised distressingly many months ago, I have written up a NEP about relaxing the stream-compatibility policy that we currently have.


I particularly invite comment on the two lists of methods that we still would make strict compatibility guarantees for.

<snip: NEP text quoted in full in the original post>

We also had not thought very thoroughly about the limits of what we really
could promise (and by “we” in this section, we really mean Robert Kern, let’s
be honest).  Despite all of the weasel words, our policy overpromises
compatibility.  The same version of ``numpy`` built on different platforms, or
just in a different way could cause changes in the stream, with varying degrees
of rarity.  The biggest is that the ``.multivariate_normal()`` method relies on
``numpy.linalg`` functions.  Even on the same platform, if one links ``numpy``
with a different LAPACK, ``.multivariate_normal()`` may well return completely
different results.  More rarely, building on a different OS or CPU can cause
differences in the stream. 

AFAIK, I have never seen this. Except for some corner cases (like a singular transformation), the "noise" from different linalg packages is in the range of floating-point noise, which is not relevant if we unit test, for example, p-values at rtol=1e-10.

Based on the unit tests that don't fail, "may well return completely different results" seems exaggerated.

(There can be huge jumps in results from linalg operations like svd around the near-singular/singular threshold, i.e. when floating-point noise is in the range of the rcond threshold. But that's independent of np.random; it can happen whenever we want reproducible numerical noise, which is not possible, and it doesn't affect the stability of results in well-defined cases.)
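As an illustration with placeholder numbers, the comparison tolerance in such tests sits many orders of magnitude above LAPACK-to-LAPACK noise:

    import numpy as np

    # Placeholder values; in a real test "expected" is hard-coded from a
    # verified run and "pvalues" is recomputed by the code under test.
    pvalues = np.array([0.0123456789012, 0.9876543210987])
    expected = np.array([0.0123456789012, 0.9876543210987])
    np.testing.assert_allclose(pvalues, expected, rtol=1e-10)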

Josef

 
<snip: remainder of quoted NEP>

Re: NEP: Random Number Generator Policy

josef.pktd
In reply to this post by Robert Kern-2


<snip: quoted discussion>

The policy was in place Nov 2008.

Only for the underlying stream; those unit tests didn't guarantee it for the actual distributions.

So maybe there was a discussion in 2008, which was mostly before my time.
The guarantee for distributions was added in 2010/2011, at least in terms of unit tests in numpy, in order to protect the unit tests in scipy.stats and, by analogy, similar cases in other packages and across users.


Josef

 


Re: NEP: Random Number Generator Policy

Robert Kern-2
<snip: quoted discussion>

Only for the underlying stream; those unit tests didn't guarantee it for the actual distributions.

So maybe there was a discussion in 2008, which was mostly before my time.
The guarantee for distributions was added in 2010/2011, at least in terms of unit tests in numpy, in order to protect the unit tests in scipy.stats and, by analogy, similar cases in other packages and across users.

The policy existed for the distributions regardless of whether or not we had a test suite that ensured it. I cannot share internal emails, of course, but please be assured that the existence of the policy was one of my arguments for rolling back that addition to EPD (and would have been what I argued to prevent it from going out, had I been aware of it).

--
Robert Kern


Re: NEP: Random Number Generator Policy

josef.pktd
In reply to this post by Robert Kern-2


<snip: quoted discussion of the StableRandom method list>

However, according to the Wikipedia description, Poisson doesn't look easy. I just wrote a unit test for statsmodels using Poisson random numbers with hard-coded numbers for the regression tests.

I'd really rather people do this than use StableRandom; this is best practice, as I see it, if your tests involve making precise comparisons to expected results.

I hardcoded the results, not the random data. So the unit tests rely on a reproducible stream of Poisson random numbers.
I don't want to save 500 (or 100 or 1000) observations in a CSV file for every variation of the unit test that I run.

 

<snip: quoted discussion>

I admit that I had in mind non-statistical unit tests. That is, tests that didn't depend on the precise distribution of the inputs.

The problem is that the unit tests in `stats` rely on precise inputs (up to some numerical noise).
For example, p-values themselves are uniformly distributed if the hypothesis test works correctly. That means if I don't have control over the inputs, my p-value could be anything in (0, 1). So we either need a real dataset, need to save all the random numbers in a file, or need a reproducible set of random numbers.

95% of the unit tests that I write are for statistics. A large fraction of them don't rely on the exact distribution, but do rely on random numbers that are "good enough".
For example, when writing unit tests, I occasionally (or sometimes more often) get a "bad" stream of random numbers, for which convergence might fail or the estimated numbers are far away from the true numbers, so the test tolerance would have to be very high.
If I pick one of the seeds that looks good, then I can have a tighter unit test tolerance to ensure results are good in a nice case.

The problem is that we cannot write robust regression tests without stable inputs.
E.g. I verified my results with a Monte Carlo with 5000 replications and 1000 Poisson observations in each.
Results look close to expected and won't depend much on the exact stream of random variables.
But the Monte Carlo for each variant of the test took about 40 seconds. Doing this for every option combination and dataset specification takes too long to be feasible in a unit test suite.
So I rely on numpy's stable random numbers and hard-code the results for a specific random sample in the regression unit tests.
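A sketch of that pattern (``fit_poisson_model`` and the expected numbers are hypothetical placeholders):

    import numpy as np

    def test_poisson_regression():
        np.random.seed(987125643)                  # one "good" seed, fixed
        y = np.random.poisson(lam=2.5, size=1000)  # reproducible input data
        params = fit_poisson_model(y)              # hypothetical estimator
        # Hard-coded from a run that was verified separately by Monte Carlo:
        expected = np.array([2.4871, 0.0497])
        np.testing.assert_allclose(params, expected, rtol=1e-8)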

Josef

 
 

Re: NEP: Random Number Generator Policy

Stephan Hoyer-2
In reply to this post by Robert Kern-2
On Sun, Jun 3, 2018 at 5:39 PM Robert Kern <[hidden email]> wrote:
You and I both agree that this is an anti-pattern for numpy.random, but certainly there is plenty of code that relies on the stability of random numbers when seeds are set by np.random.seed(). Similar to the case for RandomState, we would presumably need to start issuing warnings when seed() is explicitly called, which raises the question of what (if anything) we propose to replace seed() with.

Well, *I* propose `AttributeError`, myself…
 
I suppose this will be your next NEP :).

I deliberately left it out of this one as it may, depending on our choices, impinge upon the design of the new PRNG subsystem, which I declared out of scope for this NEP. I have ideas (besides the glib "Let them eat AttributeErrors!"), and now that I think more about it, that does seem like it might be in scope, just as the discussions of freezing RandomState and StableRandom are. But I think I'd like to hold that thought a little bit and get a little more screaming^Wfeedback on the core proposal first. I'll return to this in a few days if not sooner.

For this NEP, it might be enough here to say that the current behavior of np.random.seed() will be deprecated just like np.random.RandomState(), since the current implementation of np.random.seed() is intimately tied to RandomState.

The nature of the exact replacement (if any) can be left for future discussion.
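One replacement pattern that already works today is to pass an explicit generator object around instead of relying on the global np.random.seed(); this is a sketch only, since what the new subsystem will offer is undecided:

    import numpy as np

    def simulate(n, random_state=None):
        # Accept an explicit generator; fall back to a fresh, unseeded one.
        rs = random_state if random_state is not None else np.random.RandomState()
        return rs.normal(size=n)

    result = simulate(10, random_state=np.random.RandomState(42))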


Re: NEP: Random Number Generator Policy

ralfgommers
In reply to this post by josef.pktd


On Sun, Jun 3, 2018 at 6:54 PM, <[hidden email]> wrote:


On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern <[hidden email]> wrote:
On Sun, Jun 3, 2018 at 5:46 PM <[hidden email]> wrote:


On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <[hidden email]> wrote:

The list of ``StableRandom`` methods should be chosen to support unit tests:

    * ``.randint()``
    * ``.uniform()``
    * ``.normal()``
    * ``.standard_normal()``
    * ``.choice()``
    * ``.shuffle()``
    * ``.permutation()``

@bashtage writes:
> standard_gamma and standard_exponential are important enough to be included here IMO.

"Importance" was not my criterion, only whether they are used in unit test suites. This list was just off the top of my head for methods that I think were actually used in test suites, so I'd be happy to be shown live tests that use other methods. I'd like to be a *little* conservative about what methods we stick in here, but we don't have to be *too* conservative, since we are explicitly never going to be modifying these.

That's one area where I thought the selection is too narrow.
We should be able to get a stable stream from the uniform for some distributions.

However, according to the Wikipedia description, Poisson doesn't look easy. I just wrote a unit test for statsmodels using Poisson random numbers with hard-coded numbers for the regression tests.

I'd really rather people do this than use StableRandom; this is best practice, as I see it, if your tests involve making precise comparisons to expected results.

I hardcoded the results, not the random data. So the unit tests rely on a reproducible stream of Poisson random numbers.
I don't want to save 500 (or 100 or 1000) observations in a CSV file for every variation of the unit test that I run.

I agree, hardcoding numbers in every place where seeded random numbers are now used is quite unrealistic.

It may be worth having a look at the test suites for scipy, statsmodels, scikit-learn, etc. and estimating how much work this NEP causes those projects. If the devs of those packages are forced to do large-scale migrations from RandomState to StableRandom, then why not instead keep RandomState and just add a new API next to it?

Ralf



 

<snip: rest of quoted discussion>

Re: NEP: Random Number Generator Policy

Charles R Harris
In reply to this post by Robert Kern-2


On Sat, Jun 2, 2018 at 1:04 PM, Robert Kern <[hidden email]> wrote:
As promised distressingly many months ago, I have written up a NEP about relaxing the stream-compatibility policy that we currently have.


I particularly invite comment on the two lists of methods that we still would make strict compatibility guarantees for.

<snip: quoted NEP preamble>

This policy was first instated in Nov 2008 (in essence; the full set of weasel

Instituted?
 
words grew over time) in response to a user wanting to be sure that the
simulations that formed the basis of their scientific publication could be
reproduced years later, exactly, with whatever version of ``numpy`` that was
current at the time.  We were keen to support reproducible research, and it was
still early in the life of ``numpy.random``.  We had not seen much cause to
change the distribution methods all that much.

<snip: quoted NEP sections through the list of StableRandom methods>


Not Versioning
--------------

For a long time, we considered that the way to allow algorithmic improvements
while maintaining the stream was to apply some form of versioning.  That is,
every time we make a stream change in one of the distributions, we increment
some version number somewhere.  ``numpy.random`` would keep all past versions
of the code, and there would be a way to get the old versions.  Proposals of
how to do this exactly varied widely, but we will not exhaustively list them
here.  We spent years going back and forth on these designs and were not able
to find one that sufficed.  Let that time lost, and more importantly, the
contributors that we lost while we dithered, serve as evidence against the
notion.

Concretely, adding in versioning makes maintenance of ``numpy.random``
difficult.  Necessarily, we would be keeping lots of versions of the same code
around.  Adding a new algorithm safely would still be quite hard.

But most importantly, versioning is fundamentally difficult to *use* correctly.
We want to make it easy and straightforward to get the latest, fastest, best
versions of the distribution algorithms; otherwise, what's the point?  The way
to make that easy is to make the latest the default.  But the default will
necessarily change from release to release, so the user’s code would need to be
altered anyway to specify the specific version that one wants to replicate.

Adding in versioning to maintain stream-compatibility would still only provide
the same level of stream-compatibility that we currently do, with all of the
limitations described earlier.  Given that the standard practice for such needs
is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` alone
is superfluous.

This section is a bit unclear. Would it be correct to say that the rng version is the numpy version? If so, it might be best to say that up front before justifying it.
 


<snip: rest of quoted NEP>



Mostly off topic, but I note that the new module proposes integers of various lengths using the Python half-open ranges. I would like to suggest that we modify that just a hair so we can specify the whole range in the integer interval specification. For instance, the full range of an 8-bit unsigned integer could be given as `(0, 0)`, i.e., (0, 255 + 1). This would be most useful for the biggest (64-bit) types, but I am thinking more of the case where sequences of ranges can be used.
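A hypothetical helper to illustrate the convention: in a half-open interval, high == low wraps around to mean "the full range of the dtype", so (0, 0) for uint8 is [0, 256):

    import numpy as np

    def full_range_randint(rs, low, high, size=None, dtype=np.uint8):
        if low == high:
            info = np.iinfo(dtype)
            return rs.randint(int(info.min), int(info.max) + 1, size=size,
                              dtype=dtype)
        return rs.randint(low, high, size=size, dtype=dtype)

    rs = np.random.RandomState(0)
    sample = full_range_randint(rs, 0, 0, size=5)  # full uint8 range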

Chuck 


Re: NEP: Random Number Generator Policy

Warren Weckesser-2
In reply to this post by ralfgommers


<snip: quoted discussion>

I agree, hardcoding numbers in every place where seeded random numbers are now used is quite unrealistic.

It may be worth having a look at the test suites for scipy, statsmodels, scikit-learn, etc. and estimating how much work this NEP causes those projects. If the devs of those packages are forced to do large-scale migrations from RandomState to StableRandom, then why not instead keep RandomState and just add a new API next to it?



As a quick and imperfect test, I monkey-patched numpy so that a call to numpy.random.seed(m) actually uses m+1000 as the seed.  I ran the tests using the `runtests.py` script:

seed+1000, using 'python runtests.py -n' in the source directory:

  236 failed, 12881 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed


Most of the failures are in scipy.stats:

seed+1000, using 'python runtests.py -n -s stats' in the source directory:

  203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed, 1 xpassed


Changing the amount added to the seed or running the tests using the function `scipy.test("full")` gives different (but similar magnitude) results:

seed+1000, using 'import scipy; scipy.test("full")' in an ipython shell:

  269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed 

seed+1, using 'python runtests.py -n' in the source directory:

  305 failed, 12812 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed


I suspect many of the tests will be easy to update, so fixing 300 or so tests does not seem like a monumental task.  I haven't looked into why there are 585 deselected tests; maybe there are many more tests lurking there that will have to be updated.
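A sketch of the monkey-patch (applied before the test run):

    import numpy as np

    # Shift every explicit seed by 1000, so any test that depends on the
    # exact stream will fail.
    _original_seed = np.random.seed

    def shifted_seed(seed=None):
        if seed is None:
            return _original_seed()
        return _original_seed(seed + 1000)

    np.random.seed = shifted_seed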

Warren


