reseed random generator (1.19)

reseed random generator (1.19)

Neal Becker
Consider the following:

from numpy.random import default_rng
rs = default_rng()

Now how do I re-seed the generator?
I thought perhaps rs.bit_generator.seed(), but there is no such attribute.

Thanks,
Neal

--
Those who don't understand recursion are doomed to repeat it

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: reseed random generator (1.19)

bashtage
Just call

rs = default_rng()

again.




Re: reseed random generator (1.19)

Robert Kern-2
In reply to this post by Neal Becker
On Wed, Jun 24, 2020 at 3:31 PM Neal Becker <[hidden email]> wrote:
> Consider the following:
>
> from numpy.random import default_rng
> rs = default_rng()
>
> Now how do I re-seed the generator?
> I thought perhaps rs.bit_generator.seed(), but there is no such attribute.

In general, reseeding an existing generator instance is not a good practice. What effect are you trying to accomplish? I assume that you are asking this because you are currently using `RandomState.seed()`. In what circumstances?

The raw `bit_generator.state` property *can* be assigned to, in order to support some advanced use cases (mostly involving de/serialization and similar kinds of meta-programming tasks). It's also been helpful for me to construct worst-case scenarios for testing parallel streams. But it quite deliberately bypasses the notion of deriving the state from a human-friendly seed number.
 
--
Robert Kern


Re: reseed random generator (1.19)

Neal Becker
I was using this to reset the generator, in order to repeat the same sequence again for testing purposes.


Re: reseed random generator (1.19)

bashtage

If you want to use the same entropy-initialized generator for temporarily reproducible experiments, then you can use

import numpy as np

gen = np.random.default_rng()
state = gen.bit_generator.state
gen.standard_normal()
# 0.5644742559549797, will vary across runs
gen.bit_generator.state = state
gen.standard_normal()
# Always the same as before: 0.5644742559549797

The equivalent to the old way of calling seed to reseed is:

SEED = 918273645
gen = np.random.default_rng(SEED)
gen.standard_normal()
# 0.12345677
gen = np.random.default_rng(SEED)
gen.standard_normal()
# Identical value

Rather than reseeding the same object, you just create a new object. At some point in the development of Generator both methods were timed, and there was no performance benefit to reusing the same object by reseeding.

 

Kevin
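
The two approaches described above can be checked side by side. This is only a minimal sketch: the array length of 5 and the variable names are illustrative choices, not part of the original message.

```python
import numpy as np

SEED = 918273645  # same seed value as in the message above

# Option 1: re-create the generator from the same seed.
first = np.random.default_rng(SEED).standard_normal(5)
second = np.random.default_rng(SEED).standard_normal(5)
assert np.array_equal(first, second)

# Option 2: snapshot the bit generator state and restore it later.
gen = np.random.default_rng()        # seeded from OS entropy
snapshot = gen.bit_generator.state   # a plain dict describing the state
draw_a = gen.standard_normal(5)
gen.bit_generator.state = snapshot   # rewind to the snapshot
draw_b = gen.standard_normal(5)
assert np.array_equal(draw_a, draw_b)
```

Option 1 is reproducible across runs of the script; option 2 only repeats a sequence within a single run, because the initial state comes from OS entropy.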

 

 

 


Re: reseed random generator (1.19)

Evgeni Burovski
(apologies for jumping into a conversation)
So what is the recommendation for instantiating a number of generators
with manually controlled seeds?

The use case is running a series of MC simulations with reproducible
streams. The runs are independent and are run in parallel in separate
OS processes, where I do not control the time each process starts
(jobs are submitted to the batch queue), so default seeding seems
dubious?

Previously, I would just do roughly

seeds = [1234, 1235, 1236, ...]
rngs = [np.random.RandomState(seed) for seed in seeds]
...
and each process operates with its own `rng`.
What would be the recommended way with the new `Generator` framework?
A human-friendly way would be preferable if possible.

Thanks,

Evgeni



Re: reseed random generator (1.19)

bashtage

The best practice is to use a SeedSequence to spawn child SeedSequences, and then to use these children to initialize your generators or bit generators.

from numpy.random import SeedSequence, Generator, PCG64, default_rng

entropy = 382193877439745928479635728

seed_seq = SeedSequence(entropy)
NUM_STREAMS = 2**15
children = seed_seq.spawn(NUM_STREAMS)
# If you want the current best bit generator, which may change
rngs = [default_rng(child) for child in children]
# If you want the most control across versions, set the bit generator.
# This uses PCG64, which is the current default. Each bit generator needs to be wrapped in a Generator.
rngs = [Generator(PCG64(child)) for child in children]

 

Kevin
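
For the batch-queue case raised earlier in the thread, where each job runs in its own OS process, one possible sketch is to let every job rebuild its own child from the shared root seed and its job index. The helper name `rng_for_job` and the job count are hypothetical; the sketch relies on `SeedSequence` accepting a `spawn_key` argument, and the loop checks that this matches explicit spawning.

```python
from numpy.random import SeedSequence, default_rng

ENTROPY = 382193877439745928479635728  # root seed from the message above
NUM_JOBS = 4                           # hypothetical number of batch jobs


def rng_for_job(job_index):
    """Build the generator for one job from the shared root seed."""
    # Assumption: this is the same child that
    # SeedSequence(ENTROPY).spawn(n)[job_index] would produce.
    child = SeedSequence(ENTROPY, spawn_key=(job_index,))
    return default_rng(child)


# Check against explicit spawning done in a single process.
spawned = SeedSequence(ENTROPY).spawn(NUM_JOBS)
for i, child in enumerate(spawned):
    expected = default_rng(child).standard_normal(3)
    actual = rng_for_job(i).standard_normal(3)
    assert (expected == actual).all()
```

Each job then only needs the root seed and its own index, so no list of generators has to be passed between processes.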

 

 


Re: reseed random generator (1.19)

mattip

I guess something is wrong with the docs. What can we do to make this page more discoverable?

https://numpy.org/devdocs/reference/random/parallel.html#seedsequence-spawning


Matti



Re: reseed random generator (1.19)

Robert Kern-2
In reply to this post by Neal Becker
On Mon, Jun 29, 2020 at 8:02 AM Neal Becker <[hidden email]> wrote:
> I was using this to reset the generator, in order to repeat the same sequence again for testing purposes.

In general, you should just pass in a new Generator that was created with the same seed.

def function_to_test(rg):
    x = rg.standard_normal()
    ...

SEED = 12345...

rg = np.random.default_rng(SEED)
function_to_test(rg)
rg = np.random.default_rng(SEED)
function_to_test(rg)

Resetting the state of the underlying BitGenerator in-place is possible, as Kevin showed, but if you can refactor your code so that there isn't a persistent Generator object between these runs, that's probably better. It's a code smell if you can't just pass in a fresh Generator; in general, it means that your code is harder to use, not just because we don't expose an in-place seed() method.

--
Robert Kern
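
A runnable version of the pattern above might look like the following sketch. The function body, the `n` parameter, and the seed value are placeholders chosen for illustration; the point is only that the Generator is passed in rather than stored and reseeded.

```python
import numpy as np


def function_to_test(rg, n=4):
    """Hypothetical function under test; takes the Generator as an argument."""
    return rg.standard_normal(n).cumsum()


SEED = 12345  # any fixed integer; illustrative value

# A fresh Generator built from the same seed replays the same stream.
run_1 = function_to_test(np.random.default_rng(SEED))
run_2 = function_to_test(np.random.default_rng(SEED))
assert np.array_equal(run_1, run_2)
```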


Re: reseed random generator (1.19)

Evgeni Burovski
In reply to this post by bashtage
Thanks Kevin!

A possibly dumb follow-up question: in your example,

> entropy = 382193877439745928479635728

is it relevant that `entropy` is a long integer? I.e., what are the
constraints on its value, can one use

entropy = 1234 or
entropy = 0 or
entropy = 1

instead?





Re: reseed random generator (1.19)

bashtage

It can be anything, but “good practice” is to use a number that has two properties:

  1. When expressed as a binary number, it has a large number of both 0s and 1s.
  2. The total number of digits in the binary representation is somewhere between 32 and 128.

The binary representation of the one I chose (by mashing numbers) is:

'0b10011110000100100101100111001110010001101111111001100111100011000101001111111100100010000'

This has 43 0s and 46 1s.

Many people just use 0, which is fine in the sense that the stream should have the same properties as if any other of the 2**N possible numbers were chosen. Simple choices do, however, have a slight consequence: they generate strange dependence across researchers if everyone uses a small number of seeds (e.g., 0 or 1234).

 

Kevin
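
The two properties listed above can be inspected for any candidate seed with plain Python integers. This is just a sketch of the counting; the variable names are illustrative.

```python
entropy = 382193877439745928479635728

bits = bin(entropy)[2:]   # binary digits without the '0b' prefix
ones = bits.count("1")
zeros = bits.count("0")

# Total number of binary digits, then the count of 1s and of 0s.
print(len(bits), ones, zeros)
```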

 

 


Re: reseed random generator (1.19)

Robert Kern-2
On Mon, Jun 29, 2020 at 11:10 AM Kevin Sheppard <[hidden email]> wrote:
> It can be anything, but “good practice” is to use a number that would have 2 properties:
>
>   1. When expressed as binary number, it would have a large number of both 0s and 1s

The properties of the SeedSequence algorithm render this irrelevant, fortunately. While there are seed numbers that might create "bad" outputs from SeedSequence with overly low or high Hamming weight (number of 1s), they are scattered around the input space so you have to adversarially reverse the SeedSequence algorithm to find them. IMO, the only reason to avoid seed numbers like this has more to do with the fact that there are a relatively small number of these seeds. If you are deliberately picking from that small set somehow, it's more likely that other researchers are too, and you are more likely to reuse that same seed.

>   2. The total number of digits in the binary representation is somewhere between 32 and 128.

I like using the standard library `secrets` module.

>>> import secrets
>>> secrets.randbelow(1<<128)
8080125189471896523368405732926911908
 
If you want an easy-to-follow rule, just use the above snippet to get a 128-bit number. More than 128 bits won't do you any good (at least by default, the internal bottleneck inside of SeedSequence is a 128-bit pool), and 128-bit numbers are just about small enough to copy-paste comfortably.

We have thought about wrapping that up in a numpy.random function (e.g. `np.random.simple_seed()` or something like that) for convenience, but we wanted to wait a bit before committing to an API.

--
Robert Kern


Re: reseed random generator (1.19)

Robert Kern-2
On Mon, Jun 29, 2020 at 11:30 AM Robert Kern <[hidden email]> wrote:
On Mon, Jun 29, 2020 at 11:10 AM Kevin Sheppard <[hidden email]> wrote:
  2. The total number of digits in the binary representation is somewhere between 32 and 128.

I like using the standard library `secrets` module.

>>> import secrets
>>> secrets.randbelow(1<<128)
8080125189471896523368405732926911908
 
If you want an easy-to-follow rule, just use the above snippet to get a 128-bit number. More than 128 bits won't do you any good (at least by default, the internal bottleneck inside of SeedSequence is a 128-bit pool), and 128-bit numbers are just about small enough to copy-paste comfortably.

Sorry, `secrets.randbits(128)` is the cleaner form of this. 
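
For reference, a minimal sketch of how such a seed might be generated once and then reused, assuming the `default_rng` entry point from the original question. The names `seed`, `rng_a`, and `rng_b` are illustrative only and do not come from the thread:

```python
import secrets

import numpy as np

# Generate a fresh 128-bit seed once. In practice you would print it and
# paste the literal integer into your script so that the run is reproducible.
seed = secrets.randbits(128)

# Two generators built from the same seed produce the same stream.
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

draws_a = rng_a.random(3)
draws_b = rng_b.random(3)
```

The seed is drawn fresh on every execution of this snippet, so the reproducibility only comes from recording the printed integer and passing that literal on later runs.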

--
Robert Kern


Re: reseed random generator (1.19)

Evgeni Burovski
Thanks Kevin, thanks Robert, this is very helpful!

I'd strongly agree with Matti that your explanations could/should make
it to the docs. Maybe it's something for the GSoD.

While we're on the subject, one comment and two (hopefully last) questions:

1. My two cents w.r.t. `np.random.simple_seed()` function Robert
mentioned: I personally would find it way more confusing than a clear
explanation + example in the docs. I'd ask myself what's "simple"
here, click through to the source of this `simple_seed`, find out that
it's a docstring and a two-liner, and just copy-paste the latter into
my user code. Again, just FWIW.

2. What would be a preferred way of spelling out "give me the N-th
spawned child SeedSequence"?
The use case is that I prepare (human-readable) input files once and
run a number of computational jobs in separate OS processes. From what
Kevin said, I can of course give each worker a pair of (entropy,
worker_id) and then each of them does at startup

> parent_seq = SeedSequence(entropy)
> this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]

Is this a recommended way, or is there a better API? Or does the
number of spawned children need to be known beforehand?
I'd much rather avoid serialization/deserialization if possible.

3. Is there a way of telling the number of draws a generator did?

The use case is to checkpoint the number of draws and `.advance` the
bit generator when resuming from the checkpoint. (The runs are longer
than the batch queue limits).

Thanks!

Evgeni


Re: reseed random generator (1.19)

Robert Kern-2
On Sat, Jul 4, 2020 at 1:03 PM Evgeni Burovski <[hidden email]> wrote:
Thanks Kevin, thanks Robert, this is very helpful!

I'd strongly agree with Matti that your explanations could/should make
it to the docs. Maybe it's something for the GSoD.

While we're on the subject, one comment and two (hopefully last) questions:

1. My two cents w.r.t. `np.random.simple_seed()` function Robert
mentioned: I personally would find it way more confusing than a clear
explanation + example in the docs. I'd ask myself what's "simple"
here, click through to the source of this `simple_seed`, find out that
it's a docstring and a two-liner, and just copy-paste the latter into
my user code. Again, just FWIW.

Noted.
 
2. What would be a preferred way of spelling out "give me the N-th
spawned child SeedSequence"?
The use case is that I prepare (human-readable) input files once and
run a number of computational jobs in separate OS processes. From what
Kevin said, I can of course give each worker a pair of (entropy,
worker_id) and then each of them does at startup

> parent_seq = SeedSequence(entropy)
> this_sequence = parent_seq.spawn(worker_id + 1)[worker_id]

Is this a recommended way, or is there a better API? Or does the
number of spawned children need to be known beforehand?
I'd much rather avoid serialization/deserialization if possible.

Assuming that `worker_id` starts at 0:

  this_sequence = SeedSequence(entropy, spawn_key=(worker_id,))
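
As a rough sketch of why this works without knowing the number of workers in advance: constructing the child directly with `spawn_key=(worker_id,)` should give the same sequence as spawning `worker_id + 1` children from a fresh parent and taking the last one. The value of `worker_id` below is hypothetical, and `entropy` reuses the example integer from earlier in the thread:

```python
import numpy as np
from numpy.random import SeedSequence

# Shared entropy that every worker reads from its input file.
entropy = 8080125189471896523368405732926911908
worker_id = 3  # hypothetical worker index, counted from 0

# Direct construction: no need to know how many workers exist.
direct = SeedSequence(entropy, spawn_key=(worker_id,))

# Equivalent construction via spawn() on a fresh parent.
parent = SeedSequence(entropy)
spawned = parent.spawn(worker_id + 1)[worker_id]

# Both children should generate identical seed material.
same_state = np.array_equal(direct.generate_state(4), spawned.generate_state(4))

# Each worker can then build its own generator from its child sequence.
rng = np.random.default_rng(direct)
```

Each worker only needs the pair (entropy, worker_id), so nothing has to be serialized and passed between processes.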
 
3. Is there a way of telling the number of draws a generator did?

The use case is to checkpoint the number of draws and `.advance` the
bit generator when resuming from the checkpoint. (The runs are longer
than the batch queue limits).

There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's .state property, which is a nice JSONable dict for PCG64) for checkpointing purposes. Among other things, there is a cached uint32 for when odd numbers of uint32s are drawn that you might need to handle. The state of the default PCG64 is much smaller than MT19937. It's less work and more reliable than computing that distance and storing the original seed and the distance.
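
A minimal sketch of that kind of checkpointing, assuming a PCG64-backed generator and a JSON string as the checkpoint format. The seed `12345` and the variable names are placeholders, not anything from the thread:

```python
import json

import numpy as np
from numpy.random import PCG64, Generator

rng = Generator(PCG64(12345))  # 12345 is an arbitrary example seed
rng.random(10)                 # some work done before the checkpoint

# Checkpoint: for PCG64 the state is a plain dict of strings and Python ints,
# including the cached-uint32 fields, so it can be dumped to JSON directly.
checkpoint = json.dumps(rng.bit_generator.state)

# What the uninterrupted run would draw next.
expected = rng.random(5)

# Resume: build a fresh PCG64 and overwrite its state from the checkpoint.
bit_gen = PCG64()
bit_gen.state = json.loads(checkpoint)
resumed = Generator(bit_gen)
actual = resumed.random(5)
```

In a real job the `checkpoint` string would be written to a file alongside the rest of the simulation state. Other bit generators may hold arrays in their state, so the plain JSON round trip here is only claimed for PCG64.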

--
Robert Kern


Re: reseed random generator (1.19)

Neal Becker


On Sat, Jul 4, 2020 at 1:56 PM Robert Kern <[hidden email]> wrote:
....

3. Is there a way of telling the number of draws a generator did?

The use case is to checkpoint the number of draws and `.advance` the
bit generator when resuming from the checkpoint. (The runs are longer
than the batch queue limits).

There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's .state property, which is a nice JSONable dict for PCG64) for checkpointing purposes. Among other things, there is a cached uint32 for when odd numbers of uint32s are drawn that you might need to handle. The state of the default PCG64 is much smaller than MT19937. It's less work and more reliable than computing that distance and storing the original seed and the distance.

--
Robert Kern

Sorry, you lost me here.  If I want to save and restore the state of a generator, can I use pickle/unpickle?
 

--
Those who don't understand recursion are doomed to repeat it


Re: reseed random generator (1.19)

Robert Kern-2


On Sat, Jul 4, 2020, 2:39 PM Neal Becker <[hidden email]> wrote:


On Sat, Jul 4, 2020 at 1:56 PM Robert Kern <[hidden email]> wrote:
....

3. Is there a way of telling the number of draws a generator did?

The use case is to checkpoint the number of draws and `.advance` the
bit generator when resuming from the checkpoint. (The runs are longer
than the batch queue limits).

There are computations you can do on the internal state of PCG64 and Philox to get this information, but not in general, no. I do recommend serializing the Generator or BitGenerator (or at least the BitGenerator's .state property, which is a nice JSONable dict for PCG64) for checkpointing purposes. Among other things, there is a cached uint32 for when odd numbers of uint32s are drawn that you might need to handle. The state of the default PCG64 is much smaller than MT19937. It's less work and more reliable than computing that distance and storing the original seed and the distance.

--
Robert Kern

Sorry, you lost me here.  If I want to save and restore the state of a generator, can I use pickle/unpickle?

Absolutely.
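
For completeness, a small sketch of the pickle round trip, assuming a generator created with `default_rng`. The seed and the in-memory `blob` are illustrative; a real checkpoint would write the bytes to a file:

```python
import pickle

import numpy as np

rng = np.random.default_rng(12345)  # arbitrary example seed
rng.standard_normal(7)              # some work done before the checkpoint

# Save: pickling the Generator captures the underlying bit generator state.
blob = pickle.dumps(rng)

# What the uninterrupted run would draw next.
expected = rng.standard_normal(4)

# Restore: the unpickled generator continues from the checkpointed position.
restored = pickle.loads(blob)
actual = restored.standard_normal(4)
```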

