Re: overhauling numpy.random and randomgen Message-ID:


bashtage
> Rather than "base RNG", what about calling these classes a "random source"
> or "random stream"? In particular, I would suggest defining two Python
> classes:
> - np.random.Generator as a less redundant name for what is currently called
>   RandomGenerator
> - np.random.Source or np.random.Stream as an abstract base class for what
>   are currently called "base RNGs"

Naming is definitely hard.  Simple RNGs are currently called basic RNGs, a name inspired by mkl-random. 'source' sounds OK to me, but it sort of hides the fact that these are the actual pseudo-RNGs. `stream` has a technical meaning (a single PRNG may produce multiple independent streams) and IMO should be avoided, since it might lead to confusion.  Perhaps source_rng (or in docs, Source RNG)?
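
To make the technical sense of "stream" concrete, here is a minimal sketch. It is not code from RandomGen or NumPy: `CounterSource`, its `stream` argument, and `next_uint64` are hypothetical names, and hashing (seed, stream, counter) with BLAKE2b is chosen only because it is short. It shows one algorithm yielding several distinct, reproducible sequences selected by a stream id; it does not demonstrate statistical independence.

```python
import hashlib


class CounterSource:
    """Hypothetical counter-based source: one algorithm, many streams."""

    def __init__(self, seed, stream=0):
        self.seed = seed
        self.stream = stream  # selects which sequence this object produces
        self.counter = 0

    def next_uint64(self):
        # Hash (seed, stream, counter) down to 8 bytes and read them as an int.
        msg = f"{self.seed}:{self.stream}:{self.counter}".encode()
        self.counter += 1
        digest = hashlib.blake2b(msg, digest_size=8).digest()
        return int.from_bytes(digest, "little")


# Same seed, different stream ids: two different sequences from one PRNG.
stream_a = CounterSource(seed=1234, stream=0)
stream_b = CounterSource(seed=1234, stream=1)
draws_a = [stream_a.next_uint64() for _ in range(4)]
draws_b = [stream_b.next_uint64() for _ in range(4)]
```

If the class itself were called a "stream", an object like this, which holds one stream out of many, would be awkward to describe.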

RandomGenerator is really a RandomTransformer (it transforms the raw output of a basic RNG into draws from distributions), but I didn't like the latter name.
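
The split being named here can be sketched in a few lines. This is an illustration under assumptions, not the RandomGen implementation: `SplitMix64Source`, `next_uint64`, and the `source` attribute are hypothetical names, and the SplitMix64-style mixer is just a compact stand-in for a real basic RNG.

```python
import math


class SplitMix64Source:
    """Hypothetical 'source RNG': only produces raw 64-bit integers."""

    _MASK = (1 << 64) - 1

    def __init__(self, seed):
        self.state = seed & self._MASK

    def next_uint64(self):
        # SplitMix64-style step: advance the state, then scramble it.
        self.state = (self.state + 0x9E3779B97F4A7C15) & self._MASK
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & self._MASK
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & self._MASK
        return z ^ (z >> 31)


class Generator:
    """Hypothetical 'transformer': turns raw bits into distributions."""

    def __init__(self, source):
        self.source = source

    def random(self):
        # Keep the top 53 bits to get a float in [0, 1).
        return (self.source.next_uint64() >> 11) * (1.0 / (1 << 53))

    def uniform(self, low, high):
        return low + (high - low) * self.random()

    def exponential(self, scale=1.0):
        # Inverse-CDF transform of a uniform draw.
        return -scale * math.log(1.0 - self.random())


gen = Generator(SplitMix64Source(seed=42))
sample = gen.exponential(scale=2.0)
```

All of the pseudo-random state lives in the source; the generator holds no randomness of its own, which is why "transformer" is the more literal description.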

> There are also a couple of convenience attributes in the user-facing API
> that I would suggest refining:
>   - The "brng" attribute of RandomGenerator is not a very descriptive name. I
>     would prefer "stream" or "source", or the more explicit "base_rng" if we
>     stick with that term.
>   - I don't think we need the "generator" property on base RNG objects. It is
>     fine to require writing np.random.Generator(base) instead. Looking at the
>     implementation, .generator caches the RandomGenerator objects it creates on
>     the base RNG, which creates a reference cycle. Yes, Python can garbage
>     collect reference cycles, but this is still a muddled data model.

The attribute name should match the final (descriptive) name, whatever it is.  In RandomGen I am using the `basic_rng` attribute name, but this could be `source`.  I also use a property so that the attribute can have a docstring attached for use in IPython. I think this is more user-friendly.
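
As a small sketch of the property-with-docstring pattern (the class and the `source` name are placeholders for whatever name is finally chosen, not the RandomGen code):

```python
class Generator:
    """Hypothetical generator that wraps a source RNG."""

    def __init__(self, source):
        self._source = source

    @property
    def source(self):
        """The underlying source RNG that supplies the raw random bits."""
        return self._source


gen = Generator(source=object())
doc = Generator.source.__doc__  # what help() or IPython's `?` can display
```

A plain instance attribute cannot carry its own docstring, and a property without a setter is also read-only, so the source cannot be swapped out by accident.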

I think dropping the `generator` property on the basic RNGs is reasonable.  It was a convenience but is awkward, and I always understood that it creates a cycle.
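
For reference, the cycle can be reproduced with a stripped-down sketch. The class names are hypothetical, and the behaviour shown relies on CPython's reference counting; other interpreters may free objects differently.

```python
import gc
import weakref


class Generator:
    def __init__(self, source):
        self.source = source  # generator -> source


class CachingSource:
    """Hypothetical basic RNG that caches its generator on itself."""

    def __init__(self):
        self._generator = None

    @property
    def generator(self):
        if self._generator is None:
            self._generator = Generator(self)  # source -> generator -> source
        return self._generator


class PlainSource:
    """Hypothetical basic RNG without the convenience property."""


def make_cached():
    src = CachingSource()
    src.generator  # populate the cache, which creates the cycle
    return src


def make_plain():
    src = PlainSource()
    Generator(src)  # explicit construction; nothing is stored on src
    return src


def freed_without_gc(make):
    """Return True if the object dies as soon as its last name is deleted."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cycle collector
    try:
        obj = make()
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycle left behind
```

Under these assumptions, `freed_without_gc(make_plain)` is true while `freed_without_gc(make_cached)` is false: the cached generator keeps the source alive until the cycle collector runs.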

> Finally, why do we expose the np.random.gen object? I thought part of the
> idea with the new API was to avoid global mutable state.

Module-level functions are essential for quick experiments and should be provided.  The only difference here is that the singleton's `seed` and `state` are no longer exposed, so it isn't possible (using the exposed API) to set the seed.


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Neal Becker
The boost_random c++ library uses the terms 'generators' and
'distributions'.  Distributions are applied to generators.

--
Those who don't understand recursion are doomed to repeat it

Stephan Hoyer-2
On Fri, Apr 19, 2019 at 5:16 AM Neal Becker <[hidden email]> wrote:
> The boost_random c++ library uses the terms 'generators' and
> 'distributions'.  Distributions are applied to generators.

"distributions" is a little confusing in the context of scipy.stats.distributions, where a distribution corresponds to a particular probability distribution.
 

Stephan Hoyer-2
In reply to this post by bashtage
On Fri, Apr 19, 2019 at 4:54 AM Kevin Sheppard <[hidden email]> wrote:
> > Finally, why do we expose the np.random.gen object? I thought part of the
> > idea with the new API was to avoid global mutable state.
>
> Module level functions are essential for quick experiments and should be provided.  The only difference here is that the singleton `seed` and `state` are no longer exposed so that it isn't possible (using the exposed API) to set the seed.

It seems like you could set the seed with np.random.gen.brng.seed()?

I do agree that module level functions are convenient and should be preserved (without the ability to seed them), but it seems like that would be an argument for exposing aliases to RandomGenerator methods (like what we currently do for RandomState methods), rather than exposing the full RandomGenerator itself.
