locking np.random.Generator in a cython nogil context?

Evgeni Burovski
Hi,

What would be the correct way of locking the bit generator of
np.random.Generator in cython's nogil context?
(This is threading 101, surely, so please forgive my ignorance).

The docs for extending np.random.Generator in cython
(https://numpy.org/doc/stable/reference/random/extending.html#cython)
recommend the following idiom for generating uniform variates, where
the GIL is released and a Generator-specific lock is held:

x = PCG64()
rng = <bitgen_t *> PyCapsule_GetPointer(x.capsule, capsule_name)
with nogil, x.lock:
    rng.next_double(rng.state)

What is the correct way of locking it when already in the nogil
section (so that x.lock is not accessible)?

The use case is a long-running MC process which generates random
variates in a tight loop (so the loop is all nogil). In case it
matters, I probably won't be using python threads, but may use
multiprocessing.

Basically,

    cdef double uniform(self) nogil:
        if self.idx >= self.buf.shape[0]:
            self._fill()
        cdef double value = self.buf[self.idx]
        self.idx += 1
        return value

    cdef void _fill(self) nogil:
        self.idx = 0
        # HERE: Lock ?
        for i in range(self.buf.shape[0]):
            self.buf[i] = self.rng.next_double(self.rng.state)


Thanks,
Evgeni


P.S. The full cdef class, for completeness:

# Imports the class relies on (not shown in the original post; they follow
# the Cython example in the NumPy "extending" docs linked above).
cimport cython
import numpy as np
from cpython.pycapsule cimport PyCapsule_IsValid, PyCapsule_GetPointer
from numpy.random cimport bitgen_t
from numpy.random import PCG64, SeedSequence

cdef const char *capsule_name = "BitGenerator"

cdef class RndmWrapper():
    cdef:
        double[::1] buf
        Py_ssize_t idx
        bitgen_t *rng
        object py_gen  # keep the garbage collector away

    def __init__(self, seed=(1234, 0), buf_size=4096, bitgen_kind=None):
        if bitgen_kind is None:
            bitgen_kind = PCG64

        # cf Numpy-discussion list, K. Sheppard, R. Kern, June 29, 2020 and below
        # https://mail.python.org/pipermail/numpy-discussion/2020-June/080794.html
        entropy, num = seed
        seed_seq = SeedSequence(entropy, spawn_key=(num,))
        py_gen = bitgen_kind(seed_seq)

        # store the python object to avoid it being garbage collected
        self.py_gen = py_gen

        # validate the capsule before extracting the pointer from it
        capsule = py_gen.capsule
        if not PyCapsule_IsValid(capsule, capsule_name):
            raise ValueError("Invalid pointer to anon_func_state")
        self.rng = <bitgen_t *>PyCapsule_GetPointer(capsule, capsule_name)

        self.buf = np.empty(buf_size, dtype='float64')
        self._fill()

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef void _fill(self) nogil:
        cdef Py_ssize_t i  # typed loop index, so the loop stays pure C
        self.idx = 0
        for i in range(self.buf.shape[0]):
            self.buf[i] = self.rng.next_double(self.rng.state)

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef double uniform(self) nogil:
        if self.idx >= self.buf.shape[0]:
            self._fill()
        cdef double value = self.buf[self.idx]
        self.idx += 1
        return value
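
For readers who want to try the buffering idea without compiling anything, here is a rough pure-Python analogue of the class above. It is only a sketch under stated assumptions: `BufferedUniform` is a hypothetical name, the seed is a plain integer rather than the `(entropy, num)` pair used above, and the refill goes through `Generator.random(out=...)` instead of calling `next_double` directly.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence


class BufferedUniform:
    """Hypothetical pure-Python analogue of RndmWrapper (illustration only)."""

    def __init__(self, seed=1234, buf_size=4096):
        # One private Generator per wrapper instance.
        self.gen = Generator(PCG64(SeedSequence(seed)))
        self.buf = np.empty(buf_size, dtype=np.float64)
        self._fill()

    def _fill(self):
        # Refill the whole buffer in one bulk call, then rewind the cursor.
        self.gen.random(out=self.buf)
        self.idx = 0

    def uniform(self):
        # Hand out buffered values one at a time; refill when exhausted.
        if self.idx >= self.buf.shape[0]:
            self._fill()
        value = float(self.buf[self.idx])
        self.idx += 1
        return value


rng = BufferedUniform(seed=1, buf_size=8)
values = [rng.uniform() for _ in range(20)]  # crosses two refills
```

The buffer size only controls how often the bulk refill happens, which is the same amortization trade-off as in the Cython version.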
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: locking np.random.Generator in a cython nogil context?

bashtage
You need to reacquire the GIL; then you can take the lock and release the GIL again.

I think this works (on phone, so untested)

with gil:
    with nogil, lock:
        ...

Kevin


On Mon, Dec 14, 2020, 13:37 Evgeni Burovski <[hidden email]> wrote:
<snip>

Re: locking np.random.Generator in a cython nogil context?

Evgeni Burovski
On Mon, Dec 14, 2020 at 4:46 PM Kevin Sheppard
<[hidden email]> wrote:
>
> You need to reacquire the gil, then you can get the lock and rerelease the gil.
>
> I think this works (on phone, so untested)
>
> with gil:
>     with nogil, lock:
>         ...

Thanks Kevin.
This surely works, but feels seriously weird. Is this the only / the
recommended way?
I can of course adjust the buffer size to amortize the time for the
GIL manipulation, but this really starts looking like a code smell.
My original motivation was to have inner simulation loops python-free.
Most likely, the issue is that I'm not using the Generator correctly?

Evgeni



Re: locking np.random.Generator in a cython nogil context?

bashtage

I don’t think it is that strange. You always need the GIL before you interact with Python code. The lock is a Python object, and so you need the GIL.

You could redesign your code to always use different bit generators so that you would never use the same one, in which case you wouldn’t need to worry about the lock.

I also think that the lock only matters for multithreaded code, not multiprocess code. I believe the latter pickles and unpickles any Generator object (and the underlying BitGenerator), and so each process has its own version. Note that when multiprocessing, the recommended procedure is to use SeedSequence.spawn() to seed a sequence of BitGenerators and to use a distinct BitGenerator in each process. If you do this, then you are free from the lock.
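
One way to follow this advice is sketched below, using threads for brevity: a single base seed is spawned into one child `SeedSequence` per worker, and each worker gets its own `Generator`, so no bit generator (and hence no lock) is shared between workers. The helper names (`make_generators`, `worker_mean`, `run`) and the sample sizes are illustrative assumptions, not part of NumPy.

```python
from concurrent.futures import ThreadPoolExecutor

from numpy.random import Generator, PCG64, SeedSequence


def make_generators(base_seed, n_workers):
    """Build one independent Generator per worker from a single base seed."""
    children = SeedSequence(base_seed).spawn(n_workers)
    return [Generator(PCG64(child)) for child in children]


def worker_mean(gen, n_samples=10_000):
    # Each worker owns its Generator, so its lock is never contended.
    return float(gen.random(n_samples).mean())


def run(base_seed=1234, n_workers=4):
    gens = make_generators(base_seed, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker_mean, gens))


means = run()
```

Because every stream is derived from the same base seed, rerunning `run()` with the same arguments reproduces the same per-worker results.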

 

Kevin

 

 

From: [hidden email]
Sent: Monday, December 14, 2020 2:10 PM
To: [hidden email]
Subject: Re: [Numpy-discussion] locking np.random.Generator in a cython nogil context?

 

<snip>

Re: locking np.random.Generator in a cython nogil context?

Evgeni Burovski
<snip>

> I also think that the lock only matters for multithreaded code, not multiprocess code. I believe the latter pickles and unpickles any Generator object (and the underlying BitGenerator), and so each process has its own version. Note that when multiprocessing, the recommended procedure is to use SeedSequence.spawn() to seed a sequence of BitGenerators and to use a distinct BitGenerator in each process. If you do this, then you are free from the lock.

Thanks. Just to confirm: does using SeedSequence spawn_key arg
generate distinct BitGenerators? As in

cdef class Wrapper():
    def __init__(self, seed):
        entropy, num = seed
        py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
        self.rng = <bitgen_t *>py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")    # <--- this

cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
cdef Wrapper rng_1 = Wrapper(seed=(123, 1))

And then, do these two objects have distinct BitGenerators?

Evgeni



Re: locking np.random.Generator in a cython nogil context?

Robert Kern-2
On Mon, Dec 14, 2020 at 3:27 PM Evgeni Burovski <[hidden email]> wrote:
> <snip>
>
> > I also think that the lock only matters for multithreaded code, not multiprocess code. I believe the latter pickles and unpickles any Generator object (and the underlying BitGenerator), and so each process has its own version. Note that when multiprocessing, the recommended procedure is to use SeedSequence.spawn() to seed a sequence of BitGenerators and to use a distinct BitGenerator in each process. If you do this, then you are free from the lock.
>
> Thanks. Just to confirm: does using SeedSequence spawn_key arg
> generate distinct BitGenerators? As in
>
> cdef class Wrapper():
>     def __init__(self, seed):
>         entropy, num = seed
>         py_gen = PCG64(SeedSequence(entropy, spawn_key=(spawn_key,)))
>         self.rng = <bitgen_t *>py_gen.capsule.PyCapsule_GetPointer(capsule, "BitGenerator")    # <--- this
>
> cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
> cdef Wrapper rng_1 = Wrapper(seed=(123, 1))
>
> And then, do these two objects have distinct BitGenerators?

The code you wrote doesn't work (`spawn_key` is never assigned). I can guess what you meant to write, though, and yes, you would get distinct `BitGenerator`s. However, I do not recommend using `spawn_key` explicitly. The `SeedSequence.spawn()` method internally keeps track of how many children it has spawned and uses that to construct the `spawn_key`s for its subsequent children. If you play around with making your own `spawn_key`s, then the parent `SeedSequence(entropy)` might spawn identical `SeedSequence`s to the ones you constructed.
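
The bookkeeping described here can be observed directly. The sketch below is an illustration with an arbitrary entropy value; it relies only on the public `spawn_key` and `n_children_spawned` attributes of `SeedSequence`.

```python
from numpy.random import SeedSequence

parent = SeedSequence(1234)

# Two separate spawn() calls: the second continues where the first stopped.
first_two = parent.spawn(2)
third = parent.spawn(1)[0]

# Each child records the key that the parent assigned to it.
keys = [child.spawn_key for child in first_two + [third]]

# The parent remembers how many children it has handed out so far.
count = parent.n_children_spawned
```

Here `keys` comes out as `(0,)`, `(1,)`, `(2,)` and `count` as 3, so a hand-written `spawn_key=(0,)` on the same entropy would duplicate the first child.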

If you don't want to use the `spawn()` API to construct the separate `SeedSequence`s but still want to incorporate some per-process information into the seeds (e.g. the 0 and 1 in your example), then note that a tuple of integers is a valid value for the `entropy` argument. You can have the first item be the same (i.e. per-run information) and the second item be a per-process ID or counter.

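from cpython.pycapsule cimport PyCapsule_GetPointer
from numpy.random cimport bitgen_t
from numpy.random import PCG64, SeedSequence

cdef class Wrapper():
    cdef bitgen_t *rng
    cdef object py_gen  # keeps the bit generator alive while rng points into it

    def __init__(self, seed):
        self.py_gen = PCG64(SeedSequence(seed))
        self.rng = <bitgen_t *>PyCapsule_GetPointer(self.py_gen.capsule, "BitGenerator")

cdef Wrapper rng_0 = Wrapper(seed=(123, 0))
cdef Wrapper rng_1 = Wrapper(seed=(123, 1))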
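
The same seeding scheme can be tried from plain Python. This is a minimal sketch: `worker_generator` is a hypothetical helper name, and 123 with worker IDs 0 and 1 are simply the values from the example above.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence


def worker_generator(base_seed, worker_id):
    """Seed a Generator from a (per-run seed, per-worker ID) entropy tuple."""
    return Generator(PCG64(SeedSequence((base_seed, worker_id))))


a = worker_generator(123, 0).random(5)
b = worker_generator(123, 1).random(5)
a_again = worker_generator(123, 0).random(5)

streams_differ = not np.array_equal(a, b)      # different worker IDs
reproducible = np.array_equal(a, a_again)      # same tuple, same stream
```

Keeping the first tuple item fixed per run and varying only the second makes each worker's stream reproducible from the pair alone.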

--
Robert Kern


Re: locking np.random.Generator in a cython nogil context?

Evgeni Burovski
On Tue, Dec 15, 2020 at 1:00 AM Robert Kern <[hidden email]> wrote:

<snip>

Thanks Robert!

I indeed typo'd the spawn_key, and indeed the intention is exactly to
include a worker_id into a seed to make sure each worker gets a
separate stream.

The use of the spawn_key was --- as I now finally realize --- a
misunderstanding of your and Kevin's previous replies in
https://mail.python.org/pipermail/numpy-discussion/2020-July/080833.html

So I'm moving my project to use the `SeedSequence((base_seed,
worker_id))` API --- thanks!

Just as a side note, this is not very prominent in the docs, and I'm
ready to volunteer to send a doc PR --- I'm only not sure which part
of the docs, and would appreciate a pointer.

Re: locking np.random.Generator in a cython nogil context?

mattip

On 12/17/20 11:47 AM, Evgeni Burovski wrote:
> Just as a side note, this is not very prominent in the docs, and I'm
> ready to volunteer to send a doc PR --- I'm only not sure which part
> of the docs, and would appreciate a pointer.

Maybe here

https://numpy.org/devdocs/reference/random/bit_generators/index.html#seeding-and-entropy

which is here in the sources

https://github.com/numpy/numpy/blob/master/doc/source/reference/random/bit_generators/index.rst#seeding-and-entropy


And/or in the SeedSequence docstring

https://numpy.org/devdocs/reference/random/bit_generators/generated/numpy.random.SeedSequence.html#numpy.random.SeedSequence

which is here in the sources

https://github.com/numpy/numpy/blob/master/numpy/random/bit_generator.pyx#L255


Matti




Re: locking np.random.Generator in a cython nogil context?

Evgeni Burovski
On Thu, Dec 17, 2020 at 1:01 PM Matti Picus <[hidden email]> wrote:

<snip>


Here's the PR, https://github.com/numpy/numpy/pull/18014

Two minor comments, both OT for the PR:

1. The recommendation to seed the generators from the OS --- I've been
bitten by exactly this once. That was a rather exotic combination of a
vendor RNG and a batch queueing system, and some of my runs did end up
with identical random streams. Given that the recommendation is what
it is, it probably means that experience is a singular point and it no
longer happens with modern generators.

2. Robert's comment that `SeedSequence(..., spawn_key=(num,))` is not
equivalent to `SeedSequence(...).spawn(num + 1)[num]` and that the former
is not recommended. I'm not questioning the recommendation, but then
__repr__ seems to suggest the equivalence:

In [2]: from numpy.random import PCG64, SeedSequence

In [3]: base_seq = SeedSequence(1234)

In [4]: base_seq.spawn(8)
Out[4]:
[SeedSequence(
     entropy=1234,
     spawn_key=(0,),
 ),

<snip>

Evgeni

Re: locking np.random.Generator in a cython nogil context?

Robert Kern-2
On Thu, Dec 17, 2020 at 9:56 AM Evgeni Burovski <[hidden email]> wrote:
> <snip>

> Two minor comments, both OT for the PR:
>
> 1. The recommendation to seed the generators from the OS --- I've been
> bitten by exactly this once. That was a rather exotic combination of a
> vendor RNG and a batch queueing system, and some of my runs did end up
> with identical random streams. Given that the recommendation is what
> it is, it probably means that experience is a singular point and it no
> longer happens with modern generators.

I suspect the vendor RNG was rolling its own entropy using time. We use `secrets.randbits()`, which ultimately uses the best cryptographic entropy source available. And if there is no cryptographic entropy source available, I think we fail hard instead of falling back to less reliable things like time. I'm not entirely sure that's a feature, but it is safe!
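
In practice this means an unseeded `SeedSequence()` pulls fresh entropy from the OS, and the value it drew can be logged to replay the run later. A minimal sketch of that pattern (the variable names are illustrative):

```python
from numpy.random import Generator, PCG64, SeedSequence

# No argument: entropy is drawn from the OS entropy source.
fresh = SeedSequence()

# The drawn value is exposed as a plain integer; log it to replay the run.
recorded = fresh.entropy

# Re-creating the SeedSequence from the logged value reproduces the stream.
replay = SeedSequence(recorded)
original_draws = Generator(PCG64(fresh)).random(3)
replayed_draws = Generator(PCG64(replay)).random(3)
same = bool((original_draws == replayed_draws).all())
```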
 
> 2. Robert's comment that `SeedSequence(..., spawn_key=(num,))` is not
> equivalent to `SeedSequence(...).spawn(num + 1)[num]` and that the former
> is not recommended. I'm not questioning the recommendation, but then
> __repr__ seems to suggest the equivalence:

I was saying that they were equivalent. That's precisely why it's not recommended: it's too easy to do both and get identical streams inadvertently.
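
The equivalence, and therefore the risk, is easy to check. The sketch below compares the state words generated by the two constructions for a few values of `num`; the helper names `via_spawn` and `via_key` and the entropy value 1234 are illustrative assumptions.

```python
import numpy as np
from numpy.random import SeedSequence


def via_spawn(entropy, num):
    # Child number `num` as handed out by a fresh parent's spawn().
    return SeedSequence(entropy).spawn(num + 1)[num]


def via_key(entropy, num):
    # The same child, written out by hand with an explicit spawn_key.
    return SeedSequence(entropy, spawn_key=(num,))


identical = [
    np.array_equal(
        via_spawn(1234, num).generate_state(8),
        via_key(1234, num).generate_state(8),
    )
    for num in range(4)
]
```

Every entry of `identical` is true, so code that mixes explicit `spawn_key`s with `spawn()` on the same entropy can end up running two workers on the same stream.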
 
--
Robert Kern
