Question about optimizing random_standard_normal

16 messages
Question about optimizing random_standard_normal

camel-cdr
I implemented an alternative version of the ziggurat method for generating standard normal distributions that is about twice as fast and uses two-thirds of the memory of the old one.
I tested the implementation separately and am very confident it's correct, but it fails 28 tests in coverage testing.
Looking at the test code, I found that all the failing tests are inside TestRandomDist, which has the goal of "[making] sure the random distribution returns the correct value for a given seed". Why is this needed?
The only explanation I can come up with is that standard_normal is required to be backwards compatible with respect to seeding. If that's the case, how could one even implement a new algorithm?

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Re: Question about optimizing random_standard_normal

Tom Swirly
Well, I can tell you why it needs to be backward compatible!  I use random numbers fairly frequently, and to unit test them I set a specific seed and then make sure I get the same answers.

If your change went in (and I were using numpy normal distributions, which I am not) then my tests would break.

In particular, you'd have the unfixable problem that it would be impossible to write tests for your code that work regardless of which version of numpy is installed.

Yes, I agree that in your use case, this is powerfully unfortunate, and prevents you from making a change that would otherwise benefit everyone.

The three ways to do this would be the following:
  • Add a new parameter to the function call, say, faster=False, which you set True to get the new behavior
  • Add a global flag somewhere you set to get the new behavior everywhere
  • Create a new function called normal_faster or some such
All of these are ugly for obvious reasons.
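As a concrete illustration (a sketch using numpy's current Generator API; the exact sampled values are whatever the installed version produces), a typical downstream test relies on the stream being identical for a given seed:

```python
import numpy as np

# Two generators seeded identically must produce identical streams.
# Downstream unit tests often hard-code the resulting values, so any
# change to the sampling algorithm breaks them even though the new
# samples are statistically just as valid.
a = np.random.default_rng(12345).standard_normal(5)
b = np.random.default_rng(12345).standard_normal(5)
assert np.array_equal(a, b)
```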

Re: Question about optimizing random_standard_normal

bashtage
Have you benchmarked it using the generator interface? The structure of this as a non-monolithic generator makes it a good deal slower than generating in straight C (with everything inlined). I'm not sure a factor of 2 is enough to justify a change (for me 10x would be, 1.2x would not be, but I don't know where the cutoff is).

Can you post benchmarks from using it through Generator?

Also, those tests would be replaced with new values if the patch were accepted, so don't worry about them.
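A quick way to time generation through the Generator interface outside the asv suite looks roughly like this (a sketch; the built-in asv benchmark is the authoritative measurement):

```python
import timeit
import numpy as np

rng = np.random.default_rng()

# Time bulk generation of normals through the Python-level Generator
# interface, which includes dispatch overhead absent from a pure-C loop.
elapsed = timeit.timeit(lambda: rng.standard_normal(1_000_000), number=20)
print(f"{elapsed / 20 * 1e3:.2f} ms per 1e6 normals")
```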

Kevin


Re: Question about optimizing random_standard_normal

camel-cdr
Well, I can tell you why it needs to be backward compatible!  I use random numbers fairly frequently, and to unit test them I set a specific seed and then make sure I get the same answers.
Hmm, I guess that makes sense. I tried to adjust my algorithm to do the same thing with the same bits as the original one, but I couldn't get it to work.

Have you benchmarked it using the generator interface? The structure of this as a non-monolithic generator makes it a good deal slower than generating in straight C (with everything inlined). I'm not sure a factor of 2 is enough to justify a change (for me 10x would be, 1.2x would not be, but I don't know where the cutoff is).

I originally benchmarked my implementation against a bunch of other ones in C (because I was developing a C library: https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
But I did run the built-in benchmark, ./runtests.py --bench bench_random.RNG.time_normal_zig, and the results are:

              new           old
PCG64      589±3μs     1.06±0.03ms
MT19937     985±4μs     1.44±0.01ms
Philox     981±30μs    1.39±0.01ms
SFC64      508±4μs       900±4μs
numpy    2.99±0.06ms   2.98±0.01ms # no change for /dev/urandom


I'm not yet 100% certain about the implementations, but I attached a diff of my current progress.



[Attachment: ziggurat.diff (79K)]
Re: Question about optimizing random_standard_normal

Charles R Harris


You can actually get rid of the loop entirely and implement the exponential function directly by using an exponential bound on the bottom ziggurat block ends. It just requires a slight change in the block boundaries.
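For context, the loop in question is the conventional rejection step used to sample the normal tail beyond the rightmost ziggurat block edge (a sketch of Marsaglia's standard tail method, not NumPy's actual source; the constant is the edge used by a 256-block table):

```python
import numpy as np

R = 3.6541528853610088  # rightmost block edge for a 256-block normal ziggurat

def normal_tail(rng, r=R):
    # Marsaglia's tail method: draw x from an exponential envelope with
    # rate r, draw a second exponential y, and accept when 2*y > x*x;
    # this loops until acceptance, which is what the exponential-bound
    # trick aims to eliminate.
    while True:
        x = -np.log(rng.random()) / r
        y = -np.log(rng.random())
        if 2.0 * y > x * x:
            return r + x

rng = np.random.default_rng(0)
samples = [normal_tail(rng) for _ in range(1000)]
assert min(samples) > R  # every draw lies in the tail
```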

Chuck 

Re: Question about optimizing random_standard_normal

Robert Kern-2
On Sat, Feb 6, 2021 at 7:27 AM <[hidden email]> wrote:
Well, I can tell you why it needs to be backward compatible!  I use random numbers fairly frequently, and to unit test them I set a specific seed and then make sure I get the same answers.
Hmm, I guess that makes sense. I tried to adjust my algorithm to do the same thing with the same bits as the original one, but I couldn't get it to work.

To be clear, this is not our backwards compatibility policy for the methods that you have modified. Our policy is spelled out here:


The TestRandomDist suite of tests was adapted from the older RandomState (which is indeed frozen and not allowed to change algorithms). It's a mix of correctness tests that are valid regardless of the precise algorithm (does this method reject invalid arguments? do degenerate arguments yield the correct constant value?) and actual "has this algorithm changed unexpectedly?" tests. The former are the most valuable, but the latter are useful for testing in cross-platform contexts. Compilers and different CPUs can do naughty things sometimes, and we want the cross-platform differences to be minimal. When you do change an algorithm implementation for Generator, as you have done, you are expected to do thorough tests (offline, not in the unit tests) that it is correctly sampling from the target probability distribution; then, once satisfied, change the hard-coded values in TestRandomDist to match whatever you are generating.

--
Robert Kern

Re: Question about optimizing random_standard_normal

camel-cdr
Ok, cool, that basically explains a lot.

When you do change an algorithm implementation for Generator, as you have done, you are expected to do thorough tests (offline, not in the unit tests) that it is correctly sampling from the target probability distribution, then once satisfied, change the hard-coded values in TestRandomDist to match whatever you are generating.

I'm probably not versed enough in statistics to do thorough testing. I used the tests from https://www.seehuhn.de/pages/ziggurat and plotted histograms to verify correctness; that probably won't be sufficient.

Re: Question about optimizing random_standard_normal

bashtage
If I understand correctly, there is no gain when applying this patch to Generator. I'm not that surprised, since the compiler is much more limited in what it can do in Generator compared to filling a large array directly with functions available for inlining and unrolling. Again, if I understand correctly, I think it will be difficult to justify breaking the stream for a negligible gain in performance.

Kevin


Re: Question about optimizing random_standard_normal

Robert Kern-2
On Mon, Feb 8, 2021 at 3:05 AM Kevin Sheppard <[hidden email]> wrote:
If I understand correctly, there is no gain when applying this patch to Generator. I'm not that surprised, since the compiler is much more limited in what it can do in Generator compared to filling a large array directly with functions available for inlining and unrolling. Again, if I understand correctly, I think it will be difficult to justify breaking the stream for a negligible gain in performance.

Can you explain your understanding of the benchmark results? To me, it looks like nearly a 2x improvement with the faster BitGenerators (our default PCG64 and SFC64). That may or may not be worth breaking the stream, but it's far from negligible.
But I did run the built-in benchmark: ./runtests.py --bench bench_random.RNG.time_normal_zig and the results are:

              new           old
PCG64      589±3μs     1.06±0.03ms
MT19937     985±4μs     1.44±0.01ms
Philox     981±30μs    1.39±0.01ms
SFC64      508±4μs       900±4μs
numpy    2.99±0.06ms   2.98±0.01ms # no change for /dev/urandom
 
--
Robert Kern

Re: Question about optimizing random_standard_normal

bashtage

My reading is that the first 4 are pure C, presumably using the standard practice of inlining so as to make the tightest loop possible and to allow the compiler to make other optimizations. The final line is what happens when you replace the existing ziggurat in NumPy with the new one. I read it this way since it has both “new” and “old” for numpy. If it isn’t this, then I’m unsure what “new” and “old” could mean in the context of this thread.

 

I suppose camel-cdr can clarify what was actually done.

 

Kevin

 

 

Re: Question about optimizing random_standard_normal

Robert Kern-2
On Mon, Feb 8, 2021 at 10:53 AM Kevin Sheppard <[hidden email]> wrote:

My reading is that the first 4 are pure C, presumably using the standard practice of inlining so as to make the tightest loop possible and to allow the compiler to make other optimizations. The final line is what happens when you replace the existing ziggurat in NumPy with the new one. I read it this way since it has both “new” and “old” for numpy. If it isn’t this, then I’m unsure what “new” and “old” could mean in the context of this thread.


No, these are our benchmarks of `Generator`. `numpy` is testing `RandomState`, which wasn't touched by their contribution.

https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L93-L97
https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L123-L124
 

--
Robert Kern

Re: Question about optimizing random_standard_normal

bashtage

That is good news indeed.  Seems like a good upgrade, especially given the breadth of application of normals and the multiple appearances within distributions.c (e.g., Cauchy). Is there a deprecation for a change like this? Or is it just a note and new random numbers in the next major?  I think this is the first time a substantially new algo has replaced an existing one.

 

Kevin

 

 

Re: Question about optimizing random_standard_normal

Sebastian Berg
On Mon, 2021-02-08 at 16:36 +0000, Kevin Sheppard wrote:
> That is good news indeed.  Seems like a good upgrade, especially
> given the breadth of application of normals and the multiple
> appearances within distributions.c (e.g., Cauchy). Is there a
> deprecation for a change like this? Or is it just a note and new
> random numbers in the next major?  I think this is the first time a
> substantially new algo has replaced an existing one.
>  

I don't think we can deprecate or even warn about it; that would just
result in noise that cannot be silenced.
If we really think warnings are necessary, it sounds like you would
need an opt-in `numpy.random.set_warn_if_streams_will_change()`.
That sounds like diminishing returns at first sight.

It may be good that this happens now, rather than in a few years when
adoption of the new API is probably still on the low side.


This type of change should be in the release notes undoubtedly and
likely a `.. versionchanged::` directive in the docstring.

Maybe the best thing would be to create a single, prominent but brief,
changelog listing all (or almost all) stream changes? (Possibly instead
of documenting it in the individual function as `.. versionchanged::`)

I am thinking just a table with:
  * version changed
  * short description
  * affected functions
  * how the stream changed (if that is ever relevant)
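A docstring note along these lines (version number and wording hypothetical) would complement such a table:

```rst
.. versionchanged:: 1.21.0
   The algorithm used by ``standard_normal`` was replaced with a faster
   ziggurat variant; the values produced for a given seed differ from
   earlier releases.
```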


Cheers,

Sebastian


Re: Question about optimizing random_standard_normal

Robert Kern-2
On Mon, Feb 8, 2021 at 11:38 AM Kevin Sheppard <[hidden email]> wrote:

That is good news indeed.  Seems like a good upgrade, especially given the breadth of application of normals and the multiple appearances within distributions.c (e.g., Cauchy). Is there a deprecation for a change like this? Or is it just a note and new random numbers in the next major?  I think this is the first time a substantially new algo has replaced an existing one.


Per NEP 19, a change like this is a new feature that can be included in a feature release, like any other feature.

I would like to see some more testing on the quality of the sequences beyond the ones already quoted, using Kolmogorov-Smirnov and/or Anderson-Darling tests as well, which would be more thorough.
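For instance, a Kolmogorov-Smirnov check of standard_normal against the exact normal CDF takes only a few lines (a sketch using only numpy and math.erf; in practice a dedicated routine such as scipy.stats.kstest would be used):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
x = np.sort(rng.standard_normal(10_000))
n = len(x)

# Exact standard normal CDF evaluated at the sorted sample.
cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])

# Two-sided KS statistic: max deviation between empirical and exact CDF.
d_plus = np.max(np.arange(1, n + 1) / n - cdf)
d_minus = np.max(cdf - np.arange(0, n) / n)
ks = max(d_plus, d_minus)

# For a correct sampler, ks should be on the order of 1/sqrt(n) ~ 0.01;
# the 1% critical value is roughly 1.63/sqrt(n) ~ 0.016.
assert ks < 0.05
```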


There are also some subtle issues involved in ziggurat method implementations that go beyond just the marginal distributions. I'm not sure, even, that our current implementation deals with the issues raised in the following paper, but I'd like to do no worse.


--
Robert Kern

Re: Question about optimizing random_standard_normal

Charles R Harris

Just for fun, I've attached the (C++) implementation that uses the exponentially extended base block. Note that the constructor produces the table.

Chuck 


[Attachment: RandomNormal.hpp (2K)]
Re: Question about optimizing random_standard_normal

Robert Kern-2
On Mon, Feb 8, 2021 at 12:10 PM Sebastian Berg <[hidden email]> wrote:

This type of change should be in the release notes undoubtedly and
likely a `.. versionchanged::` directive in the docstring.

Maybe the best thing would be to create a single, prominent but brief,
changelog listing all (or almost all) stream changes? (Possibly instead
of documenting it in the individual function as `.. versionchanged::`)

I am thinking just a table with:
  * version changed
  * short description
  * affected functions
  * how the stream changed (if that is ever relevant)

Both are probably useful. Good ideas.
 
--
Robert Kern
