Proposal - extend histograms api to allow uneven bins


Proposal - extend histograms api to allow uneven bins

areeves87
Greetings,

I have a PR that warrants discussion according to @seberg. See https://github.com/numpy/numpy/pull/14278.

It is an enhancement that fixes a bug: when using the 'fd' (Freedman-Diaconis) estimator on a dataset with a small inter-quartile range and large outliers, the current codebase produces more bins than memory allows. There are several related bug reports (see #11879, #10297, #8203).
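For concreteness, here is a minimal sketch of the failure mode. The Freedman-Diaconis bin width is 2 * IQR / n**(1/3), so a tiny IQR combined with a huge data range implies an enormous bin count. The dataset below is illustrative only, not taken from the linked reports:

```python
import numpy as np

# Tiny inter-quartile range plus one large outlier (illustrative data)
rng = np.random.default_rng(0)
data = np.r_[rng.normal(0.0, 1e-6, 1000), 1e6]

# Freedman-Diaconis bin width: 2 * IQR / n**(1/3)
q75, q25 = np.percentile(data, [75, 25])
width = 2 * (q75 - q25) / len(data) ** (1 / 3)

# Implied number of equal-width bins over the full data range
n_bins = int(np.ceil(np.ptp(data) / width))
print(n_bins)  # on the order of 10**12 -- far more bins than memory allows
# np.histogram(data, bins='fd') would try to build edges for all of them
```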

In terms of scope, I restricted my changes to the conditions under which np.histogram(bins='auto') falls back to 'fd'. The actual fix enhances the API: following a suggestion from @eric-wieser, empty histogram bins are merged into their neighbours. In practice this resolves the oversized bin-count issue.
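A rough sketch of the merge-empty-bins idea (my own illustration of the concept, not the code in gh-14278): compute equal-width bins first, then drop the interior edges that bound only empty bins, which yields uneven bins absorbing the empty span.

```python
import numpy as np

def histogram_merged(data, bins=10):
    """Equal-width histogram, then merge empty bins into a neighbour.

    Illustrative sketch only -- not the implementation in gh-14278.
    """
    counts, edges = np.histogram(data, bins=bins)
    # Keep the left edge of the range plus the right edge of every
    # non-empty bin; always keep the final edge to preserve the range.
    keep = np.zeros(len(edges), dtype=bool)
    keep[0] = keep[-1] = True
    keep[1:] |= counts > 0
    new_edges = edges[keep]
    new_counts, _ = np.histogram(data, bins=new_edges)
    return new_counts, new_edges

# The outlier at 10.0 leaves 18 of the 20 equal-width bins empty;
# merging collapses them into one wide bin.
counts, edges = histogram_merged([0.0, 0.1, 0.2, 10.0], bins=20)
print(counts, edges)  # [3 1] [ 0.   0.5 10. ]
```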

However @seberg is concerned that extending the API in this way may not be the way to go. For example, if you use "auto" once, and then re-use the bins, the uneven bins may not be what you want.

Furthermore @eric-wieser is concerned that there may be a floating-point devil in the details. He advocates using the Hypothesis property-based testing package to increase our confidence that the implementation adequately handles corner cases.

I would like to do my part in improving the code base. I don't have strong opinions on the approach, but I admit I would like to eventually land a PR that resolves these bugs; this one has been half a year in the making, after all.

Thoughts?

-areeves87

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Proposal - extend histograms api to allow uneven bins

Ross Barnowski
Just a few thoughts re: the changes proposed in https://github.com/numpy/numpy/pull/14278

1. Though the PR is limited to the 'auto' kwarg, the issue of potential memory problems for the automated binning methods is a more general one (e.g. #15332).

2. The main concern that jumps out at me is downstream users relying on the implicit assumption of regular binning. That is of course bad practice, and makes even less sense when using one of the bin estimators, so I'm not sure how big a concern it is. Still, such downstream code likely exists, especially since, as far as I know, NumPy has never shipped a binning method that returns irregular bins.

3. The astropy project has at least one estimator that returns irregular bins. I checked for issues related to irregular binning: though they see many of the same problems with the automatic bin estimators (e.g. memory problems for inputs with outliers), I didn't find anything specifically related to the bins being irregular.

I just wanted to add my two cents. The binning-data-with-outliers problem is very common in high-resolution spectroscopy, and I have seen practitioners rely on the assumption of regular binning (e.g. dividing the `range` by the number of bins) to compute bin centers, even though this is not the right way to do it.
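To illustrate the reliance described above (an illustrative snippet, not from any particular downstream package): deriving centers from range / nbins silently assumes equal widths, whereas deriving them from the returned edges stays correct even if the edges become irregular.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
counts, edges = np.histogram(data, bins='auto')

# Fragile: assumes every bin has the same width
width = (data.max() - data.min()) / len(counts)
centers_assumed = data.min() + width * (np.arange(len(counts)) + 0.5)

# Robust: derived from the edges actually returned, so it remains
# correct even for irregular binning
centers = (edges[:-1] + edges[1:]) / 2

# With today's regular bins the two agree; with irregular bins only
# the edge-based version would still be right.
print(np.allclose(centers, centers_assumed))
```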

Thanks for taking the time to write up your work!
