Default type for functions that accumulate integers

Charles R Harris
Hi All,

Currently functions like trace use the C long type as the default accumulator for integer types of lesser precision:

dtype : dtype, optional
    Determines the data-type of the returned array and of the accumulator
    where the elements are summed. If dtype has the value None and `a` is
    of integer type of precision less than the default integer
    precision, then the default integer precision is used. Otherwise,
    the precision is the same as that of `a`.

The problem with this is that the precision of long varies with the platform, so the result varies as well; see gh-8433 for a complaint about this. There are two possible alternatives that seem reasonable to me:

  1. Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators on 64 bit platforms.
  2. Always use 64 bit accumulators.
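
A minimal sketch of the behaviour in question, assuming NumPy is importable as `np`. The array `a` and the variable names are made up for illustration. With `dtype=None` the accumulator follows the default integer, so the first printed dtype depends on the platform and NumPy version; passing `dtype` explicitly pins it.

```python
import numpy as np

# Small integer input: every element fits in 8 bits.
a = np.full((200, 200), 127, dtype=np.int8)

# With dtype=None the accumulator is promoted to the default integer,
# whose width depends on the platform and the NumPy version.
default_result = np.trace(a)
print(default_result.dtype, np.dtype(int).itemsize * 8, "bits")

# Requesting the accumulator explicitly gives the same dtype everywhere.
fixed_result = np.trace(a, dtype=np.int64)
print(fixed_result.dtype, int(fixed_result))
```

The second call is what alternative 2 would make the default for small integer inputs.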

Thoughts?

Chuck


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: Default type for functions that accumulate integers

Nathaniel Smith
On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris
<[hidden email]> wrote:

> Hi All,
>
> Currently functions like trace use the C long type as the default
> accumulator for integer types of lesser precision:
>
>> dtype : dtype, optional
>>     Determines the data-type of the returned array and of the accumulator
>>     where the elements are summed. If dtype has the value None and `a` is
>>     of integer type of precision less than the default integer
>>     precision, then the default integer precision is used. Otherwise,
>>     the precision is the same as that of `a`.
>
>
> The problem with this is that the precision of long varies with the platform
> so that the result varies,  see gh-8433 for a complaint about this. There
> are two possible alternatives that seem reasonable to me:
>
>   1. Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators
>      on 64 bit platforms.
>   2. Always use 64 bit accumulators.

This is a special case of a more general question: right now we use
the default integer precision (i.e., what you get from np.array([1]),
or np.arange, or np.dtype(int)), and it turns out that the default
integer precision itself varies in confusing ways, and this is a
common source of bugs. Specifically: right now it's 32-bit on 32-bit
builds, and 64-bit on 64-bit builds, except on Windows where it's
always 32-bit. This matches the default precision of Python 2 'int'.
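
A quick way to see what a given build uses, as a sketch assuming NumPy is importable as `np`; the variable names are only for illustration. The printed widths will differ between platforms and NumPy versions, which is the point being made above.

```python
import numpy as np

default_int = np.dtype(int)        # the default integer dtype
from_list = np.array([1]).dtype    # dtype inferred for a Python int
from_arange = np.arange(3).dtype   # dtype produced by arange
pointer_bits = np.dtype(np.intp).itemsize * 8  # pointer-sized integer width

print("default integer:", default_int, default_int.itemsize * 8, "bits")
print("np.array([1]):  ", from_list)
print("np.arange(3):   ", from_arange)
print("pointer width:  ", pointer_bits, "bits")
```

Comparing the first and last lines shows whether the default integer matches the pointer width of the process.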

So some options include:
- make the default integer precision 64-bits everywhere
- make the default integer precision 32-bits on 32-bit systems, and
64-bits on 64-bit systems (including Windows)
- leave the default integer precision the same, but make accumulators
64-bits everywhere
- leave the default integer precision the same, but make accumulators
64-bits on 64-bit systems (including Windows)
- ...

Given the prevalence of 64-bit systems these days, and the fact that
the current setup makes it very easy to write code that seems to work
when tested on a 64-bit system but that silently returns incorrect
results on 32-bit systems, it sure would be nice if we could switch to
a 64-bit default everywhere. (You could still get 32-bit integers, of
course, you'd just have to ask for them explicitly.)
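
To illustrate the silent failure mode, here is a small sketch (assuming NumPy is importable as `np`) that forces a 32-bit accumulator by hand to mimic a platform whose default integer is 32 bits wide. The input values are chosen only so that the sum exceeds the int32 range.

```python
import numpy as np

values = np.full(3, 2**30, dtype=np.int32)

# Force a 32-bit accumulator, as would happen by default on a platform
# where the default integer is 32 bits wide.
wrapped = np.sum(values, dtype=np.int32)

# Force a 64-bit accumulator.
exact = np.sum(values, dtype=np.int64)

print(int(wrapped))  # -1073741824: the sum wrapped around without any error
print(int(exact))    # 3221225472
```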

Things we'd need to know more about before making a decision:
- compatibility: if we flip this switch, how much code breaks? In
general correct numpy-using code has to be prepared to handle
np.dtype(int) being 64-bits, and in fact there might be more code that
accidentally assumes that np.dtype(int) is always 64-bits than there
is code that assumes it is always 32-bits. But that's theory; to know
how bad this is we would need to try actually running some projects'
test suites and see whether they break or not.
- speed: there's probably some cost to using 64-bit integers on 32-bit
systems; how big is the penalty in practice?

-n

--
Nathaniel J. Smith -- https://vorpus.org

Re: Default type for functions that accumulate integers

Sebastian Berg
On Mon, 2017-01-02 at 18:46 -0800, Nathaniel Smith wrote:
> On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris
> <[hidden email]> wrote:
> >
> > Hi All,
> >
> > Currently functions like trace use the C long type as the default
> > accumulator for integer types of lesser precision:
> >

<snip>

>
> Things we'd need to know more about before making a decision:
> - compatibility: if we flip this switch, how much code breaks? In
> general correct numpy-using code has to be prepared to handle
> np.dtype(int) being 64-bits, and in fact there might be more code
> that
> accidentally assumes that np.dtype(int) is always 64-bits than there
> is code that assumes it is always 32-bits. But that's theory; to know
> how bad this is we would need to try actually running some projects
> test suites and see whether they break or not.
> - speed: there's probably some cost to using 64-bit integers on 32-
> bit
> systems; how big is the penalty in practice?
>
I agree with trying to switch the default in general first; I don't
like the idea of having two different "defaults".

There are two issues: one is the change on Python 2 (the default NumPy
integer type would no longer inherit from the Python int), and the
other is any problem caused by the increased precision (more RAM usage,
code that somehow expects the lower precision, etc.).
I cannot say I know for sure, but I would be extremely surprised if
there is a speed difference between 32-bit and 64-bit architectures,
apart from the general slowdown you get due to bus speeds, etc. when
going to a higher bit width.
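
The RAM cost is easy to quantify. A minimal sketch, assuming NumPy is importable as `np` and using an arbitrary array length of `1024**2`:

```python
import numpy as np

n = 1024**2
as_int32 = np.ones(n, dtype=np.int32)
as_int64 = np.ones(n, dtype=np.int64)

# Same number of elements, twice the bytes per element.
print(as_int32.nbytes // 1024**2, "MiB")  # 4 MiB
print(as_int64.nbytes // 1024**2, "MiB")  # 8 MiB
```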

If the inheritance turns out to be a bigger issue for some reason, we
might limit the change to Python 3. For other possible problems, I
think we may have difficulty assessing how much is affected. The
problem is that the most affected code would be projects that are only
used on Windows, or similar. Bigger projects should work fine already
(they are more likely to get better, since they are less well tested on
platforms with a 32-bit long, especially 64-bit Windows).

Of course, limiting the change to Python 3 could have the advantage of
not affecting older projects, which are possibly more likely to rely
specifically on the current behaviour.

So I would be open to trying the change. I think the idea of at least
changing it in Python 3 has been brought up a couple of times,
including by Julian, so maybe it is time to give it a shot...

It would be interesting to see if anyone knows of projects that may be
affected (for example because they are designed to run only on Windows
or on limited hardware), and whether leaving Python 2 unchanged might
mitigate problems here as well (in addition to avoiding the
inheritance change).

Best,

Sebastian


> -n
>

Re: Default type for functions that accumulate integers

Charles R Harris


On Tue, Jan 3, 2017 at 10:08 AM, Sebastian Berg <[hidden email]> wrote:
<snip>

> It would be interesting to see if anyone knows projects that may be
> affected (for example because they are designed to only run on windows
> or limited hardware), and if avoiding to change anything in python 2
> might mitigate problems here as well (additionally to avoiding the
> inheritance change)?

There have been a number of reports of problems due to the inheritance, stemming both from the changing precision and, IIRC, from differences in print format or some such. So I don't expect that there will be no problems, but they will probably not be difficult to fix.

Chuck


Re: Default type for functions that accumulate integers

Antoine Pitrou
In reply to this post by Nathaniel Smith
On Mon, 2 Jan 2017 18:46:08 -0800
Nathaniel Smith <[hidden email]> wrote:
>
> So some options include:
> - make the default integer precision 64-bits everywhere
> - make the default integer precision 32-bits on 32-bit systems, and
> 64-bits on 64-bit systems (including Windows)

Either of those two would be the best IMO.

Intuitively, I think people would expect 32-bit ints in 32-bit
processes by default, and 64-bit ints in 64-bit processes likewise. So
I would slightly favour the latter option.

> - leave the default integer precision the same, but make accumulators
> 64-bits everywhere
> - leave the default integer precision the same, but make accumulators
> 64-bits on 64-bit systems (including Windows)

Both of these options introduce a confusing discrepancy.

> - speed: there's probably some cost to using 64-bit integers on 32-bit
> systems; how big is the penalty in practice?

OK, I have fired up a Windows VM to compare 32-bit and 64-bit builds.
The NumPy version is 1.11.2 and the Python version is 3.5.2. Keep in
mind those are Anaconda builds of NumPy, with MKL enabled for linear
algebra; YMMV.

For each benchmark, the first number is the result on the 32-bit build,
the second number on the 64-bit build.

Simple arithmetic
-----------------

>>> v = np.ones(1024**2, dtype='int32')

>>> %timeit v + v            # 1.73 ms per loop | 1.78 ms per loop
>>> %timeit v * v            # 1.77 ms per loop | 1.79 ms per loop
>>> %timeit v // v           # 5.89 ms per loop | 5.39 ms per loop

>>> v = np.ones(1024**2, dtype='int64')

>>> %timeit v + v            # 3.54 ms per loop | 3.54 ms per loop
>>> %timeit v * v            # 5.61 ms per loop | 3.52 ms per loop
>>> %timeit v // v           # 17.1 ms per loop | 13.9 ms per loop

Linear algebra
--------------

>>> m = np.ones((1024,1024), dtype='int32')

>>> %timeit m @ m            # 556 ms per loop  | 569 ms per loop

>>> m = np.ones((1024,1024), dtype='int64')

>>> %timeit m @ m            # 3.81 s per loop  | 1.01 s per loop

Sorting
-------

>>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int32')

>>> %timeit np.sort(v)       # 43.4 ms per loop | 44 ms per loop

>>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int64')

>>> %timeit np.sort(v)       # 61.5 ms per loop | 45.5 ms per loop

Indexing
--------

>>> v = np.ones(1024**2, dtype='int32')

>>> %timeit v[v[::-1]]       # 2.38 ms per loop | 4.63 ms per loop

>>> v = np.ones(1024**2, dtype='int64')

>>> %timeit v[v[::-1]]       # 6.9 ms per loop  | 3.63 ms per loop



Quick summary:
- for very simple operations, 32b and 64b builds can have the same perf
  on each given bitwidth (though speed is uniformly halved on 64-bit
  integers when the given operation is SIMD-vectorized)
- for more sophisticated operations (such as element-wise
  multiplication or division, or quicksort, but much more so on the
  matrix product), 32b builds are competitive with 64b builds on 32-bit
  ints, but lag behind on 64-bit ints
- for indexing, it's desirable to use a "native" width integer,
  regardless of whether that means 32- or 64-bit

Of course the numbers will vary depending on the platform (read:
compiler), but some aspects of this comparison will probably translate
to other platforms.
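
For anyone who wants to repeat part of the comparison outside IPython, here is a rough sketch using the standard `timeit` module, assuming NumPy is importable as `np`. The helper `best_ms` and the `results` dictionary are hypothetical names introduced here, and only the element-wise and indexing cases are covered. Run it once under a 32-bit build and once under a 64-bit build to get the two columns.

```python
import timeit

import numpy as np


def best_ms(func, number=5, repeat=3):
    """Return the best per-call time of func() in milliseconds."""
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number * 1e3


results = {}
for name in ("int32", "int64"):
    v = np.ones(1024**2, dtype=name)
    # Each lambda is timed immediately, so it sees the current v.
    results[name] = {
        "add": best_ms(lambda: v + v),
        "multiply": best_ms(lambda: v * v),
        "floor_divide": best_ms(lambda: v // v),
        "index": best_ms(lambda: v[v[::-1]]),
    }

for name, timings in results.items():
    for op, ms in timings.items():
        print(f"{name:6s} {op:13s} {ms:8.2f} ms")
```

The absolute numbers will not match the ones above; the ratio between the int32 and int64 rows on each build is the interesting part.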

Regards

Antoine.

