# Default type for functions that accumulate integers


## Default type for functions that accumulate integers

Hi All,

Currently functions like trace use the C long type as the default accumulator for integer types of lesser precision:

> dtype : dtype, optional
>     Determines the data-type of the returned array and of the accumulator
>     where the elements are summed. If dtype has the value None and `a` is
>     of integer type of precision less than the default integer
>     precision, then the default integer precision is used. Otherwise,
>     the precision is the same as that of `a`.

The problem with this is that the precision of long varies with the platform, so the result varies; see gh-8433 for a complaint about this. There are two possible alternatives that seem reasonable to me:

- Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators on 64 bit platforms.
- Always use 64 bit accumulators.

Thoughts?

Chuck

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion
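A minimal sketch of the behaviour being discussed (the array contents here are illustrative, not from the thread): trace sums the diagonal using the default C long accumulator, so the result's dtype depends on the platform, while passing dtype explicitly pins the accumulator everywhere.

```python
import numpy as np

# int8 diagonal whose sum (30000) overflows int8 and int16, but fits in 32
# or 64 bits; which one you get by default follows the platform's C long.
m = np.eye(300, dtype=np.int8) * 100

print(np.trace(m))                  # 30000; result dtype is platform-dependent
# An explicit 64-bit accumulator gives the same dtype on every platform:
print(np.trace(m, dtype=np.int64))  # 30000 as int64 everywhere
```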

## Re: Default type for functions that accumulate integers

On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris <[hidden email]> wrote:
> Hi All,
>
> Currently functions like trace use the C long type as the default
> accumulator for integer types of lesser precision:
>
>> dtype : dtype, optional
>>     Determines the data-type of the returned array and of the accumulator
>>     where the elements are summed. If dtype has the value None and `a` is
>>     of integer type of precision less than the default integer
>>     precision, then the default integer precision is used. Otherwise,
>>     the precision is the same as that of `a`.
>
> The problem with this is that the precision of long varies with the
> platform so that the result varies, see gh-8433 for a complaint about
> this. There are two possible alternatives that seem reasonable to me:
>
> - Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators
>   on 64 bit platforms.
> - Always use 64 bit accumulators.

This is a special case of a more general question: right now we use the default integer precision (i.e., what you get from np.array([1]), or np.arange, or np.dtype(int)), and it turns out that the default integer precision itself varies in confusing ways, and this is a common source of bugs. Specifically: right now it's 32-bit on 32-bit builds, and 64-bit on 64-bit builds, except on Windows where it's always 32-bit. This matches the default precision of Python 2 'int'.

So some options include:

- make the default integer precision 64 bits everywhere
- make the default integer precision 32 bits on 32-bit systems, and 64 bits on 64-bit systems (including Windows)
- leave the default integer precision the same, but make accumulators 64 bits everywhere
- leave the default integer precision the same, but make accumulators 64 bits on 64-bit systems (including Windows)
- ...

Given the prevalence of 64-bit systems these days, and the fact that the current setup makes it very easy to write code that seems to work when tested on a 64-bit system but silently returns incorrect results on 32-bit systems, it sure would be nice if we could switch to a 64-bit default everywhere. (You could still get 32-bit integers, of course; you'd just have to ask for them explicitly.)

Things we'd need to know more about before making a decision:

- compatibility: if we flip this switch, how much code breaks? In general, correct numpy-using code has to be prepared to handle np.dtype(int) being 64 bits, and in fact there might be more code that accidentally assumes np.dtype(int) is always 64 bits than code that assumes it is always 32 bits. But that's theory; to know how bad this is we would need to try actually running some projects' test suites and see whether they break or not.
- speed: there's probably some cost to using 64-bit integers on 32-bit systems; how big is the penalty in practice?

-n

--
Nathaniel J. Smith -- https://vorpus.org
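A quick way to see which default a given platform has, as a sketch of the variability described above (the exact dtype printed depends on the build, which is the point):

```python
import numpy as np

# All of these produce the "default integer" whose width varies by platform:
# historically 32-bit on 32-bit builds and on Windows, 64-bit elsewhere.
print(np.dtype(int))        # e.g. int64 on 64-bit Linux/macOS
print(np.array([1]).dtype)  # same default
print(np.arange(3).dtype)   # same default

# Portable code should not rely on that width; pin it explicitly instead:
x = np.arange(3, dtype=np.int64)
print(x.dtype)              # int64 on every platform
```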

## Re: Default type for functions that accumulate integers

On Mo, 2017-01-02 at 18:46 -0800, Nathaniel Smith wrote:
> On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris
> <[hidden email]> wrote:
> >
> > Hi All,
> >
> > Currently functions like trace use the C long type as the default
> > accumulator for integer types of lesser precision:
>
> Things we'd need to know more about before making a decision:
> - compatibility: if we flip this switch, how much code breaks? In
>   general correct numpy-using code has to be prepared to handle
>   np.dtype(int) being 64-bits, and in fact there might be more code
>   that accidentally assumes that np.dtype(int) is always 64-bits than
>   there is code that assumes it is always 32-bits. But that's theory;
>   to know how bad this is we would need to try actually running some
>   projects' test suites and see whether they break or not.
> - speed: there's probably some cost to using 64-bit integers on
>   32-bit systems; how big is the penalty in practice?

I agree with trying to switch the default in general first; I don't like the idea of having two different "defaults". There are two issues: one is the change on Python 2 (the default numpy type would no longer inherit from Python int), and the other is any problems due to the increased precision (more RAM usage, code that actually expects lower precision somehow, etc.).

I cannot say I know for sure, but I would be extremely surprised if there is a speed difference between 32-bit and 64-bit architectures, except for the general slowdown you get due to bus speeds, etc., when going to a higher bit width.

If the inheritance for some reason is a bigger issue, we might limit the change to Python 3. For other possible problems, I think we may have difficulty assessing how much is affected. The problem is that the most affected projects are probably those used only on Windows, or so. Bigger projects should work fine already (they are more likely to improve, since they are not tested as well on platforms with a 32-bit long, especially 64-bit Windows).

Of course, limiting the change to Python 3 could have the advantage of not affecting older projects, which are possibly more likely to rely specifically on the current behaviour.

So, I would be open to trying the change. The idea of at least changing it in Python 3 has been brought up a couple of times, including by Julian, so maybe it is time to give it a shot....

It would be interesting to see if anyone knows of projects that may be affected (for example because they are designed to run only on Windows or on limited hardware), and whether avoiding any change on Python 2 might mitigate problems here as well (in addition to avoiding the inheritance change)?

Best,

Sebastian

> -n
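The "more RAM usage" concern above can be made concrete with a small sketch (array size chosen for illustration): doubling the default integer width doubles the memory footprint of plain integer arrays.

```python
import numpy as np

# Same logical data, different element widths.
a32 = np.zeros(1_000_000, dtype=np.int32)
a64 = np.zeros(1_000_000, dtype=np.int64)

print(a32.nbytes)  # 4000000 bytes
print(a64.nbytes)  # 8000000 bytes: a 64-bit default doubles the footprint
```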

## Re: Default type for functions that accumulate integers

On Tue, Jan 3, 2017 at 10:08 AM, Sebastian Berg wrote:
> It would be interesting to see if anyone knows projects that may be
> affected (for example because they are designed to only run on windows
> or limited hardware), and if avoiding to change anything in python 2
> might mitigate problems here as well (additionally to avoiding the
> inheritance change)?

There have been a number of reports of problems due to the inheritance, stemming both from the changing precision and, IIRC, from differences in print format or some such. So I don't expect that there will be no problems, but they will probably not be difficult to fix.

Chuck
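A small sketch of the inheritance issue mentioned in the thread (on Python 3, which is where this check can be run today): on Python 2 the default numpy integer scalar was a subclass of the builtin int, so `isinstance(x, int)` checks passed; on Python 3 it is not a subclass.

```python
import numpy as np

x = np.int64(1)

# On Python 3 the numpy scalar does not inherit from the builtin int:
print(isinstance(x, int))         # False
# The portable check uses numpy's abstract integer base class instead:
print(isinstance(x, np.integer))  # True
```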