Re: problem with float64's str()


Will Lee
Thanks for all the comments about this issue. Do you know if there's an open ticket for this? Is this an easy fix to get in before the 1.0.5 release?

Thanks,

Will

On Fri, Apr 4, 2008 at 3:40 PM, Timothy Hochberg <[hidden email]> wrote:


On Fri, Apr 4, 2008 at 12:47 PM, Robert Kern <[hidden email]> wrote:
On Fri, Apr 4, 2008 at 9:56 AM, Will Lee <[hidden email]> wrote:
> I understand the implication for the floating point comparison and the need
> for allclose.  However, I think in a doctest context, this behavior makes
> the doc much harder to read.

Tabling the issue of the fact that we changed behavior for a moment,
this is a fundamental problem with using doctests as unit tests for
numerical code. The floating point results that you get *will* be
different on different machines, but the code will still be correct.
Using allclose() and similar techniques are the best tools available
(although they still suck). Relying on visual representations of these
results is simply an untenable strategy.

That is sometimes, but not always, the case. Why? Because most of the time that one ends up with simple values, one started with arbitrarily chosen floating point values and did at most simple operations on them. Thus a strategy that helps many of my unit tests look better and function reliably is to choose values that can be represented exactly in floating point. If the original value here had been 0.00125 rather than 0.0012, there would be no problem. Well, almost: you are still vulnerable to the rules for zero padding and what not getting changed, and so forth, but in general it's more reliable and prettier.

Of course this isn't always a solution. But I've found it's helpful in a lot of cases.
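One way to check whether a value qualifies for the strategy above is to compare the exact rational value of the parsed double with the exact rational value of the decimal literal, using Python's standard fractions.Fraction. A second helper checks what a doctest actually sees: the text produced with 17 significant digits. Both helper names (is_exact, prints_cleanly) are hypothetical, chosen for this sketch. Strictly, 0.00125 = 1/800 is not a dyadic fraction, so it is not exact in binary either; its nearest double merely lies close enough that it still prints as 0.00125 at 17 digits, whereas 0.0012 prints as 0.0011999999999999999. Values such as 0.125 or 0.0009765625 (2**-10) are exact.

```python
from fractions import Fraction


def is_exact(literal):
    """Return True if the decimal string `literal` is exactly a double.

    Fraction(float) is the exact binary value of the parsed double;
    Fraction(str) is the exact decimal value of the literal. They agree
    only when no rounding happened while parsing.
    """
    return Fraction(float(literal)) == Fraction(literal)


def prints_cleanly(literal, digits=17):
    """Return True if printing with `digits` significant digits gives
    back the literal, which is what a doctest comparison depends on."""
    return "%.*g" % (digits, float(literal)) == literal


for lit in ["0.125", "0.0009765625", "0.00125", "0.0012"]:
    print(lit, "exact:", is_exact(lit), "prints cleanly:", prints_cleanly(lit))
```

Exactness is the stronger property; printing cleanly at a given precision is the weaker one a doctest actually needs.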

Note that the string
representation of NaNs and Infs are completely different across
platforms.

That said, str(float_numpy_scalar) really should have the same rules
as str(some_python_float).

+1


 


--

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion





Re: problem with float64's str()

Charles R Harris


On Fri, Apr 4, 2008 at 1:47 PM, Robert Kern <[hidden email]> wrote:
On Fri, Apr 4, 2008 at 9:56 AM, Will Lee <[hidden email]> wrote:
> I understand the implication for the floating point comparison and the need
> for allclose.  However, I think in a doctest context, this behavior makes
> the doc much harder to read.

Tabling the issue of the fact that we changed behavior for a moment,
this is a fundamental problem with using doctests as unit tests for
numerical code. The floating point results that you get *will* be
different on different machines, but the code will still be correct.
Using allclose() and similar techniques are the best tools available
(although they still suck). Relying on visual representations of these
results is simply an untenable strategy. Note that the string
representation of NaNs and Infs are completely different across
platforms.
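To make the allclose() advice concrete for doctests: instead of printing a float and matching its digits, an example can print the boolean result of a tolerance check, which reads the same on every platform. This sketch uses the standard library's math.isclose as a stand-in for numpy.allclose so that it runs without NumPy; the function mean_rate is purely hypothetical.

```python
import doctest
import math


def mean_rate(total, n):
    """Hypothetical numerical routine, used only to illustrate doctests.

    Fragile style: the expected text depends on how many digits the
    float formatter prints, so it is skipped here.

    >>> mean_rate(0.0036, 3)  # doctest: +SKIP
    0.0012

    Robust style: compare with a tolerance and print only a bool.

    >>> math.isclose(mean_rate(0.0036, 3), 0.0012, rel_tol=1e-12)
    True
    """
    return total / n


if __name__ == "__main__":
    # Runs every example in this module's docstrings.
    print(doctest.testmod())
```

With NumPy, numpy.allclose (or numpy.testing.assert_allclose) plays the same role for arrays.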

That said, str(float_numpy_scalar) really should have the same rules
as str(some_python_float).

For all different precisions? And what should the rules be. I note that numpy doesn't distinguish between repr and str, maybe we could specify different behavior for the two.

Chuck




Re: problem with float64's str()

Robert Kern
On Thu, Apr 10, 2008 at 7:31 PM, Charles R Harris
<[hidden email]> wrote:
> > That said, str(float_numpy_scalar) really should have the same rules
> > as str(some_python_float).
>
> For all different precisions?

No. I should have said str(float64_numpy_scalar). I am content to
leave the other types alone.

> And what should the rules be.

All Python does is use a lower decimal precision for __str__ than __repr__.

> I note that
> numpy doesn't distinguish between repr and str, maybe we could specify
> different behavior for the two.

Yes, precisely.
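Python of this era formatted floats with 17 significant digits in repr() and 12 in str(). A minimal sketch of giving a scalar type the same split uses a hypothetical float subclass, Float64Scalar (not NumPy's actual scalar type). It shows how the shorter str() hides the binary rounding noise in 0.0012 while repr() keeps enough digits to round-trip. Modern Python 3 no longer works this way: str() and repr() of a float both give the shortest string that round-trips.

```python
# Significant digits used by repr() and str() for floats in the CPython
# of this era (PREC_REPR and PREC_STR in Objects/floatobject.c).
REPR_DIGITS = 17
STR_DIGITS = 12


class Float64Scalar(float):
    """Hypothetical float subclass with distinct repr() and str() rules."""

    def __repr__(self):
        # Enough digits that float(repr(x)) == x for any IEEE double.
        return "%.*g" % (REPR_DIGITS, self)

    def __str__(self):
        # Fewer digits, so small rounding noise is usually hidden.
        return "%.*g" % (STR_DIGITS, self)


x = Float64Scalar(0.0012)
print(repr(x))  # 0.0011999999999999999
print(str(x))   # 0.0012
```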

--
Robert Kern


Re: problem with float64's str()

Charles R Harris


On Thu, Apr 10, 2008 at 6:38 PM, Robert Kern <[hidden email]> wrote:
On Thu, Apr 10, 2008 at 7:31 PM, Charles R Harris
<[hidden email]> wrote:
> > That said, str(float_numpy_scalar) really should have the same rules
> > as str(some_python_float).
>
> For all different precisions?

No. I should have said str(float64_numpy_scalar). I am content to
leave the other types alone.

> And what should the rules be.

All Python does is use a lower decimal precision for __str__ than __repr__.

> I note that
> numpy doesn't distinguish between repr and str, maybe we could specify
> different behavior for the two.

Yes, precisely.

Well, I know where to do that and have a ticket for it. What I would also like to do is use float.h for setting the repr precision, but I am not sure I can count on its presence, as it only became part of the spec in 1999. Then again, that was almost ten years ago. Anyway, Python on my machine generates 12 significant digits. Is that common to everyone?

Chuck



Re: problem with float64's str()

Robert Kern
On Thu, Apr 10, 2008 at 7:57 PM, Charles R Harris
<[hidden email]> wrote:

>
> On Thu, Apr 10, 2008 at 6:38 PM, Robert Kern <[hidden email]> wrote:
> >
> > On Thu, Apr 10, 2008 at 7:31 PM, Charles R Harris
> > <[hidden email]> wrote:
> > > > That said, str(float_numpy_scalar) really should have the same rules
> > > > as str(some_python_float).
> > >
> > > For all different precisions?
> >
> > No. I should have said str(float64_numpy_scalar). I am content to
> > leave the other types alone.
> >
> > > And what should the rules be.
> >
> > All Python does is use a lower decimal precision for __str__ than __repr__.
> >
> >
> > > I note that
> > > numpy doesn't distinguish between repr and str, maybe we could specify
> > > different behavior for the two.
> >
> > Yes, precisely.
>
> Well, I know where to do that and have a ticket for it. What I would also
> like to do is use float.h for setting the repr precision, but I am not sure
> I can count on its presence as it only became part of the spec in 1999. Then
> again, that's almost ten years ago. Anyway,  python on my machine generates
> 12 significant digits. Is that common to everyone?

Here is the relevant portion of Objects/floatobject.c:

/* Precisions used by repr() and str(), respectively.

   The repr() precision (17 significant decimal digits) is the minimal number
   that is guaranteed to have enough precision so that if the number is read
   back in the exact same binary value is recreated.  This is true for IEEE
   floating point by design, and also happens to work for all other modern
   hardware.

   The str() precision is chosen so that in most cases, the rounding noise
   created by various operations is suppressed, while giving plenty of
   precision for practical use.

*/

#define PREC_REPR 17
#define PREC_STR 12



svn blame tells me that those have been there unchanged since 1999.

You may want to steal the function format_float() that is defined in
that file, too.
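The round-trip claim in the floatobject.c comment is easy to spot-check. This sketch draws random IEEE doubles from random 64-bit patterns via the standard struct module, skips NaNs and infinities (whose string forms vary across platforms), and confirms that formatting with 17 significant digits and parsing back reproduces the same value, while 12 digits usually does not. It is a sampling check, not a proof, and the helpers round_trips and random_double are hypothetical names.

```python
import math
import random
import struct


def round_trips(x, digits):
    """True if x survives formatting with `digits` significant digits."""
    return float("%.*g" % (digits, x)) == x


def random_double(rng):
    """Reinterpret 64 random bits as an IEEE 754 double."""
    bits = rng.getrandbits(64)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


rng = random.Random(2008)  # fixed seed for a reproducible sample
samples = []
while len(samples) < 20000:
    x = random_double(rng)
    if math.isfinite(x):  # skip NaNs and infinities
        samples.append(x)

ok17 = sum(round_trips(x, 17) for x in samples)
ok12 = sum(round_trips(x, 12) for x in samples)
print("17 digits round-trip:", ok17, "of", len(samples))
print("12 digits round-trip:", ok12, "of", len(samples))
```

Random bit patterns spread the sample over the whole exponent range, including subnormals, which a uniform random.random() sample would miss.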

--
Robert Kern


Re: problem with float64's str()

Charles R Harris


On Thu, Apr 10, 2008 at 7:06 PM, Robert Kern <[hidden email]> wrote:
On Thu, Apr 10, 2008 at 7:57 PM, Charles R Harris
<[hidden email]> wrote:
>
> On Thu, Apr 10, 2008 at 6:38 PM, Robert Kern <[hidden email]> wrote:
> >
> > On Thu, Apr 10, 2008 at 7:31 PM, Charles R Harris
> > <[hidden email]> wrote:
> > > > That said, str(float_numpy_scalar) really should have the same rules
> > > > as str(some_python_float).
> > >
> > > For all different precisions?
> >
> > No. I should have said str(float64_numpy_scalar). I am content to
> > leave the other types alone.
> >
> > > And what should the rules be.
> >
> > All Python does is use a lower decimal precision for __str__ than __repr__.
> >
> >
> > > I note that
> > > numpy doesn't distinguish between repr and str, maybe we could specify
> > > different behavior for the two.
> >
> > Yes, precisely.
>
> Well, I know where to do that and have a ticket for it. What I would also
> like to do is use float.h for setting the repr precision, but I am not sure
> I can count on its presence as it only became part of the spec in 1999. Then
> again, that's almost ten years ago. Anyway,  python on my machine generates
> 12 significant digits. Is that common to everyone?

Here is the relevant portion of Objects/floatobject.c:

/* Precisions used by repr() and str(), respectively.

  The repr() precision (17 significant decimal digits) is the minimal number
  that is guaranteed to have enough precision so that if the number is read
  back in the exact same binary value is recreated.  This is true for IEEE
  floating point by design, and also happens to work for all other modern
  hardware.

  The str() precision is chosen so that in most cases, the rounding noise
  created by various operations is suppressed, while giving plenty of
  precision for practical use.

*/

#define PREC_REPR       17
#define PREC_STR        12



svn blame tells me that those have been there unchanged since 1999.

I left this note on my ticket.

These values should really be determined at compile time, not hardwired in at lines 611-621 of scalartypes.inc.src. Maybe use the values in float.h, which on my machine give

single digits 6
double digits 15
long double digits 18

The values we are currently using are 8, 17, and 22, whereas the values above are supposed to guarantee reversible conversion to and from decimal. That doesn't seem to be the case in practice, though; they seem to need at least one more digit. The other question is whether all the common compilers support float.h.

The numbers above were generated by

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* *_DIG: decimal digits that survive a decimal -> binary -> decimal round trip */
    printf("single digits %d\n", FLT_DIG);
    printf("double digits %d\n", DBL_DIG);
    printf("long double digits %d\n", LDBL_DIG);

    return 0;
}

The reason I wanted to use float.h for the repr precisions is that at some point long double is bound to be quad precision; I think it already is on some machines, and it used to be on VAXen. So I wanted it to carry over naturally when the change came. I note that arrays and scalars print differently, and I'm not sure where the array printing is implemented.
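The "one more digit" observation has a standard explanation: FLT_DIG, DBL_DIG and LDBL_DIG count the digits that survive decimal -> binary -> decimal, while printing a binary value so that it reads back exactly needs more. For a significand of p bits the two counts are floor((p-1)*log10(2)) and ceil(1 + p*log10(2)). The Python sketch below (helper names hypothetical) tabulates both; for doubles the gap is two digits rather than one, which is why 17, not 15 or 16, is needed. <float.h> and the *_DIG macros already exist in C89; the second count was standardized later, as DECIMAL_DIG in C99 and FLT_DECIMAL_DIG, DBL_DECIMAL_DIG and LDBL_DECIMAL_DIG in C11.

```python
import math
import sys

LOG10_2 = math.log10(2)


def dig(p):
    """C's *_DIG: digits that survive decimal -> binary -> decimal."""
    return math.floor((p - 1) * LOG10_2)


def decimal_dig(p):
    """C11's *_DECIMAL_DIG: digits needed for binary -> decimal -> binary."""
    return math.ceil(1 + p * LOG10_2)


# Significand widths in bits, counting the implicit leading bit.
FORMATS = [("single", 24), ("double", 53), ("x87 extended", 64), ("IEEE quad", 113)]

for name, p in FORMATS:
    print("%-12s p=%3d  DIG=%2d  round trip=%2d" % (name, p, dig(p), decimal_dig(p)))

# Python floats are C doubles; sys.float_info exposes DBL_DIG and DBL_MANT_DIG.
print("sys.float_info.dig =", sys.float_info.dig, " mant_dig =", sys.float_info.mant_dig)

# 15 digits are enough for decimal -> double -> decimal ...
s = "0.123456789012345"
print("%.15g" % float(s) == s)
# ... but double -> decimal -> double can need all 17.
x = 0.1 + 0.2
print([float("%.*g" % (d, x)) == x for d in (15, 16, 17)])
```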

Chuck
 

You may want to steal the function format_float() that is defined in
that file, too.

I'll look at it. At the moment we use %.*g.

Chuck




Re: problem with float64's str()

Charles R Harris


On Thu, Apr 10, 2008 at 7:06 PM, Robert Kern <[hidden email]> wrote:
On Thu, Apr 10, 2008 at 7:57 PM, Charles R Harris
<[hidden email]> wrote:
>
> On Thu, Apr 10, 2008 at 6:38 PM, Robert Kern <[hidden email]> wrote:
> >
> > On Thu, Apr 10, 2008 at 7:31 PM, Charles R Harris
> > <[hidden email]> wrote:
> > > > That said, str(float_numpy_scalar) really should have the same rules
> > > > as str(some_python_float).
> > >
> > > For all different precisions?
> >
> > No. I should have said str(float64_numpy_scalar). I am content to
> > leave the other types alone.
> >
> > > And what should the rules be.
> >
> > All Python does is use a lower decimal precision for __str__ than __repr__.
> >
> >
> > > I note that
> > > numpy doesn't distinguish between repr and str, maybe we could specify
> > > different behavior for the two.
> >
> > Yes, precisely.
>
> Well, I know where to do that and have a ticket for it. What I would also
> like to do is use float.h for setting the repr precision, but I am not sure
> I can count on its presence as it only became part of the spec in 1999. Then
> again, that's almost ten years ago. Anyway,  python on my machine generates
> 12 significant digits. Is that common to everyone?

Here is the relevant portion of Objects/floatobject.c:

/* Precisions used by repr() and str(), respectively.

  The repr() precision (17 significant decimal digits) is the minimal number
  that is guaranteed to have enough precision so that if the number is read
  back in the exact same binary value is recreated.  This is true for IEEE
  floating point by design, and also happens to work for all other modern
  hardware.

  The str() precision is chosen so that in most cases, the rounding noise
  created by various operations is suppressed, while giving plenty of
  precision for practical use.

*/

#define PREC_REPR       17
#define PREC_STR        12


OK, I've committed a change that fixes the problem that started this thread, but I'm going to leave the ticket open for a while until I decide what to do about longdouble. The precisions are now

#define FLOATPREC_REPR 8
#define FLOATPREC_STR 6
#define DOUBLEPREC_REPR 17
#define DOUBLEPREC_STR 12
#if SIZEOF_LONGDOUBLE == SIZEOF_DOUBLE
#define LONGDOUBLEPREC_REPR DOUBLEPREC_REPR
#define LONGDOUBLEPREC_STR DOUBLEPREC_STR
#else /* More than probably needed on Intel FP */
#define LONGDOUBLEPREC_REPR 20
#define LONGDOUBLEPREC_STR 12
#endif

I'm open to suggestions.
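The committed single and double entries can be mirrored in a few lines to see what each type would print with %.*g. This Python sketch is an illustration under assumptions, not NumPy's code: Python has no long double, so only the single and double entries are exercised, and single-precision values are produced by rounding through struct's "f" format; format_scalar and to_single are hypothetical names. For comparison, the worst-case round-trip bound ceil(1 + p*log10(2)) is 9 digits for a 24-bit single and 21 for a 64-bit x87 long double, so 8 and 20 are each one digit short of it.

```python
import struct

# (repr, str) significant digits, copied from the FLOATPREC_* and
# DOUBLEPREC_* defines above.
PRECISIONS = {
    "single": (8, 6),
    "double": (17, 12),
}


def to_single(x):
    """Round a Python float (a C double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def format_scalar(x, kind, use_repr):
    """Hypothetical formatter: pick the precision, then apply %.*g."""
    repr_digits, str_digits = PRECISIONS[kind]
    if kind == "single":
        x = to_single(x)
    return "%.*g" % (repr_digits if use_repr else str_digits, x)


for kind in ("single", "double"):
    print(kind, "repr:", format_scalar(0.0012, kind, True),
          "str:", format_scalar(0.0012, kind, False))
```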

Chuck




Re: problem with float64's str()

Robert Kern
On Thu, Apr 10, 2008 at 8:58 PM, Charles R Harris
<[hidden email]> wrote:
> I'm open to suggestions.

I have nothing better to offer than what you've done. Thank you!

--
Robert Kern


Re: problem with float64's str()

Charles R Harris


On Thu, Apr 10, 2008 at 8:01 PM, Robert Kern <[hidden email]> wrote:
On Thu, Apr 10, 2008 at 8:58 PM, Charles R Harris
<[hidden email]> wrote:
> I'm open to suggestions.

I have nothing better to offer than what you've done. Thank you!

OK, but it looks like I need to implement our own float-to-string conversion functions to display longdouble correctly. PyOS_snprintf is what we currently use, and it only takes doubles. Grrrr.

Chuck

