Proposal to support __format__

Gustav Larsson
Hi everyone!

I want to discuss adding support for __format__ to ndarray, and I am willing to
contribute code once consensus has been reached. It was briefly
discussed on GitHub two years ago (https://github.com/numpy/numpy/issues/5543),
and I will reiterate some of the points made there and build on them. I
have been thinking about this a lot over the last few weeks, and my thoughts have
turned into a fairly fleshed-out proposal. The discussion should probably start
at a higher level, so I apologize if the level of detail is inappropriate at this
stage.

I decided on a gist, since the email got too long and clear formatting helps:

https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069

OK, those are my thoughts for now. What do you think?

Cheers,
Gustav
_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: Proposal to support __format__

Stephan Hoyer-2
On Tue, Feb 14, 2017 at 3:34 PM, Gustav Larsson <[hidden email]> wrote:
> I decided on a gist, since the email got too long and clear formatting helps:
>
> https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069

This is a lovely and clearly written document. Thanks for taking the time to think through this!

I encourage you to submit it as a pull request to the NumPy repository as a "NumPy Enhancement Proposal", either now or after we've discussed it:
https://docs.scipy.org/doc/numpy-dev/neps/index.html

> OK, those are my thoughts for now. What do you think?

Two thoughts for now:
1. For object arrays, I would default to calling format on each element (your "map principle") rather than raising an error.
2. It's absolutely OK to leave functionality unimplemented and not nail down every edge case immediately. As a default, I would suggest raising errors whenever a non-empty type specification is provided, rather than raising errors in every case.
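Stephan's two suggestions can be made concrete with a toy stand-in for ndarray. This is a minimal sketch under stated assumptions: `ToyArray`, its `dtype` tag and its `__str__` layout are hypothetical and do not reproduce NumPy's real printing. It keeps `{}`/`{:}` unchanged, applies the spec to every element of an object array, and raises only when a non-empty spec reaches a case that is not implemented yet.

```python
class ToyArray:
    """Hypothetical 1-D stand-in for ndarray: a list of values plus a dtype tag."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    def __str__(self):
        # Simplified layout; real NumPy output differs (alignment, precision).
        return "[" + " ".join(repr(v) for v in self.values) + "]"

    def __format__(self, spec):
        if spec == "":
            # "{}" and "{:}" keep producing the existing str() output.
            return str(self)
        if self.dtype == "object":
            # Suggestion 1: apply the spec to each element (the "map principle").
            return "[" + " ".join(format(v, spec) for v in self.values) + "]"
        # Suggestion 2: unimplemented cases fail only when a spec is given.
        raise TypeError(
            f"format spec {spec!r} is not supported for dtype {self.dtype!r} yet"
        )


floats = ToyArray([1.5, 2.25], "float64")
objects = ToyArray([1.5, 3], "object")
print(f"{floats}")       # [1.5 2.25]
print(f"{objects:.1f}")  # [1.5 3.0]
```

The design keeps the no-spec path identical to today's output, so nothing that currently works can break; only calls with a spec reach new code.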

Re: Proposal to support __format__

Gustav Larsson
On Tue, Feb 14, 2017 at 3:59 PM, Stephan Hoyer <[hidden email]> wrote:
> I encourage you to submit it as a pull request to the NumPy repository as a "NumPy Enhancement Proposal", either now or after we've discussed it:
> https://docs.scipy.org/doc/numpy-dev/neps/index.html

OK, I will let it go through one round of comments and then submit it. Thanks!

> 1. For object arrays, I would default to calling format on each element (your "map principle") rather than raising an error.

I'm glad you brought this up as a possibility. It might be doable, but there are some issues that would need to be resolved. First of all, {} and {:} always work and give the same result they currently do, so this only affects the case where the format spec is non-empty. I think there are two main issues:

Heterogeneity: Let's say we have x = np.array([12.3, True, 'string', Foo(10)], dtype=object). Then, presumably, {:.1f} should raise a ValueError, since the string does not support format type 'f'. This could create a lot of ValueError land mines for the user. For x[:2], however, it should work and produce something like [12.3  1.0]. Note that the "map principle" still can't hold strictly. Let's say we have an object array with mostly string-like elements. Then {:5s} will still not be exactly {:5s} applied element-wise, because the string representations inside the array need to be repr-based (otherwise newlines and the like could break the output, and embedded spaces could make the boundaries between elements ambiguous). This brings me to the next issue.

Str vs. repr: If we have a homogeneous object array whose elements are all of type Foo, and Foo implements __format__, it would be great if this worked. However, Foo.__format__ might return strings containing newlines (or spaces), which would break (or confuse) the printed output, unless the printing is made incredibly smart and supports "vertical alignment". This is essentially the same issue as for strings in general, which is why strings are shown with repr inside arrays. I can think of two solutions: 1) try to sanitize (or repr-ify) the string returned by __format__ somehow; 2) put the responsibility on the user and simply let the rendering break if Foo.__format__ does not play well.

> 2. It's absolutely OK to leave functionality unimplemented and not nail down every edge case immediately. As a default, I would suggest raising errors whenever a non-empty type specification is provided, rather than raising errors in every case.

I agree.

Gustav
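The heterogeneity and repr issues Gustav describes can be seen with a minimal sketch of the map principle, using a plain Python list as a stand-in for a 1-D object array. `format_object_array` is a hypothetical helper, and right-aligning every cell to a common width is an assumption meant to resemble NumPy's column layout, not a description of it. A mixed list makes `'.1f'` fail on the string element, the first two elements give `[12.3  1.0]`, and the no-spec path stays repr-based, so a newline is escaped instead of breaking the line.

```python
def format_object_array(items, spec):
    """Hypothetical map-principle formatter for a 1-D object array.

    Each element is formatted with `spec`, then right-aligned to a common
    width so that the cells line up.
    """
    if spec == "":
        # No spec: repr-based output, so strings stay quoted and escaped.
        cells = [repr(x) for x in items]
    else:
        # Errors from individual elements (e.g. 'f' on a str) propagate as-is.
        cells = [format(x, spec) for x in items]
    width = max((len(c) for c in cells), default=0)
    return "[" + " ".join(c.rjust(width) for c in cells) + "]"


print(format_object_array([12.3, True], ".1f"))  # [12.3  1.0]
print(format_object_array(["a\nb"], ""))         # ['a\nb']
```

Calling it on `[12.3, True, "string"]` with `".1f"` raises `ValueError` from `format("string", ".1f")`, which is the "land mine" case.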

Re: Proposal to support __format__

Stephan Hoyer-2
On Tue, Feb 14, 2017 at 5:35 PM, Gustav Larsson <[hidden email]> wrote:
>> 1. For object arrays, I would default to calling format on each element (your "map principle") rather than raising an error.
>
> I'm glad you brought this up as a possibility. It might be doable, but there are some issues that would need to be resolved. First of all, {} and {:} always work and give the same result they currently do, so this only affects the case where the format spec is non-empty. I think there are two main issues:
>
> Heterogeneity: Let's say we have x = np.array([12.3, True, 'string', Foo(10)], dtype=object). Then, presumably, {:.1f} should raise a ValueError, since the string does not support format type 'f'. This could create a lot of ValueError land mines for the user.

Things will absolutely break if you try to do complex operations on inhomogeneously typed arrays. I would put the onus on the user in such a case.

> For x[:2], however, it should work and produce something like [12.3  1.0]. Note that the "map principle" still can't hold strictly. Let's say we have an object array with mostly string-like elements. Then {:5s} will still not be exactly {:5s} applied element-wise, because the string representations inside the array need to be repr-based (otherwise newlines and the like could break the output, and embedded spaces could make the boundaries between elements ambiguous). This brings me to the next issue.

Indeed, this will be a departure from the behavior without a format string, which just uses repr. In my mind, this is the strongest argument against using the map principle here, because there is a discontinuous shift between providing and not providing a format string.

> Str vs. repr: If we have a homogeneous object array whose elements are all of type Foo, and Foo implements __format__, it would be great if this worked. However, Foo.__format__ might return strings containing newlines (or spaces), which would break (or confuse) the printed output, unless the printing is made incredibly smart and supports "vertical alignment". This is essentially the same issue as for strings in general, which is why strings are shown with repr inside arrays. I can think of two solutions: 1) try to sanitize (or repr-ify) the string returned by __format__ somehow; 2) put the responsibility on the user and simply let the rendering break if Foo.__format__ does not play well.

I wouldn't do anything fancy here to handle line breaks. It's basically impossible to get the edge cases right, so I would certainly put the responsibility on the user.

On another note, about Python 2 vs. 3: I would definitely copy the Python 3 behavior on all versions of NumPy (when feasible) and not be too concerned about compatibility with format on Python 2. The future is Python 3.
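Two of Stephan's points, the discontinuity between providing and not providing a spec and following Python 3 semantics, can be illustrated in plain Python. In this sketch, `map_format` is a hypothetical helper and `Foo` a made-up class that defines `__repr__` but not `__format__`. With no spec the output is repr-based (quoted strings), while `'s'` drops the quotes. Because Python 3's `object.__format__` rejects any non-empty spec with `TypeError`, an element that cannot handle the spec simply errors out, leaving the onus on the user.

```python
class Foo:
    """Made-up element type: defines __repr__ but not __format__."""

    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return f"Foo({self.n})"


def map_format(items, spec):
    """Hypothetical: repr() with no spec (like array printing), format() otherwise."""
    if spec == "":
        cells = [repr(x) for x in items]
    else:
        cells = [format(x, spec) for x in items]
    return "[" + " ".join(cells) + "]"


print(map_format(["a", "b"], ""))   # ['a' 'b']
print(map_format(["a", "b"], "s"))  # [a b]
print(format(Foo(1), ""))           # Foo(1): an empty spec falls back to str()
```

`map_format([Foo(1)], ">8")` raises `TypeError`, since `Foo` inherits `object.__format__`.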

Re: Proposal to support __format__

Marten van Kerkwijk
In reply to this post by Gustav Larsson
Hi Gustav,

This is great! A few quick comments (mostly echoing Stephan's).

1. You basically have a NEP already! Making a PR from it would let people
give line-by-line comments, which would help!

2. Don't worry about supporting Python 2 specifics; just try to ensure
it doesn't break; I would not say more about it!

3. On `set_printoptions` -- ideally, it will become possible to use
this as a context manager (i.e., `with set_printoptions(...)`). It might make
sense to have an `override_format` keyword argument to it.

4. Otherwise, my main suggestion is to start small with the more
obvious ones, and not worry too much about format validation, but
rather about getting the simple ones to work well (e.g., for an object
array, just apply the format given; if it doesn't work, it will error
out on its own, which is OK).

5. One bit of detail: the "g" one does confuse me.

All the best,

Marten
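Marten's third point, using print options as a context, is a save/set/restore pattern. The sketch below shows it with `contextlib.contextmanager` over a hypothetical module-level options dict; the names mirror NumPy's `set_printoptions`/`get_printoptions`, but this is not NumPy's implementation. (Later NumPy releases, from 1.15, ship an `np.printoptions` context manager.) The `finally` clause restores the previous options even if the body raises.

```python
from contextlib import contextmanager

# Hypothetical global options, standing in for NumPy's print options.
_print_options = {"precision": 8, "suppress": False}


def set_printoptions(**kwargs):
    # Validate everything before changing anything.
    unknown = set(kwargs) - set(_print_options)
    if unknown:
        raise TypeError(f"unknown print options: {sorted(unknown)}")
    _print_options.update(kwargs)


def get_printoptions():
    return dict(_print_options)


@contextmanager
def printoptions(**kwargs):
    """Temporarily change options; restore the old ones even on error."""
    saved = get_printoptions()
    set_printoptions(**kwargs)
    try:
        yield get_printoptions()
    finally:
        set_printoptions(**saved)


with printoptions(precision=2) as opts:
    print(opts["precision"])            # 2
print(get_printoptions()["precision"])  # 8
```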

Re: Proposal to support __format__

Gustav Larsson
On Wed, Feb 15, 2017 at 8:03 AM, Marten van Kerkwijk <[hidden email]> wrote:
> This is great!

Thanks! Glad to be met by enthusiasm about this.

> 1. You basically have a NEP already! Making a PR from it would let people
> give line-by-line comments, which would help!

I will do this soon.

> 2. Don't worry about supporting Python 2 specifics; just try to ensure
> it doesn't break; I would not say more about it!

Sounds good to me.

> 3. On `set_printoptions` -- ideally, it will become possible to use
> this as a context manager (i.e., `with set_printoptions(...)`). It might make
> sense to have an `override_format` keyword argument to it.

Having a `with np.printoptions(...)` context manager is a great idea. It does sound orthogonal to __format__, though, so it could be addressed separately.

> 4. Otherwise, my main suggestion is to start small with the more
> obvious ones, and not worry too much about format validation, but
> rather about getting the simple ones to work well (e.g., for an object
> array, just apply the format given; if it doesn't work, it will error
> out on its own, which is OK).

Sounds good to me. I was thinking of approaching the implementation by writing unit tests first and grouping them into priority tiers. That way, the unit tests can go through another review before implementation gets going. I agree that __format__ doesn't need to validate the format spec itself if a ValueError is going to be raised anyway by the sub-calls.

> 5. One bit of detail: the "g" one does confuse me.

I will rewrite this a bit to make it clearer. Basically, the 'g' behavior, with its switch between 'e' and 'f' depending on whether max/min > 1000, comes entirely from the current NumPy printing behavior, so it is not something I had much creative input on, although the way it is written right now may suggest otherwise. The goal is to have {:} == {:g} for float arrays, analogous to how {:} is essentially a variant of {:g} for built-in floats. Then, if the user departs a bit, e.g. with {:.2g}, it will simply be identical to calling np.set_printoptions(precision=2) first.

Gustav
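Gustav's description of 'g' amounts to two steps: pick a precision from the spec (NumPy's default print precision is 8), then pick fixed or exponential notation for the whole array at once. The sketch below is hypothetical and uses only the max/min > 1000 ratio he mentions; NumPy's actual printing logic has further conditions, so the output is not meant to match NumPy's. Note that 'g' here means "the array-printing heuristic": with Python's built-in 'g' the precision counts significant digits, whereas in this sketch `.2g` plays the role of `precision=2`, i.e. digits after the decimal point.

```python
import re

DEFAULT_PRECISION = 8  # NumPy's default print precision


def parse_g_spec(spec):
    """Hypothetical: '' or 'g' -> default precision, '.Ng' -> N."""
    match = re.fullmatch(r"(?:\.(\d+))?g?", spec)
    if match is None:
        raise ValueError(f"spec {spec!r} is outside this sketch")
    return int(match.group(1)) if match.group(1) else DEFAULT_PRECISION


def choose_mode(values):
    """'e' when the magnitude ratio max/min exceeds 1000, else 'f'."""
    mags = [abs(v) for v in values if v != 0]
    if mags and max(mags) / min(mags) > 1000:
        return "e"
    return "f"


def format_float_list(values, spec=""):
    """One precision and one notation for every element, as in array printing."""
    precision = parse_g_spec(spec)
    mode = choose_mode(values)
    return "[" + " ".join(format(v, f".{precision}{mode}") for v in values) + "]"


print(format_float_list([1.5, 2.25], ".2g"))    # [1.50 2.25]
print(format_float_list([1.0, 5000.0], ".2g"))  # [1.00e+00 5.00e+03]
```

By construction `{:}` and `{:g}` give the same result here, which is the property the proposal aims for.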

Re: Proposal to support __format__

Ilhan Polat
On the last item: do we really have to follow those strange `d`, `g`, etc. formatting conventions? With all respect to the humongous historical baggage, I think that notation is pretty archaic and terminal-like. If being Pythonic is a concern here, maybe it is better to use a more verbose syntax. Just throwing out an idea after 15 seconds of thought (so by no means an alternative suggestion):

eng:6i5d -> engineering notation (the power of ten is always a multiple of 3) with 6 integral digits and 5 decimal digits
float (whatever the default is)
float:4i2d (you get the idea)

etc.

FULL DISCLOSURE: I am a very displeased customer of MATLAB's `fprintf` (and others) and this archaic formatting. I never got the hang of it, so it might be the case that I don't quite get the rationale behind it, and I almost always get it wrong. Maybe at least the rationale can be clarified.

Lastly, repeating what others have mentioned: thank you for this well-prepared initiative.

On Wed, Feb 15, 2017 at 10:48 PM, Gustav Larsson <[hidden email]> wrote:
>> 5. One bit of detail: the "g" one does confuse me.
>
> I will rewrite this a bit to make it clearer. Basically, the 'g' behavior, with its switch between 'e' and 'f' depending on whether max/min > 1000, comes entirely from the current NumPy printing behavior, so it is not something I had much creative input on, although the way it is written right now may suggest otherwise. The goal is to have {:} == {:g} for float arrays, analogous to how {:} is essentially a variant of {:g} for built-in floats. Then, if the user departs a bit, e.g. with {:.2g}, it will simply be identical to calling np.set_printoptions(precision=2) first.
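Ilhan's `eng:6i5d` example can be made concrete with a hypothetical helper that produces engineering notation, where the exponent is always a multiple of 3. Reading "6i" as a minimum of 6 integral positions (space-padded) is an assumption, as is the helper's name. In this sketch, rounding can occasionally push the mantissa up to 1000.0; a fuller version would renormalize that case.

```python
import math


def format_eng(x, integral=6, decimals=5):
    """Hypothetical reading of 'eng:<integral>i<decimals>d'.

    The exponent is a multiple of 3, so |mantissa| lies in [1, 1000).
    `integral` is treated as a minimum number of integral positions.
    """
    if x == 0:
        exp = 0
    else:
        exp = math.floor(math.log10(abs(x)) / 3) * 3
    mantissa = x / 10 ** exp
    body = f"{mantissa:.{decimals}f}".rjust(integral + 1 + decimals)
    return f"{body}e{exp:+03d}"


print(repr(format_eng(12345.678, integral=3, decimals=2)))  # ' 12.35e+03'
print(format_eng(-1500.0, integral=1, decimals=3))          # -1.500e+03
```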

Re: Proposal to support __format__

Nathan Goldbaum


On Wed, Feb 15, 2017 at 4:05 PM, Ilhan Polat <[hidden email]> wrote:
> On the last item: do we really have to follow those strange `d`, `g`, etc. formatting conventions? With all respect to the humongous historical baggage, I think that notation is pretty archaic and terminal-like. If being Pythonic is a concern here, maybe it is better to use a more verbose syntax. Just throwing out an idea after 15 seconds of thought (so by no means an alternative suggestion):
>
> eng:6i5d -> engineering notation (the power of ten is always a multiple of 3) with 6 integral digits and 5 decimal digits
> float (whatever the default is)
> float:4i2d (you get the idea)
>
> etc.


While I agree with you that printf format codes are arcane, unfortunately they need to be used here since they are supported by Python:
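Nathan's point is that whatever NumPy accepts should line up with the format codes Python itself supports. As an illustration, a verbose spec like Ilhan's `float:4i2d` can be translated mechanically into a standard Python format spec. `verbose_to_spec` is hypothetical, and reading "4i" as integral positions (so the field width is 4 + 1 + 2 = 7) is an assumption.

```python
import re


def verbose_to_spec(verbose):
    """Hypothetical: translate 'float' or 'float:<N>i<M>d' into a standard spec."""
    match = re.fullmatch(r"float(?::(\d+)i(\d+)d)?", verbose)
    if match is None:
        raise ValueError(f"unrecognized verbose spec: {verbose!r}")
    if match.group(1) is None:
        return ""  # plain 'float': keep the default formatting
    integral, decimals = int(match.group(1)), int(match.group(2))
    # The width covers the integral digits, the decimal point and the decimals.
    return f"{integral + 1 + decimals}.{decimals}f"


spec = verbose_to_spec("float:4i2d")
print(spec)                         # 7.2f
print(repr(format(3.14159, spec)))  # '   3.14'
```

The translation direction matters: the verbose form can be layered on top of the standard codes, but not the other way around, so the standard codes remain the common denominator.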

 
