Possible

Michal Radwanski
Hello,

I'm not sure if it's expected behaviour or a bug, so I decided to write
here. First an example:
In [4]: array([2**63])
Out[4]: array([9223372036854775808], dtype=uint64)

In [5]: array([2**63-1, 2**63])
Out[5]: array([9.22337204e+18, 9.22337204e+18])


The docs for `numpy.array` mention that:

dtype : data-type, optional
 The desired data-type for the array. If not given, then the type
 will be determined as the minimum type required to hold 
 the objects in the sequence.

I understand the type promotions here, but I believe that the
documentation is wrong in this case. Indeed, the minimum type in the
latter case would be 'uint64'.
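
The promotion behind the second result can be reproduced directly. This is a minimal sketch, assuming the first element is treated as `int64` and the second as `uint64`; the exact rules may differ between NumPy versions.

```python
import numpy as np

# int64 and uint64 share no common integer type, so promotion yields float64.
common = np.promote_types(np.int64, np.uint64)
print(common)  # float64

# float64 has a 53-bit significand, so both inputs round to the same value.
print(float(2**63 - 1) == float(2**63))  # True
```

This is why both entries of `Out[5]` print identically: the distinction between the two integers is lost once they are stored as floats.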

Is it a bug worth submitting/fixing?


--
With kind regards
Michał Radwański

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Possible

Michal Radwanski
Sorry for bad naming, I hit the "send" button too fast. Intended title:
"Possible documentation bug for numpy.array".

--
With kind regards
Michał Radwański


Re: Possible documentation bug for numpy.array

Sebastian Berg
On Mon, 2021-03-01 at 01:30 +0100, Michal Radwanski wrote:

> Hello,
>
> I'm not sure if it's expected behaviour or a bug, so I decided to
> write
> here. First an example:
> In [4]: array([2**63])
> Out[4]: array([9223372036854775808], dtype=uint64)
>
> In [5]: array([2**63-1, 2**63])
> Out[5]: array([9.22337204e+18, 9.22337204e+18])
>
>
Thanks, this is a known issue, e.g.:
https://github.com/numpy/numpy/issues/14883
and https://github.com/numpy/numpy/issues/16287

Currently, my view is that trying to "fix" it so that the result is
truly minimal is probably doomed to introduce unnecessary complexity
and/or will just make the oddities slightly harder to find.

Instead, my stance is that we should refuse to guess anything
besides the "default integer" when users pass in integers.  That would
probably mean you get an error that `2**63` cannot be represented by
`int64`, forcing you to be explicit about the dtype you expect.
(In the long run, it might also return an `object` array. [1])
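
Being explicit about the dtype already works today and avoids the guessing altogether. A minimal sketch using the values from the example in this thread (the variable names are only illustrative):

```python
import numpy as np

values = [2**63 - 1, 2**63]

# With an explicit dtype no guessing or promotion takes place.
exact = np.array(values, dtype=np.uint64)
print(exact.dtype)  # uint64

# Both values survive exactly, unlike in the float64 result.
print(int(exact[0]) == 2**63 - 1, int(exact[1]) == 2**63)  # True True
```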


With regard to the documentation: `np.array` promotes inputs as they
come in (currently depth first), i.e. in a "left-to-right" fashion.
That basically means that you are right and "minimal" will not always
be true, due to our promotion rules.
But the bigger confusion is that Python integers are mapped to NumPy
dtypes by finding the first one in the following list which can
represent the value:

  * C long: int64 on 64-bit Linux/Mac, otherwise (all Windows!) int32
  * C long long: int64 on all relevant platforms, AFAIK
  * C unsigned long long: uint64 on all relevant platforms, AFAIK
  * object

Which is an attempt at "minimal" of course.  If anyone has an idea how
to capture this integer behaviour in particular in the docs, that would
be worth doing.  (The way the promotion is done also breaks the
"minimal" claim, but that is much more subtle.)
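
The lookup over the list above can be modelled in plain Python. This is only a sketch of the described behaviour, not NumPy's actual code; `candidate_types`, `guess_integer_type` and the `long_bits` parameter are hypothetical names, and the C type widths are assumptions matching the list.

```python
def candidate_types(long_bits=64):
    """Candidate C types in lookup order, as (name, min, max).

    long_bits=64 models 64-bit Linux/Mac; long_bits=32 models Windows.
    """
    return [
        ("C long", -(2 ** (long_bits - 1)), 2 ** (long_bits - 1) - 1),
        ("C long long", -(2 ** 63), 2 ** 63 - 1),
        ("C unsigned long long", 0, 2 ** 64 - 1),
    ]


def guess_integer_type(value, long_bits=64):
    """Return the first candidate that can represent value, else 'object'."""
    for name, lowest, highest in candidate_types(long_bits):
        if lowest <= value <= highest:
            return name
    return "object"


print(guess_integer_type(2**63 - 1))  # C long
print(guess_integer_type(2**63))      # C unsigned long long
print(guess_integer_type(2**64))      # object
```

In this model each element is classified on its own, so `2**63 - 1` and `2**63` land on different types, and it is the later promotion step that turns the pair into floats.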

Cheers,

Sebastian


[1] However, before that happens, we may also consider an API where you
have to explicitly allow the `np.array` call to fall back to `object`
in cases where promotion fails – including this case. I.e. with
something like:

    np.array(..., dtype="allow-object-fallback")  # of course shorter

(I can't find the issue about it right now, but there is at least one
where this was discussed.)
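
Note that the `"allow-object-fallback"` spelling above is a hypothetical API, not something NumPy provides. What does exist is requesting an `object` array explicitly, sketched here with a value too large for any of the C integer types:

```python
import numpy as np

# 2**64 does not fit into int64 or uint64.
values = [1, 2**64]

# Asking for dtype=object stores the Python integers themselves.
arr = np.array(values, dtype=object)
print(arr.dtype)         # object
print(arr[1] == 2**64)   # True
```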

