datetime64/timedelta64 support in linspace

datetime64/timedelta64 support in linspace

Lee Johnston
I propose adding support for datetime64/timedelta64 to linspace and would like feedback on the feature. As is, linspace raises UFuncTypeError when the start and stop parameters are datetime64/timedelta64, whereas the complementary function arange supports these types. Work on this feature was started in PR 14700 but has stalled. I would like to complete it, but there are some issues worth getting feedback on first.
  1. Supporting datetime64/timedelta64 will require a special-case code path within linspace. The code path is selected based on the data type of the start parameter.
  2. The output dtype has to be set explicitly.
  3. The step size resolution is determined by the coarser of the start and dtype resolutions.
Issue 3 may lead to an unexpected result for an end-user. For example,

>>> import numpy as np
>>> np.linspace(np.timedelta64(0, "s"), np.timedelta64(1, "s"), 4, dtype="timedelta64[ms]")
array([   0,    0,    0, 1000], dtype='timedelta64[ms]')

The existing solution in PR 14700 does not override the end-user's start and dtype resolution. In this case, the end-user would have to set both start and dtype to "ms" resolution to get the expected result.

>>> np.linspace(np.timedelta64(0, "ms"), np.timedelta64(1, "s"), 4, dtype="timedelta64[ms]")
array([   0,  333,  666, 1000], dtype='timedelta64[ms]')
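
One possible reading of the two results above, sketched under the assumption that the step is obtained by integer division in the coarser unit (this is an assumption for illustration, not something confirmed from PR 14700): one second split into three intervals truncates to zero seconds, while the same span expressed in milliseconds gives 333 ms per interval.

```python
import numpy as np

stop = np.timedelta64(1, "s")
intervals = 3  # 4 points span 3 intervals

# Dividing a timedelta64 by an integer keeps the unit and truncates.
step_s = stop / intervals
step_ms = stop.astype("timedelta64[ms]") / intervals

print(step_s)   # zero seconds: the fractional part is lost
print(step_ms)  # 333 milliseconds
```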

In PR 14700, there is some discussion of "NaT" handling. In my implementation, "NaT" works the same as "NaN" and I am not aware of any corner cases.

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Re: datetime64/timedelta64 support in linspace

Sebastian Berg
On Sat, 2020-09-26 at 09:52 -0500, Lee Johnston wrote:

> I propose adding support for datetime64/timedelta64 to linspace and would
> like feedback on the feature. As is, linspace raises UFuncTypeError when
> the start and stop parameters are datetime64/timedelta64, whereas the
> complementary function arange supports these types. Work on this feature
> was started in PR 14700 <https://github.com/numpy/numpy/pull/14700> but
> has stalled. I would like to complete it, but there are some issues worth
> getting feedback on first.
>
>    1. Supporting datetime64/timedelta64 will require a special-case code
>       path within linspace. The code path is selected based on the data
>       type of the start parameter.
>    2. The output dtype has to be set explicitly.
>    3. The step size resolution is determined by the coarser of the start
>       and dtype resolutions.
>
> Issue 3 may lead to an unexpected result for an end-user. For example,
>
> >>> import numpy as np
> >>> np.linspace(np.timedelta64(0, "s"), np.timedelta64(1, "s"), 4, dtype="timedelta64[ms]")
> array([   0,    0,    0, 1000], dtype='timedelta64[ms]')
>
> The existing solution in PR 14700 does not override the end-user's start
> and dtype resolution. In this case, the end-user would have to set both
> start and dtype to "ms" resolution to get the expected result.
>
> >>> np.linspace(np.timedelta64(0, "ms"), np.timedelta64(1, "s"), 4, dtype="timedelta64[ms]")
> array([   0,  333,  666, 1000], dtype='timedelta64[ms]')

Thanks for taking the time and looking into this!

Can you explain why your solution of using the input units to represent
the step size is better than using the provided one?
If this turns out to be tricky, we could also make the rule: cast
everything to a single unit (as long as the cast is considered "safe").
That may force the user to do the cast in the long run, but maybe most
users are not dealing with a mix of units here to begin with?
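
To make the "cast everything to a single unit" rule concrete, here is a minimal sketch. The helper name `linspace_single_unit` is hypothetical and is not part of NumPy or of PR 14700; it works on the underlying int64 ticks, floors fractional ticks, and does not handle NaT.

```python
import numpy as np

def linspace_single_unit(start, stop, num, unit):
    """Hypothetical helper: timedelta64 linspace after a safe cast to one unit."""
    target = np.dtype(f"timedelta64[{unit}]")
    for value in (start, stop):
        # Refuse casts that could lose precision (e.g. ms -> s).
        if not np.can_cast(value.dtype, target, casting="safe"):
            raise TypeError(f"cannot safely cast {value.dtype} to {target}")
    # Work on integer tick counts in the common unit.
    lo = int(start.astype(target).astype(np.int64))
    hi = int(stop.astype(target).astype(np.int64))
    ticks = np.floor(np.linspace(lo, hi, num))
    return ticks.astype(np.int64).astype(target)

result = linspace_single_unit(
    np.timedelta64(0, "s"), np.timedelta64(1, "s"), 4, "ms"
)
print(result)
```

With this rule the user no longer has to match the units of start and dtype by hand, at the cost of an error whenever the requested unit is coarser than an input.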

The approach in the last state of the PR had issues with the
timedelta/datetime equivalent of:

    >>> np.diff(np.linspace(0, 1000, 33, dtype='int64'))
    array([31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31,
           31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31,
           31, 32, 31, 31, 31, 32])

which has an uneven step size that was not spread out (note the 32
values).  I assume you have a solution for that?
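
For reference, a small sketch of the timedelta analogue of the integer example, assuming the values are computed as floats and then floored to whole milliseconds (an assumption for illustration, not necessarily what the PR does). It counts how often each step size occurs.

```python
import numpy as np

# 33 points over 1000 ticks: the exact step is 31.25 ticks.
ticks = np.linspace(0, 1000, 33)
as_ms = np.floor(ticks).astype(np.int64).astype("timedelta64[ms]")

# Step sizes between consecutive points, and how often each occurs.
steps = np.diff(as_ms).astype(np.int64)
values, counts = np.unique(steps, return_counts=True)
print(values, counts)
```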

Maybe it is best if you just pick up the PR and create a new one
(if possible, pull in the existing commits or tests for attribution as
well), so that we can discuss it more easily while reading the tests.


> In PR 14700, there is some discussion of "NaT" handling. In my
> implementation, "NaT" works the same as "NaN" and I am not aware of any
> corner cases.

There may not be. I think this had to do with how we approached certain
difficulties in the PR (around viewing as int64 or using floats,
probably).  We should just make sure to have tests for both start and
end being NaT.
Maybe NaT is not a big issue, because we can probably add an explicit
code path if necessary.
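
A short sketch of why the int64 route needs care with NaT. It only uses existing NumPy behavior and is not taken from the PR: NaT propagates through timedelta arithmetic much like NaN does for floats, but once cast to int64 it is an ordinary integer.

```python
import numpy as np

nat = np.timedelta64("NaT", "s")
one = np.timedelta64(1, "s")

# In timedelta arithmetic, NaT propagates like NaN.
delta = one - nat
print(np.isnat(delta))

# Cast to int64, NaT becomes the minimum int64 value and stops propagating.
raw = nat.astype(np.int64)
print(raw == np.iinfo(np.int64).min)
```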

Cheers,

Sebastian
