Proposal: add the timestamp64 type

Noam Yorav-Raphael
Hi,

(I'm repeating things I wrote under the "datetime64: Remove deprecation warning..." thread, since I'm now proposing a new solution.)

I propose adding a new type called "timestamp64". It will be a pure timestamp, meaning that it represents a moment in time (as seconds/ms/us/ns since the epoch), without any timezone information. It will behave exactly as datetime64 did before version 1.11, except that its only allowed units will be seconds, milliseconds, microseconds and nanoseconds. Removing the longer units will make it clear that it doesn't deal with calendars and dates. Also, none of the business-day functionality will apply to timestamp64. To get calendar information (such as the year) from a timestamp64, you will have to convert it manually to Python's datetime (or perhaps to np.datetime64), specifying the timezone explicitly (UTC, local, a fixed offset, or a timezone object).
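A minimal stdlib-only sketch of that last point: the same timestamp falls in different calendar years depending on which timezone is chosen at conversion time. The timestamp value and the UTC+2 offset are arbitrary illustrations, not part of the proposal.

```python
from datetime import datetime, timedelta, timezone

# A pure timestamp: whole seconds since 1970-01-01T00:00Z.
# This particular value is 2020-12-31 23:30:00 UTC.
ts = 1609457400

# The calendar year is not a property of the timestamp alone; it depends on
# the timezone chosen explicitly when converting.
year_utc = datetime.fromtimestamp(ts, tz=timezone.utc).year
year_plus2 = datetime.fromtimestamp(ts, tz=timezone(timedelta(hours=2))).year

print(year_utc, year_plus2)  # 2020 2021
```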

This is needed because, since the change introduced in 1.11, datetime64 no longer represents a timestamp but rather a date and time in an abstract calendar. So given a datetime64, it is not possible to get an actual timestamp without knowing which timezone the datetime64 refers to. If that timezone observes daylight saving time, the conversion can even be ambiguous, since the same wall-clock hour occurs twice at the transition from DST back to winter time.
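To make the ambiguity concrete, here is a small stdlib-only sketch. It assumes a zone that falls back from UTC+3 to UTC+2 at 02:00 local time, as Asia/Jerusalem did on 2020-10-25, so the wall-clock time 01:30 happens twice that night. Fixed offsets stand in for a real timezone database; the two offsets are the only assumption.

```python
from datetime import datetime, timedelta, timezone

# Fall-back transition: clocks go from UTC+3 (summer) to UTC+2 (winter),
# so every wall-clock time between 01:00 and 02:00 occurs twice.
SUMMER = timezone(timedelta(hours=3))
WINTER = timezone(timedelta(hours=2))

# A date and time with no timezone, like a post-1.11 datetime64 value.
naive = datetime(2020, 10, 25, 1, 30)

first = naive.replace(tzinfo=SUMMER).timestamp()   # first occurrence, still on summer time
second = naive.replace(tzinfo=WINTER).timestamp()  # second occurrence, already on winter time

print(second - first)  # 3600.0: one naive value, two timestamps an hour apart
```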

I would like it to work like this:

>>> np.timestamp64.now()
numpy.timestamp64('2020-11-07 22:42:52.871159+0200')

>>> np.timestamp64.now('s')
numpy.timestamp64('2020-11-07 22:42:52+0200')

>>> np.timestamp64(1604781772, 's')
numpy.timestamp64('2020-11-07 22:42:52+0200')

>>> np.timestamp64('2020-11-07 20:42:52Z')
numpy.timestamp64('2020-11-07 22:42:52+0200')

* timestamp64.now() will take an optional string argument specifying the unit. If it is not given, I think 'us' is a good default.
* The repr will format the timestamp using the environment's timezone.
* I'd like the repr not to include a 'T' between the date and the time; I find it much easier to read.
* I tend to think that constructing a timestamp64 from an ISO 8601 string without a timezone offset should be allowed, in which case the environment's timezone will be used to convert it to a timestamp. So in the Asia/Jerusalem timezone it will look like:

>>> np.timestamp64('2020-11-07 22:42:52')
numpy.timestamp64('2020-11-07 22:42:52+0200')

>>> np.timestamp64('2020-08-01 22:00:00')
numpy.timestamp64('2020-08-01 22:00:00+0300')
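A rough stdlib-only sketch of how the parsing and repr behaviour described above could work. Everything here is hypothetical: `parse` and `fmt` are illustrative names, not numpy API; a fixed-offset `env_tz` stands in for the environment's timezone (a real implementation would look up the correct offset for each date, e.g. +0200 in November and +0300 in August for Asia/Jerusalem); and 'ns' is left out because Python's datetime only has microsecond resolution.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: a timestamp is only an integer count of `unit` since
# 1970-01-01T00:00Z; timezones matter only when parsing or formatting.
US_PER_UNIT = {"s": 10**6, "ms": 10**3, "us": 1}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse(text, unit="s", env_tz=timezone.utc):
    """ISO 8601 string -> integer timestamp; offset-less strings use env_tz."""
    dt = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=env_tz)  # stands in for "the environment's timezone"
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros // US_PER_UNIT[unit]

def fmt(ts, unit="s", env_tz=timezone.utc):
    """Integer timestamp -> 'YYYY-MM-DD HH:MM:SS+hhmm' (no 'T'; whole seconds only)."""
    dt = EPOCH + timedelta(microseconds=ts * US_PER_UNIT[unit])
    return dt.astimezone(env_tz).strftime("%Y-%m-%d %H:%M:%S%z")

winter = timezone(timedelta(hours=2))  # Asia/Jerusalem's offset in November 2020
summer = timezone(timedelta(hours=3))  # and in August 2020

ts = parse("2020-11-07 20:42:52Z")
print(ts)                        # 1604781772
print(fmt(ts, env_tz=winter))    # 2020-11-07 22:42:52+0200
print(fmt(parse("2020-08-01 22:00:00", env_tz=summer), env_tz=summer))  # 2020-08-01 22:00:00+0300
```

The stored value never carries an offset; only `parse` and `fmt` consult one, which is the separation the proposal asks for.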


If I implement this, could it be added to numpy?


Thanks,
Noam

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Proposal: add the timestamp64 type

Noam Yorav-Raphael
I added a discussion of my proposal to the agenda of the upcoming meeting.

I thought of a refinement. Since numpy data types don't have static methods, instead of a "timestamp64.now()" method, getting the current time could be folded into the constructor. So timestamp64() will return the current timestamp in microseconds, and timestamp64('s'), timestamp64('ms'), timestamp64('us') and timestamp64('ns') will return the current timestamp in the given unit. This makes the interface even simpler!
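A minimal sketch of the refined "current time" behaviour, assuming the unit strings listed above. `timestamp_now` is a hypothetical stand-in for the proposed constructor, not an existing numpy function; it relies on `time.time_ns()` (Python 3.7+).

```python
import time

# Hypothetical stand-in: timestamp_now() -> now in 'us',
# timestamp_now('s' | 'ms' | 'us' | 'ns') -> now in that unit.
NS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}

def timestamp_now(unit="us"):
    """Return the current time as an integer count of `unit` since the epoch."""
    if unit not in NS_PER_UNIT:
        raise ValueError(f"unsupported unit {unit!r}; expected one of {sorted(NS_PER_UNIT)}")
    # time.time_ns() is an exact integer, so floor division truncates
    # to the requested unit without any float rounding.
    return time.time_ns() // NS_PER_UNIT[unit]

print(timestamp_now())     # microseconds since the epoch
print(timestamp_now("s"))  # whole seconds since the epoch
```

Working from integer nanoseconds means the coarser units are plain truncations of one clock reading, rather than conversions from a float such as `time.time()`.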

Cheers,
Noam



In reply to his own post of Sat, Nov 7, 2020 at 10:57 PM.


Re: Proposal: add the timestamp64 type

Stefano Miccoli
In reply to this post by Noam Yorav-Raphael
Discussion on time is endless! (Sorry for the extra noise on the mailing list, but I would like to clarify some points.)

If I understand correctly, np.datetime64 is defined by the following two points.

1) Internal representation: a 64-bit signed integer *plus* a time unit. The time unit can be
- an SI unit (the SI second and all its decimal subunits down to the attosecond)
- a unit accepted for use with the SI (minute, hour, day)
- a date unit (week, month, year)
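To see why the unit matters for a 64-bit count, the arithmetic below computes roughly how far from the 1970 epoch a signed 64-bit integer can reach for each fixed-length unit, in Julian years of 365.25 days. Nanoseconds give about ±292 years and attoseconds only about ±9.2 seconds. Month and year units are left out because they have no fixed length in seconds.

```python
# Approximate reach of a signed 64-bit count on either side of the epoch.
SECONDS_PER_UNIT = {
    "as": 1e-18, "fs": 1e-15, "ps": 1e-12, "ns": 1e-9, "us": 1e-6,
    "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0, "D": 86400.0,
}
INT64_MAX = 2**63 - 1
JULIAN_YEAR_S = 365.25 * 86400  # seconds in a Julian year

def span_years(unit):
    """Largest |offset| from the epoch representable in `unit`, in years."""
    return INT64_MAX * SECONDS_PER_UNIT[unit] / JULIAN_YEAR_S

for unit in SECONDS_PER_UNIT:
    print(f"{unit:>2}: +/- {span_years(unit):.3g} years")
```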

2) Conversion routines: a bijective map from the internal representation to a proleptic Gregorian calendar [0] assuming a fixed epoch of 1970-01-01T00:00Z. The mapping neglects leap seconds and is not time-zone aware.
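A small sketch of such a mapping for the seconds unit, using Python's `datetime.date`, which is documented as an idealized proleptic Gregorian calendar (limited to years 1 to 9999, unlike numpy). `to_civil` is a hypothetical helper name. Every day is exactly 86400 s, i.e. leap seconds are neglected, and floor division keeps pre-1970 values on the right day.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
SECONDS_PER_DAY = 86400  # leap seconds neglected: every day is exactly 86400 s

def to_civil(seconds):
    """Seconds since 1970-01-01T00:00Z -> (proleptic Gregorian date, seconds into that day)."""
    days, sod = divmod(seconds, SECONDS_PER_DAY)  # floor division handles negative counts
    return EPOCH + timedelta(days=days), sod

print(to_civil(-1))  # (datetime.date(1969, 12, 31), 86399)

# "Proleptic": Gregorian rules are extended backwards, so there is no 10-day
# gap in October 1582, where the historical Julian-to-Gregorian switch happened.
print(date(1582, 10, 4) + timedelta(days=1))  # 1582-10-05
```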

I think that the current choice for 2) is a sensible one: I agree with Dan that it is useful to a wide audience, easy to compute, and unambiguous.

I would discourage any attempt to implement in numpy more complex mappings that are aware of time zones and leap seconds, or, for that matter, of the wide array of other time scales and time representations in use: this is a very complex task, and a maintenance nightmare. Specialised libraries such as astropy.time [1] or dateutil [2] already exist for this purpose.

However, the numpy.datetime64 docs should be updated to mention explicitly the use of the proleptic Gregorian calendar, and to clarify how the date units (month, year) are handled when cast to shorter units such as seconds.
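One way to picture the month unit, and why a month cannot be cast to a fixed number of seconds: treat a value in months as the first instant of that month. As far as I know this start-of-month convention matches what numpy does when casting datetime64[M] to datetime64[s], but `month_start_seconds` below is a hypothetical stdlib sketch, not numpy code.

```python
from datetime import datetime, timezone

def month_start_seconds(months):
    """Hypothetical helper: months since 1970-01 -> seconds (UTC) at the first instant of that month."""
    year_offset, month0 = divmod(months, 12)  # floor division handles months before 1970
    start = datetime(1970 + year_offset, month0 + 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp())

feb_2020 = (2020 - 1970) * 12 + 1  # month index of 2020-02
feb_2021 = (2021 - 1970) * 12 + 1  # month index of 2021-02

# One step in month units spans a different number of seconds depending on the month.
print(month_start_seconds(feb_2020 + 1) - month_start_seconds(feb_2020))  # 2505600 (29 days)
print(month_start_seconds(feb_2021 + 1) - month_start_seconds(feb_2021))  # 2419200 (28 days)
```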

Stefano


