Accepting NEP 42 — New and extensible DTypes

Accepting NEP 42 — New and extensible DTypes

Sebastian Berg
Hi all,

after another thorough revision of NEP 42 (many thanks to Ben!), I
propose accepting the NEP, with the note that details are expected to
change.

I am always happy to clarify and review the document based on feedback,
but I feel the important technical points should be very clear and
settled.
Exposing all of the proposed API may need additional detailed API
discussion. My focus is still mostly on the big-picture design choices
that the NEP makes, which need to be settled so that the implementation
internal to NumPy can move forward, although I am happy to discuss the
details!

The title of the NEP is:

     NEP 42 — New and extensible DTypes

And available at:

     https://numpy.org/neps/nep-0042-new-dtypes.html

While enabling new user-defined DTypes is the main goal, the main work
is the internal restructure of NumPy's own DTypes necessary to allow
that.

I have pasted the "Abstract" and "Motivation and scope" sections below,
which give a good overview of the issues we are trying to address.
They are followed by the "Usage and impact" section, which gives a
big-picture overview of the design.
I refer to the full NEP for more detailed technical decisions and
explanations.

Cheers,

Sebastian


PS: In some places NEP 42 references NEP 43, for which I hope to merge
the draft soon. The current status is here:

     https://github.com/numpy/numpy/pull/16723

However, this should mainly be of interest to those wishing to go into
more technical details.




******************************************************************************
Abstract
******************************************************************************

NumPy's dtype architecture is monolithic -- each dtype is an instance of a
single class. There's no principled way to expand it for new dtypes, and the
code is difficult to read and maintain.

As :ref:`NEP 41 <NEP41>` explains, we are proposing a new architecture that is
modular and open to user additions. dtypes will derive from a new ``DType``
class serving as the extension point for new types. ``np.dtype("float64")``
will return an instance of a ``Float64`` class, a subclass of the root class
``np.dtype``.
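
As a rough illustration of the class-per-dtype idea, here is a pure-Python sketch that does not use NumPy at all. The registry, the ``Int64`` class, and the ``dtype`` factory function are hypothetical stand-ins chosen for this example; they are not the proposed implementation.

```python
# Toy model of "every dtype is an instance of its own DType class".
# All names are illustrative assumptions, not NumPy's implementation.

class DType:
    """Stand-in for the root class (``np.dtype`` in the NEP)."""
    _registry = {}   # maps a name such as "float64" to its DType class
    name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name is not None:
            DType._registry[cls.name] = cls

    def __repr__(self):
        return f"dtype({self.name!r})"


class Float64(DType):
    name = "float64"
    type = float      # associated Python scalar type


class Int64(DType):
    name = "int64"
    type = int


def dtype(name):
    """Hypothetical factory mirroring ``np.dtype("float64")``."""
    return DType._registry[name]()


f8 = dtype("float64")
```

In this model ``type(f8)`` is ``Float64`` and ``isinstance(f8, DType)`` holds, which is the relationship the abstract describes between ``Float64`` and ``np.dtype``.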

This NEP is one of two that lay out the design and API of this new
architecture. This NEP addresses dtype implementation; NEP 43 addresses
universal functions.

.. note::

    Details of the private and external APIs may change to reflect user
    comments and implementation constraints. The underlying principles and
    choices should not change significantly.


******************************************************************************
Motivation and scope
******************************************************************************

Our goal is to allow user code to create fully featured dtypes for a broad
variety of uses, from physical units (such as meters) to domain-specific
representations of geometric objects. :ref:`NEP 41 <NEP41>` describes a number
of these new dtypes and their benefits.

Any design supporting dtypes must consider:

- How shape and dtype are determined when an array is created
- How array elements are stored and accessed
- The rules for casting dtypes to other dtypes

In addition:

- We want dtypes to comprise a class hierarchy open to new types and to
  subhierarchies, as motivated in :ref:`NEP 41 <NEP41>`.

And to provide this,

- We need to define a user API.

All these are the subjects of this NEP.

- The class hierarchy, its relation to the Python scalar types, and its
  important attributes are described in `nep42_DType class`_.

- The functionality that will support dtype casting is described in
  `Casting`_.

- The implementation of item access and storage, and the way shape and dtype
  are determined when creating an array, are described in
  :ref:`nep42_array_coercion`.

- The functionality for users to define their own DTypes is described in
  `Public C-API`_.

The API here and in NEP 43 is entirely on the C side. A Python-side version
will be proposed in a future NEP. A future Python API is expected to be
similar, but provide a more convenient API to reuse the functionality of
existing DTypes. It could also provide shorthands to create structured DTypes
similar to Python's
`dataclasses <https://docs.python.org/3.8/library/dataclasses.html>`_.


******************************************************************************
Usage and impact
******************************************************************************

We believe the few structures in this section are sufficient to consolidate
NumPy's present functionality and also to support complex user-defined DTypes.

The rest of the NEP fills in details and provides support for the claim.

Again, though Python is used for illustration, the implementation is a
C API only; a future NEP will tackle the Python API.

After implementing this NEP, creating a DType will be possible by implementing
the DType base class outlined below, which is further described in
`nep42_DType class`_:

    class DType(np.dtype):
        type : type        # Python scalar type
        parametric : bool  # (may be indicated by superclass)

        @property
        def canonical(self) -> bool:
            raise NotImplementedError

        def ensure_canonical(self : DType) -> DType:
            raise NotImplementedError
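
To make ``canonical`` and ``ensure_canonical`` more concrete, here is a small pure-Python sketch. It assumes, purely for illustration, that "canonical" means "native byte order"; the ``ToyInt32`` class and its ``byteorder`` parameter are hypothetical and not part of the proposed API.

```python
import sys


class ToyInt32:
    """Hypothetical dtype whose only parameter is its byte order."""

    def __init__(self, byteorder=sys.byteorder):
        if byteorder not in ("little", "big"):
            raise ValueError(f"unknown byte order: {byteorder!r}")
        self.byteorder = byteorder

    @property
    def canonical(self):
        # Assumption for this sketch: canonical == native byte order.
        return self.byteorder == sys.byteorder

    def ensure_canonical(self):
        # Return self unchanged when already canonical, otherwise return
        # an equivalent instance in the canonical representation.
        return self if self.canonical else ToyInt32(sys.byteorder)


native = ToyInt32()
swapped = ToyInt32("big" if sys.byteorder == "little" else "little")
```

Here ``swapped.canonical`` is false, while ``swapped.ensure_canonical()`` returns a new native-order instance and ``native.ensure_canonical()`` returns ``native`` itself.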

For casting, a large part of the functionality is provided by the "methods"
stored in ``_castingimpl``:

        @classmethod
        def common_dtype(cls : DTypeMeta, other : DTypeMeta) -> DTypeMeta:
            raise NotImplementedError

        def common_instance(self : DType, other : DType) -> DType:
            raise NotImplementedError

        # A mapping of "methods" each detailing how to cast to another DType
        # (further specified at the end of the section)
        _castingimpl = {}
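
One way to picture the split between ``common_dtype`` (which works on classes) and ``common_instance`` (which works on instances of a single class) is the toy sketch below. The ``promote`` helper, the use of ``NotImplemented`` as the "I do not know" answer, and the three example classes are assumptions made for this illustration only.

```python
class DType:
    @classmethod
    def common_dtype(cls, other):
        return NotImplemented


class Int32(DType):
    @classmethod
    def common_dtype(cls, other):
        return Int32 if other is Int32 else NotImplemented


class Float64(DType):
    @classmethod
    def common_dtype(cls, other):
        # In this toy model Float64 knows how to absorb Int32.
        return Float64 if other in (Float64, Int32) else NotImplemented


class String(DType):
    """Parametric: instances differ by their length."""

    def __init__(self, length):
        self.length = length

    @classmethod
    def common_dtype(cls, other):
        return String if other is String else NotImplemented

    def common_instance(self, other):
        # Same class on both sides; only the parameter has to be merged.
        return String(max(self.length, other.length))


def promote(a, b):
    """Hypothetical helper: ask each class in turn, like binary operators."""
    for left, right in ((a, b), (b, a)):
        result = left.common_dtype(right)
        if result is not NotImplemented:
            return result
    raise TypeError(f"no common DType for {a.__name__} and {b.__name__}")
```

With these definitions, ``promote(Int32, Float64)`` and ``promote(Float64, Int32)`` both give ``Float64``, and ``String(3).common_instance(String(8))`` has length 8.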

For array-coercion, also part of casting:

        def __dtype_setitem__(self, item_pointer, value):
            raise NotImplementedError

        def __dtype_getitem__(self, item_pointer, base_obj) -> object:
            raise NotImplementedError

        @classmethod
        def __discover_descr_from_pyobject__(cls, obj : object) -> DType:
            raise NotImplementedError

        # initially private:
        @classmethod
        def _known_scalar_type(cls, obj : object) -> bool:
            raise NotImplementedError
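
The array-coercion hooks can be sketched in pure Python by letting a writable ``memoryview`` slice play the role of ``item_pointer``. This is only a model under that assumption: the real API is in C and works on raw pointers, and ``ToyFloat64`` with its ``itemsize`` attribute is a hypothetical class for this example.

```python
import struct


class ToyFloat64:
    itemsize = 8
    type = float

    def __dtype_setitem__(self, item_pointer, value):
        # Store one element into the memory that item_pointer refers to.
        struct.pack_into("=d", item_pointer, 0, float(value))

    def __dtype_getitem__(self, item_pointer, base_obj):
        # Read one element back as a Python scalar; base_obj is unused here.
        return struct.unpack_from("=d", item_pointer, 0)[0]

    @classmethod
    def __discover_descr_from_pyobject__(cls, obj):
        # Non-parametric toy dtype: every scalar maps to the same descriptor.
        return cls()

    @classmethod
    def _known_scalar_type(cls, obj):
        return type(obj) is float


storage = bytearray(2 * ToyFloat64.itemsize)   # room for two elements
view = memoryview(storage)
descr = ToyFloat64.__discover_descr_from_pyobject__(1.5)
descr.__dtype_setitem__(view[8:16], 1.5)       # write element 1
value = descr.__dtype_getitem__(view[8:16], None)
```

After this runs, ``value`` is the float that was stored in the second slot, while the first eight bytes of ``storage`` are still zero.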


The other element of the casting implementation is the ``CastingImpl``:

    casting = Union["safe", "same_kind", "unsafe"]

    class CastingImpl:
        # Object describing and performing the cast
        casting : casting

        def resolve_descriptors(self, input : Tuple[DType]) -> (casting, Tuple[DType]):
            raise NotImplementedError

        # initially private:
        def _get_loop(...) -> lowlevel_C_loop:
            raise NotImplementedError

which describes the casting from one DType to another. In
NEP 43 this ``CastingImpl`` object is used unchanged to
support universal functions.
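
As a hedged sketch of what ``resolve_descriptors`` could do for a parametric DType, the toy below decides the casting level from the instances it is given and fills in a missing output descriptor. The ``ToyString`` class, the convention that ``None`` means "please choose the output", and the specific safety rule are all assumptions of this example rather than statements about NumPy.

```python
class ToyString:
    """Hypothetical parametric dtype: a fixed-length string."""

    def __init__(self, length):
        self.length = length


class StringToString:
    """Toy CastingImpl describing casts between ToyString instances."""

    casting = "safe"   # best case; refined per call below

    def resolve_descriptors(self, given):
        from_descr, to_descr = given
        if to_descr is None:
            # Caller left the output open: pick one that loses nothing.
            to_descr = ToyString(from_descr.length)
        if to_descr.length >= from_descr.length:
            level = "safe"
        else:
            level = "same_kind"   # toy rule: truncation is not "safe"
        return level, (from_descr, to_descr)


impl = StringToString()
level_open, (_, filled) = impl.resolve_descriptors((ToyString(4), None))
level_trunc, _ = impl.resolve_descriptors((ToyString(8), ToyString(4)))
```

In this model ``level_open`` is ``"safe"`` with a length-4 output, while casting from length 8 to length 4 is reported as ``"same_kind"``.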


_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Accepting NEP 42 — New and extensible DTypes

Sebastian Berg
Hi all,


On Thu, 2020-10-08 at 07:51 -0500, Sebastian Berg wrote:

> Hi all,
>
> after another thorough revision of NEP 42 (many thanks to Ben!), I
> propose accepting the NEP, with the note that details are expected to
> change.
>
> I am always happy to clarify and review the document based on
> feedback,
> but I feel the important technical points should be very clear and
> settled.
> Exposing all of the proposed API may need additional detailed API
> discussion. My focus is still mostly on the big-picture design choices
> that the NEP makes, which need to be settled so that the implementation
> internal to NumPy can move forward, although I am happy to discuss the
> details!
>
> The title of the NEP is:
>
>      NEP 42 — New and extensible DTypes
>
That was a while ago, and a draft of NEP 43 (UFunc redesign) is
now available at:

    https://numpy.org/neps/nep-0043-extensible-ufuncs.html

I would appreciate any feedback and am happy to go into more details
where necessary. Do we have a consensus about the general big picture
API design or are there any concerns?

These documents outline (most importantly):

1. How DTypes should be created (NEP 42)
2. How Casting will be implemented (NEP 42)
3. How UFuncs will be redesigned: (NEP 43)
   * This changes the calling convention
   * It also unifies casting largely with ufuncs
4. How ufunc promotion will be handled in the future: (NEP 43)
   * This is what happens when you add mixed types, for
     example float64 + int32 casts int32 to float64 and
     uses the float64 + float64 implementation.
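
The promotion step in point 4 can be sketched with a toy dispatcher in plain Python. The lookup tables and the ``toy_add`` and ``cast`` helpers are hypothetical and only model the idea of "cast to the common type, then use the matching implementation"; they are not how NumPy implements it.

```python
import operator

# Registered "loops": implementations for exact dtype combinations.
LOOPS = {
    ("float64", "float64"): operator.add,
    ("int32", "int32"): operator.add,
}

# Toy promotion table and cast functions (assumptions of this sketch).
COMMON = {frozenset(("float64", "int32")): "float64"}
CASTS = {("int32", "float64"): float}


def cast(value, from_dtype, to_dtype):
    if from_dtype == to_dtype:
        return value
    return CASTS[(from_dtype, to_dtype)](value)


def toy_add(a, a_dtype, b, b_dtype):
    """Add two scalars, promoting mixed dtypes to a common one first."""
    if (a_dtype, b_dtype) not in LOOPS:
        common = COMMON[frozenset((a_dtype, b_dtype))]
        a, b = cast(a, a_dtype, common), cast(b, b_dtype, common)
        a_dtype = b_dtype = common
    return LOOPS[(a_dtype, b_dtype)](a, b), a_dtype


result, result_dtype = toy_add(1.5, "float64", 2, "int32")
```

Here the ``int32`` operand is cast to ``float64`` and the ``float64 + float64`` entry is used, so ``result_dtype`` is ``"float64"``.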

Point 1 is finished to the extent currently necessary.

Right now I am basically finishing up Casting (point 2), and I expect
it to move forward very soon, at least in part.
This does have a big overlap with UFuncs (point 3), though. So if you
are interested in that, it is a good time to dive in, even if many
details can still be changed easily for a while!

Cheers,

Sebastian
