NEP 34 - deprecate automatic dtype=object on ragged arrays



After a few iterations by reviewers, I would like to submit NEP 34 to
deprecate automatically using dtype=object for ragged arrays.

and an associated PR for the implementation




When users create arrays with sequences-of-sequences, they sometimes err in
matching the lengths of the nested sequences_, producing what are commonly
called "ragged arrays".  Here we will refer to them as ragged nested
sequences. Creating such arrays via ``np.array([<ragged_nested_sequence>])``
with no ``dtype`` argument will today default to an ``object``-dtype array.
This NEP proposes changing the behaviour to raise a ``ValueError`` instead.

Motivation and Scope

Users who specify lists-of-lists when creating a `numpy.ndarray` via
``np.array`` may mistakenly pass in lists of different lengths. Currently we
accept this input and automatically create an array with ``dtype=object``.
This can be confusing, since it is rarely what is desired. Changing the
automatic dtype detection to never return ``object`` for ragged nested
sequences (defined as a recursive sequence of sequences, where not all the
sequences on the same level have the same length) will force users who
actually wish to create ``object`` arrays to specify that explicitly. Note
that ``lists``, ``tuples``, and ``np.ndarrays`` are all sequences [0]_. See
for instance `issue 5303`_.
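The definition of a ragged nested sequence given above can be made concrete
with a small pure-Python check. This is only an illustrative sketch, not the
detection logic NumPy uses: the name ``is_ragged`` is hypothetical, only
``list`` and ``tuple`` style sequences are handled (strings are treated as
scalars), and treating a mix of scalars and sequences on one level as ragged
is an assumption made here.

```python
from collections.abc import Sequence


def _is_seq(obj):
    # Strings and bytes are sequences too, but are treated as scalars here.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes, bytearray))


def is_ragged(obj):
    """Return True if obj is a ragged nested sequence (hypothetical helper)."""
    level = [obj] if _is_seq(obj) else []
    while level:
        # All sequences on the same level must have the same length.
        if len({len(s) for s in level}) > 1:
            return True
        children = [c for s in level for c in s]
        n_seq = sum(_is_seq(c) for c in children)
        if n_seq == 0:
            return False  # only scalars remain, so nesting ends here
        if n_seq != len(children):
            return True  # assumption: scalars mixed with sequences count as ragged
        level = children
    return False


print(is_ragged([[1, 2], [1]]))     # lengths 2 and 1 on the same level
print(is_ragged([[1, 2], [3, 4]]))  # rectangular input
```

Walking the input level by level, rather than recursing per element, keeps
the "same level, same length" condition explicit in the code.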

Usage and Impact

After this change, array creation with ragged nested sequences must
specify a dtype explicitly:

     >>> np.array([[1, 2], [1]])
     ValueError: cannot guess the desired dtype from the input

     >>> np.array([[1, 2], [1]], dtype=object)
     # succeeds, with no change from current behaviour
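As a minimal sketch of what "specifying the dtype" looks like in user code,
the snippet below builds the ragged array explicitly. The helper
``as_object_array`` is a hypothetical name introduced here for illustration;
it is not part of NumPy or of this NEP.

```python
import numpy as np

# Passing dtype=object explicitly states the intent to store Python objects.
ragged = np.array([[1, 2], [1]], dtype=object)
print(ragged.shape, ragged.dtype)


def as_object_array(items):
    """Build a 1-D object array with one element per top-level item.

    Hypothetical helper: element-wise assignment stores each item as-is,
    so the result is 1-D even when the items happen to have equal lengths.
    """
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


rows = as_object_array([[1, 2], [3, 4]])
print(rows.shape)
```

The helper is worth considering because ``np.array(..., dtype=object)`` on
input that turns out to be rectangular produces a multi-dimensional object
array, whereas element-wise assignment always yields one element per row.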

The deprecation will affect any call that internally calls ``np.asarray``.
For instance, the ``assert_equal`` family of functions calls ``np.asarray``,
so users will have to change code like::

     np.testing.assert_equal(a, [[1, 2], 3])

to::

     np.testing.assert_equal(a, np.array([[1, 2], 3], dtype=object))
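For completeness, here is a self-contained sketch of the comparison with both
sides built as explicit ``object`` arrays. The variable names ``a`` and
``expected`` and their values are illustrative assumptions, not taken from any
real test suite.

```python
import numpy as np

# Both operands are constructed with an explicit object dtype, so no
# automatic dtype detection on ragged input is involved.
a = np.array([[1, 2], 3], dtype=object)
expected = np.array([[1, 2], 3], dtype=object)

# Raises AssertionError if any element differs; passes silently otherwise.
np.testing.assert_equal(a, expected)
```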
