After a few iterations with reviewers, I would like to submit NEP 34 to
deprecate automatically using dtype=object for ragged arrays:

https://github.com/numpy/numpy/pull/14674

and an associated PR for the implementation:

https://github.com/numpy/numpy/pull/14794

Comments?

Matti

Abstract
--------

When users create arrays with sequences-of-sequences, they sometimes fail to
match the lengths of the nested sequences_, producing what are commonly called
"ragged arrays". Here we will refer to them as ragged nested sequences.
Creating such arrays via ``np.array([<ragged_nested_sequence>])`` with no
``dtype`` keyword argument will today default to an ``object``-dtype array.
This NEP proposes changing the behaviour to raise a ``ValueError`` instead.

Motivation and Scope
--------------------

Users who specify lists-of-lists when creating a `numpy.ndarray` via
``np.array`` may mistakenly pass in lists of different lengths. Currently we
accept this input and automatically create an array with ``dtype=object``. This
can be confusing, since it is rarely what is desired. Changing the automatic
dtype detection to never return ``object`` for ragged nested sequences (defined
as a recursive sequence of sequences, where not all the sequences on the same
level have the same length) will force users who actually wish to create
``object`` arrays to specify that explicitly. Note that ``list``, ``tuple``,
and ``np.ndarray`` objects are all sequences [0]_. See for instance
`issue 5303`_.

Usage and Impact
----------------

After this change, array creation with ragged nested sequences must explicitly
define a dtype:

    >>> np.array([[1, 2], [1]])
    ValueError: cannot guess the desired dtype from the input

    >>> np.array([[1, 2], [1]], dtype=object)
    # succeeds, with no change from current behaviour
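To see which behaviour an installed NumPy actually has, something like the
following sketch may help. The helper name ``probe_ragged`` is hypothetical and
not part of NumPy; it assumes that a release in the deprecation period emits a
warning rather than raising, and treats that warning as a rejection.

```python
import warnings

import numpy as np


def probe_ragged(seq, **kwargs):
    """Report how the installed NumPy handles ``np.array(seq, **kwargs)``.

    Hypothetical helper for illustration only; not part of NumPy.
    """
    with warnings.catch_warnings():
        # Promote warnings to errors so that a deprecation warning
        # is reported the same way as an outright ValueError.
        warnings.simplefilter("error")
        try:
            arr = np.array(seq, **kwargs)
        except (ValueError, Warning) as exc:
            return "rejected ({})".format(type(exc).__name__)
    return "accepted (dtype={}, shape={})".format(arr.dtype, arr.shape)


ragged = [[1, 2], [1]]

# Without a dtype the outcome depends on the NumPy version in use.
implicit = probe_ragged(ragged)

# With an explicit dtype=object the ragged input is accepted as a
# one-dimensional array holding two list objects.
explicit = probe_ragged(ragged, dtype=object)

print(implicit)
print(explicit)
```

Only the explicit ``dtype=object`` call has a version-independent result; the
implicit call is the one this NEP changes.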

The deprecation will affect any call that internally calls ``np.asarray``. For
instance, the ``assert_equal`` family of functions calls ``np.asarray``, so
users will have to change code like::

    np.testing.assert_equal(a, [[1, 2], 3])

to::

    np.testing.assert_equal(a, np.array([[1, 2], 3], dtype=object))
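A self-contained version of the updated test call could look like the sketch
below. The value assigned to ``a`` is made up for illustration; in real test
code it would come from the code under test.

```python
import numpy as np

# Hypothetical stand-in for the value produced by the code under test.
a = np.array([[1, 2], 3], dtype=object)

# Build the expected value with an explicit dtype so that no dtype
# guessing is involved for the ragged nested sequence.
expected = np.array([[1, 2], 3], dtype=object)

# Both operands are one-dimensional object arrays of length 2:
# the first element is the list [1, 2], the second is the integer 3.
np.testing.assert_equal(a, expected)
```

Constructing the expected array explicitly keeps the test independent of how
``assert_equal`` converts its arguments internally.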
