Structured array creation with list of lists and others


Kirill Balunov
This was the first time I tried to create a structured array in numpy. Usually I use pandas for heterogeneous arrays, but that would add one more dependency to my project.

It took me some time (really, much more than some) to understand the problem with structured array creation. As an example:

I had a list of lists of this kind:
b=[[ 1, 10.3, 12.1, 2.12 ],...]

And tried:
np.array(b, dtype='i4,f4,f4,f4')

Which raises a rather weird exception:
TypeError: a bytes-like object is required, not 'int'

Two hours later I found that I need a list of tuples. I didn't find any help in the documentation and could not work out that the problem was with the inner lists...
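
For reference, a minimal sketch of the fix described above. The list `b` is elided in the post, so the two rows used here are an assumption (the second row is made up for illustration). Each inner list is converted to a tuple before the data is handed to np.array:

```python
import numpy as np

# Stand-in for the elided list from the post; the second row is illustrative.
b = [[1, 10.3, 12.1, 2.12],
     [2, 11.0, 13.5, 3.40]]

# Convert every inner list to a tuple so each row is read as one record.
records = [tuple(row) for row in b]
arr = np.array(records, dtype='i4,f4,f4,f4')

print(arr.shape)        # one-dimensional: one element per record
print(arr.dtype.names)  # default field names generated by the comma string
print(arr['f0'])        # the integer column
```

With the comma-separated dtype string the fields get the default names f0, f1, f2 and f3.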

Why is there such a restriction, a 'list of tuples', for creating a structured array? What is the idea behind it? Why not a list of lists, or a tuple of lists, or ...?

Also, the exception does not help at all...
P.S.: It looks like dtype also accepts only a list of tuples, but I cannot grasp the idea behind these restrictions.

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: Structured array creation with list of lists and others

Slavin, Jonathan
Hi Kirill,

The idea is that each tuple assigns a name to the field and a data type. There are a variety of ways to create structured arrays, but they all involve giving both a name and a data type to each field (I think). See https://docs.scipy.org/doc/numpy/user/basics.rec.html
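
As a hedged illustration of the dtype side of this, the sketch below spells out a dtype as a list of (name, type) tuples. The field names `id`, `x`, `y` and `z` are hypothetical and chosen only for this example:

```python
import numpy as np

# Each tuple gives one field a name and a data type.
# The names 'id', 'x', 'y', 'z' are made up for illustration.
dt = np.dtype([('id', 'i4'), ('x', 'f4'), ('y', 'f4'), ('z', 'f4')])

# The data itself is still a list of tuples, one tuple per record.
arr = np.array([(1, 10.3, 12.1, 2.12)], dtype=dt)

print(dt.names)   # field names taken from the dtype specification
print(arr['id'])  # fields are accessed by name
```

The comma-separated string 'i4,f4,f4,f4' describes the same layout but leaves the naming to numpy.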

Jon

Re: Structured array creation with list of lists and others

Allan Haldane
On 03/23/2017 02:16 PM, Kirill Balunov wrote:

> Why there is such restriction - 'list of tuples' to create structured
> array? What is the idea behind that, why not list of lists, or tuple of
> lists or ...?

The problem is that numpy needs to distinguish between multidimensional
arrays and structured elements. A "list of lists" will often trigger
numpy's broadcasting rules, which is not what you want here.

For instance, should numpy interpret your input list as a 2d array of
dimension Lx4 containing integer elements, or a 1d array of length L of
structs with 4 fields?
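
The two readings can be sketched as follows, assuming a two-row stand-in for the list from the original post (the rows are illustrative). The same numbers give a 2d array when passed as a list of lists without a structured dtype, and a 1d array of records when passed as a list of tuples with one:

```python
import numpy as np

# Illustrative stand-in for the poster's data (L = 2 rows here).
b = [[1, 10.3, 12.1, 2.12],
     [2, 11.0, 13.5, 3.40]]

# Reading 1: a plain 2d array of dimension L x 4.
as_2d = np.array(b)

# Reading 2: a 1d array of length L whose elements have 4 fields.
as_records = np.array([tuple(row) for row in b], dtype='i4,f4,f4,f4')

print(as_2d.shape, as_2d.dtype)
print(as_records.shape, as_records.dtype)
```

In the first reading the mixed integers and floats are all upcast to a single float type, while the second keeps a separate type per field.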

In this particular case maybe numpy could, in principle, figure it out
from what you gave it by calculating that the innermost dimension is
the same length as the number of fields. But there are other cases (such
as assignment) where similar ambiguities arise that are harder to
resolve. So to preserve our sanity we want to require that structures be
formatted as tuples all the time.
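
A small sketch of assignment that follows the tuple convention, with made-up values. It only shows the unambiguous form, where every record on the right-hand side is a tuple:

```python
import numpy as np

# Three empty records with the same layout as in the original post.
arr = np.zeros(3, dtype='i4,f4,f4,f4')

# A single record is assigned from one tuple.
arr[0] = (1, 10.3, 12.1, 2.12)

# Several records are assigned from a list of tuples.
arr[1:] = [(2, 11.0, 13.5, 3.4), (3, 9.9, 8.8, 7.7)]

ids = arr['f0'].tolist()
print(ids)
```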

I have a draft of potential updated structured array docs you can read here:
https://gist.github.com/ahaldane/7d1873d33d4d0f80ba7a54ccf1052eee

See the section "Assignment from Python Native Types (Tuples)", which
hopefully better warns that tuples are needed. Let me know if you think
something is missing from the draft.

(WARNING: the section about multi-field assignment in the doc draft is
incorrect for current numpy - that's what I'm proposing for the next
release. The rest of the docs are accurate for current numpy)

Agreed that the error message could be changed.

Allan

Re: Structured array creation with list of lists and others

Kirill Balunov
Allan, thank you for your draft! I agree with you that in the general case (though not in mine) it would be hard to resolve all the corner cases. I also think that someone who reads the numpy reference linearly will get some insight that a list of tuples is necessary (but that was not my case).

For me, one problem is that in some cases numpy allows a lot of freedom, while in others it is unnecessarily strict. Another is the exception messages (but this is certainly subjective).



2017-03-24 19:48 GMT+03:00 Allan Haldane <[hidden email]>:
> In this particular case maybe numpy could, in principle, figure it out
> from what you gave it by calculating that the innermost dimension is
> the same length as the number of fields. But there are other cases (such
> as assignment) where similar ambiguities arise that are harder to
> resolve. So to preserve our sanity we want to require that structures be
> formatted as tuples all the time.
