data exchange format

data exchange format

Gary Pajer-2
I want to store data in a way that can be read by a C or Matlab program.

Not too much data, not too complicated:  a dozen or so floats, a few
integers, a few strings, and a (3, x) numpy array where typically 500
< x < 30000.

I was about to create my own storage format when it occurred to me
that I might want to use XML or some other standard format, such as
JSON.  Can anyone comment, especially on numpy implementation issues,
or offer suggestions?
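
For what it's worth, a JSON round trip for this kind of record could look like the minimal sketch below. The field names and values are made up for illustration. The only real point is that a numpy array has to be converted to nested lists before the standard json module will accept it, and converted back on load. A C reader would need a JSON parsing library of its own.

```python
import json
import os
import tempfile

import numpy

# Hypothetical record; every name and value here is illustrative only.
record = {
    "gain": 1.25,                       # one of the "dozen or so floats"
    "n_averages": 16,                   # one of the integers
    "label": "run A",                   # one of the strings
    "data": numpy.arange(1500, dtype=float).reshape(3, 500) / 7.0,
}


def dump_record(rec, path):
    """Write the record as JSON, turning the ndarray into nested lists."""
    out = dict(rec)
    out["data"] = rec["data"].tolist()  # json cannot serialize ndarray directly
    with open(path, "w") as fh:
        json.dump(out, fh)


def load_record(path):
    """Read the record back and rebuild the (3, x) array."""
    with open(path) as fh:
        rec = json.load(fh)
    rec["data"] = numpy.array(rec["data"], dtype=float)
    return rec


path = os.path.join(tempfile.mkdtemp(), "record.json")
dump_record(record, path)
loaded = load_record(path)
print(loaded["data"].shape, os.path.getsize(path), "bytes")
```

The text form is noticeably larger than the 8 bytes per float64 of a binary format, which may or may not matter at x = 30000.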

Thanks,
Gary
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: data exchange format

Gabriel Beckers-2
PyTables is an efficient way of doing it (http://www.pytables.org). You
essentially write data to an HDF5 file, which is portable and can be read
in Matlab or in a C program (using the HDF5 library).

Gabriel

On Tue, 2008-05-20 at 09:32 -0400, Gary Pajer wrote:

> I want to store data in a way that can be read by a C or Matlab program.
>
> Not too much data, not too complicated:  a dozen or so floats, a few
> integers, a few strings, and a (3, x) numpy array where typically 500
> < x < 30000.
>
> I was about to create my own format for storage when it occurred to me
> that I might want to use XML or some other standard format.  Like
> JSON, perhaps.   Can anyone comment, esp relating to numpy
> implementation issues, or offer suggestions?
>
> Thanks,
> Gary

Re: data exchange format

Gary Pajer-2
On Tue, May 20, 2008 at 10:26 AM, Gabriel J.L. Beckers
<[hidden email]> wrote:
> PyTables is an efficient way of doing it (http://www.pytables.org). You
> essentially write data to a HDF5 file, which is portable and can be read
> in Matlab or in a C program (using the HDF5 library).
>
> Gabriel

I thought about that.  It seems to have much more than I need, so I
wonder whether it has more overhead, less speed, or a more complex API
than I need.  Big isn't necessarily bad, but it might be.  Is PyTables
overkill?



Re: data exchange format

Charles R Harris


On Tue, May 20, 2008 at 10:11 AM, Gary Pajer <[hidden email]> wrote:
> On Tue, May 20, 2008 at 10:26 AM, Gabriel J.L. Beckers
> <[hidden email]> wrote:
>> PyTables is an efficient way of doing it (http://www.pytables.org). You
>> essentially write data to a HDF5 file, which is portable and can be read
>> in Matlab or in a C program (using the HDF5 library).
>>
>> Gabriel
>
> I thought about that.  It seems to have much more than I need, so I
> wonder if it's got more overhead / less speed / more complex API than
> I need.   But big isn't necessarily bad, but it might be.  Is pytables
> overkill?

PyTables is a nice bit of software and is worth getting familiar with if you want portable data. It will solve issues of endianness, annotation, and data organization, which can all be important, especially if your data sits around for a while and you forget exactly what's in it. Both Matlab and IDL support reading HDF5 files.
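
To make the endianness point concrete, here is a small sketch using only Python's struct module. The header layout (a magic string followed by the two array dimensions) is hypothetical, invented for this example and not part of HDF5 or any other format. It just shows what a homemade binary format has to pin down, and what goes wrong when a reader guesses the byte order.

```python
import struct

value = 1.5

# The same float64 written with explicit little- and big-endian byte order.
little = struct.pack("<d", value)
big = struct.pack(">d", value)
same_bytes_reversed = little == big[::-1]

# A reader that assumes the wrong byte order gets a different number back.
right = struct.unpack("<d", little)[0]
wrong = struct.unpack(">d", little)[0]

# Hypothetical header for a homemade format: magic, rows, cols,
# all fixed as little-endian so that any reader can decode them.
header = struct.pack("<4sII", b"DEMO", 3, 30000)
magic, rows, cols = struct.unpack("<4sII", header)

print(same_bytes_reversed, right, wrong)
print(magic, rows, cols)
```

A self-describing format records this kind of information in the file itself, so the reader does not have to know it in advance.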

Chuck




Re: data exchange format

Gabriel Beckers-2
In reply to this post by Gary Pajer-2
I am not exactly an expert on data storage, but I use PyTables a lot for
all kinds of scientific data sets and am very happy with it. It does
have many advanced capabilities, so it may seem like overkill at first
glance. But for simple tasks such as the one you describe the API is
simple; I also use it for small data sets because it is such a quick way
of storing data portably. Regarding speed and overhead: I don't know in
general what the penalties or gains are for very small files. On my
system an empty file is 1032 bytes, and if I fill it with a 3 by 30000
array of random float64 values it is 723080 bytes. Not so bad.
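
As a quick check on those numbers, the arithmetic below separates the raw array payload from the container overhead. It uses only the sizes quoted above; nothing here touches PyTables.

```python
rows, cols = 3, 30000
itemsize = 8                      # bytes per float64

payload = rows * cols * itemsize  # raw array data: 720000 bytes
file_size = 723080                # reported size of the filled file
empty_size = 1032                 # reported size of the empty file

overhead = file_size - payload    # 3080 bytes of structure and metadata
print(payload, overhead)
print(f"{100 * overhead / file_size:.2f}% of the file is overhead")
```

So for the largest array mentioned in the question the overhead is well under one percent of the file.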

Just try it out yourself:

>>> import numpy, tables
>>> ta = numpy.random.random((3,30000))
>>> # PyTables 2.x names; PyTables 3.x spells these open_file / create_array
>>> f = tables.openFile('test.h5','w')
>>> f.createArray('/','testarray',ta)
>>> f.close()

With most real data the file can be smaller, because you have the
option of enabling compression.
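
The random test array above is close to a worst case for compression, since random float64 values have almost no redundancy. The sketch below uses Python's zlib module directly as a stand-in for a compressing storage filter (an assumption made to keep the example free of extra dependencies; it does not go through PyTables), and compares a random array with a hypothetical smooth signal quantized to two decimals.

```python
import zlib

import numpy

rng = numpy.random.default_rng(0)
noise = rng.random((3, 30000))            # like the random test array

# Hypothetical "real" data: smooth curves rounded to two decimals.
t = numpy.linspace(0.0, 10.0, 30000)
signal = numpy.round(numpy.vstack([t, numpy.sin(t), numpy.cos(t)]), 2)


def compressed_fraction(arr):
    """Compressed size divided by raw size for the array's bytes."""
    raw = arr.tobytes()
    return len(zlib.compress(raw)) / len(raw)


print(f"random: {compressed_fraction(noise):.2f}")
print(f"signal: {compressed_fraction(signal):.2f}")
```

How much real measurements shrink depends entirely on how repetitive they are, so this is only an indication of the trend.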

But I must admit that I haven't tried reading HDF5 in Matlab or C (and
never will); I know it is possible, but I don't know how difficult it
is.

Cheers, Gabriel

On Tue, 2008-05-20 at 12:11 -0400, Gary Pajer wrote:

> On Tue, May 20, 2008 at 10:26 AM, Gabriel J.L. Beckers
> <[hidden email]> wrote:
> > PyTables is an efficient way of doing it (http://www.pytables.org). You
> > essentially write data to a HDF5 file, which is portable and can be read
> > in Matlab or in a C program (using the HDF5 library).
> >
> > Gabriel
>
> I thought about that.  It seems to have much more than I need, so I
> wonder if it's got more overhead / less speed / more complex API than
> I need.   But big isn't necessarily bad, but it might be.  Is pytables
> overkill?

Re: data exchange format

Rob Hetland
In reply to this post by Gary Pajer-2

On May 20, 2008, at 6:11 PM, Gary Pajer wrote:

> I thought about that.  It seems to have much more than I need, so I
> wonder if it's got more overhead / less speed / more complex API than
> I need.   But big isn't necessarily bad, but it might be.  Is pytables
> overkill?


I use netCDF (which, as of netCDF4, uses the HDF5 libraries).  NetCDF is
good for large, gridded datasets where the grid does not change in time.
For my datasets (numerical ocean models), this format is perfect.

HDF is more general and flexible, but a bit more complex.  Take a look
at the netcdf4-python.googlecode.com package if you are interested.

-Rob

