How to limit the numpy.memmap's RAM usage?

How to limit the numpy.memmap's RAM usage?

LittleBigBrain
Hi everyone,
I noticed that numpy.memmap uses RAM to buffer data from memmapped files.
If I have a 100 GB array in a memmap file and process it block by block,
the RAM usage keeps increasing as the process runs until there is no
available RAM left (4 GB), even though the block size is only 1 MB.
For example:
####
import numpy as npy

a = npy.memmap('a.bin', dtype='float64', mode='r')
blocklen = 100000                       # process the file in blocks of 100000 float64 values
b = npy.zeros((len(a) // blocklen,))
for i in range(0, len(a) // blocklen):
    b[i] = npy.mean(a[i * blocklen:(i + 1) * blocklen])
####
Is there any way to restrict the memory usage in numpy.memmap?

LittleBigBrain


Re: How to limit the numpy.memmap's RAM usage?

David Cournapeau
2010/10/23 braingateway <[hidden email]>:

> [quoted text trimmed]

The whole point of using memmap is to let the OS do the buffering for
you (it is likely to do a better job than you would in many cases). Which
OS are you using? And how do you measure how much memory is taken by
numpy for your array?

David
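
A minimal sketch of one way to answer the measurement question, assuming the third-party psutil package (not mentioned in the thread): report both the resident set size and the virtual size of the current process.

####
import os
import psutil

proc = psutil.Process(os.getpid())
info = proc.memory_info()
print("resident set size: %d MB" % (info.rss // 2 ** 20))
print("virtual size:      %d MB" % (info.vms // 2 ** 20))
####

Run inside the block loop, one would expect the resident figure to grow as pages are touched while the virtual figure stays roughly constant at about the size of the mapping.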

Re: How to limit the numpy.memmap's RAM usage?

LittleBigBrain
David Cournapeau :

> [quoted text trimmed]

Hi David,

I agree with you about the point of using memmap. That is why the
behavior is so strange to me.
I actually measured the size of the resident set (pink trace in figure 2)
of the python process on Windows; I have attached the result. You can see
that the RAM usage is definitely not file system cache.

LittleBigBrain


Attachments: numpyMemmapAvaRAM.PNG (17K), numpyMemmapAvaRAM2.PNG (25K)

Re: How to limit the numpy.memmap's RAM usage?

Charles R Harris


On Sat, Oct 23, 2010 at 9:44 AM, braingateway <[hidden email]> wrote:
[quoted text trimmed]


Umm, a good operating system will use *all* of RAM for buffering, because RAM is fast and it assumes you are likely to reuse data you have already touched once. If it needs memory for something else it just evicts a page, writing it to disk first if it is dirty, reads the new data in from disk, and remaps the page. Where you get into trouble is if pages can't be evicted for some reason. Most modern OSes also have special options for reading streaming data from disk that can give significantly faster access for that sort of workload, but I don't think you can do that with memmapped files.

I'm not sure how Windows labels its memory. IIRC, memmapping a file gives what is called file-backed memory; it is essentially virtual memory. Now, I won't bet my life that there isn't a problem, but I think a misunderstanding of the memory information is more likely.

Chuck
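
For comparison, a minimal sketch (not posted in the thread) of the same block-by-block mean computed with plain streaming reads via numpy.fromfile instead of a memmap, so that only the current block is ever held by the process:

####
import numpy as npy

blocklen = 100000
means = []
with open('a.bin', 'rb') as f:
    while True:
        # read one block of float64 values from the current file position
        block = npy.fromfile(f, dtype='float64', count=blocklen)
        if block.size < blocklen:
            break                       # ignore the partial tail, as in the original loop
        means.append(block.mean())
b = npy.array(means)
####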



Re: How to limit the numpy.memmap's RAM usage?

Charles R Harris


On Sat, Oct 23, 2010 at 10:15 AM, Charles R Harris <[hidden email]> wrote:


[quoted text trimmed]


It is also possible that something else in your program is hanging onto memory, but without knowing a lot more it is hard to tell. Are you seeing symptoms besides the memory graphs? It looks like you aren't running on Windows, actually, so what OS are you running on?

Chuck


Re: How to limit the numpy.memmap's RAM usage?

LittleBigBrain
Charles R Harris :

> [quoted text trimmed]
> It looks like you aren't running on Windows, actually, so what OS are
> you running on?

Hi Chuck,

Thanks a lot for the quick response. I ran the following super simple script
on Windows:

####
import numpy as npy

a = npy.memmap('a.bin', dtype='float64', mode='r')
blocklen = 100000
b = npy.zeros((len(a) // blocklen,))
for i in range(0, len(a) // blocklen):
    b[i] = npy.mean(a[i * blocklen:(i + 1) * blocklen])
####

Everything became super slow after python ate all the RAM.
By the way, I also tried Qt QFile::map() and there was no problem at all...

LittleBigBrain

Re: How to limit the numpy.memmap's RAM usage?

Charles R Harris


On Sat, Oct 23, 2010 at 10:27 AM, braingateway <[hidden email]> wrote:
[quoted text trimmed]


Hmm. Nothing looks suspicious. For reference, can you be specific about the OS/version, python version, and numpy version?

What happens if you simply do

for i in range(0, len(a) // blocklen):
    a[i * blocklen:(i + 1) * blocklen].copy()

Chuck
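
A sketch of this diagnostic loop with the resident set size printed every hundred blocks (assuming the third-party psutil package), so the growth can be watched as the copies proceed:

####
import numpy as npy
import psutil

proc = psutil.Process()
a = npy.memmap('a.bin', dtype='float64', mode='r')
blocklen = 100000
for i in range(0, len(a) // blocklen):
    a[i * blocklen:(i + 1) * blocklen].copy()
    if i % 100 == 0:
        print("block %d: rss = %d MB" % (i, proc.memory_info().rss // 2 ** 20))
####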



Re: How to limit the numpy.memmap's RAM usage?

LittleBigBrain
Charles R Harris

> [quoted text trimmed]

Hi Chuck,
Here are the versions:

>>> print sys.version
2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)]
>>> print numpy.__version__
1.4.1
>>> print sys.getwindowsversion()
(5, 2, 3790, 2, 'Service Pack 2')

Besides, a[i*blocklen:(i+1)*blocklen].copy() gave the same result.

LittleBigBrain



Attachment: numpyMemmapAvaRAM3.png (26K)

Re: How to limit the numpy.memmap's RAM usage?

David Cournapeau
On Sun, Oct 24, 2010 at 12:44 AM, braingateway <[hidden email]> wrote:

> I agree with you about the point of using memmap. That is why the
> behavior is so strange to me.

I think it is expected. What kind of behavior were you expecting? To
be clear, if I have a lot of available RAM, I expect memmap arrays to
take almost all of it (virtual memory ~ resident memory). Now, if at
the same time another process starts taking a lot of memory, I expect
the OS to automatically lower the resident memory of the process using
the memmap.

I did a small experiment on Mac OS X, creating a giant mmap'd array in
numpy, and at the same time running a small C program using mlock (to
lock pages into physical memory). As soon as I lock a big area (where
big means most of my physical RAM), the python process dealing with
the mmap'd area sees its resident memory decrease. As soon as I kill the
C program locking the memory, the resident memory starts increasing
again.

cheers,

David
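
A rough sketch of the locking side of the experiment described above, done here with ctypes rather than the small C program David used (an illustration only; POSIX-only, and locking an area this large normally needs a raised RLIMIT_MEMLOCK or root):

####
import ctypes
import sys
import numpy as npy

libc = ctypes.CDLL(None, use_errno=True)    # POSIX: exposes libc's mlock/munlock

nbytes = 2 * 1024 ** 3                      # try to pin about 2 GB; adjust to the machine
buf = npy.zeros(nbytes, dtype=npy.uint8)
addr = ctypes.c_void_p(buf.ctypes.data)

# mlock(addr, len) faults the pages in and pins them in physical RAM
if libc.mlock(addr, ctypes.c_size_t(nbytes)) != 0:
    raise OSError(ctypes.get_errno(), "mlock failed (check RLIMIT_MEMLOCK)")

sys.stdout.write("locked; watch the memmap process's resident memory, then press Enter\n")
sys.stdin.readline()
libc.munlock(addr, ctypes.c_size_t(nbytes))
####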

Re: How to limit the numpy.memmap's RAM usage?

Simon Lyngby Kokkendorff
Hi List,

  I had similar problems on Windows. I tried to use memmaps to buffer a large amount of data and process it in chunks, but I found that whenever I did this I always ended up filling RAM completely, which led to my python script crashing with a MemoryError. This led me to consider, actually on advice from this list, the h5py module, which has a nice numpy interface to the HDF5 file format. With h5py it seemed clearer to me what was buffered on disk and what was stored in RAM.

  Cheers,
  Simon
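
A minimal sketch of the kind of chunked processing Simon describes, using h5py (the file name 'a.h5' and dataset name 'a' are made up for illustration); slicing an h5py dataset reads only that slice into memory:

####
import numpy as npy
import h5py

blocklen = 100000
with h5py.File('a.h5', 'r') as f:
    dset = f['a']                           # stays on disk until sliced
    nblocks = len(dset) // blocklen
    b = npy.zeros((nblocks,))
    for i in range(0, nblocks):
        b[i] = npy.mean(dset[i * blocklen:(i + 1) * blocklen])
####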

