python memory use


Robin-62
Hi,

I am starting to push the limits of the available memory and I'd like
to understand a bit better how Python handles memory...

If I try to allocate something too big for the available memory I
often get a MemoryError exception. However, in other situations,
Python memory use continues to grow until the machine falls over. I
was hoping to understand the difference between those cases. From what
I've read, Python never returns memory to the OS (is this right?), so
in the second case Python is holding on to memory that it isn't really
using (for objects that have been destroyed). I guess my question is:
why doesn't it reuse the memory freed by object deletions instead of
requesting more - and even when requesting more, why does it continue
until the machine falls over rather than raising a MemoryError?

While investigating this I found this script:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474
which does wonders for my code. I was wondering if this function
should be included in Numpy as it seems to provide an important
feature, or perhaps get an entry on the wiki (in the Cookbook section?)

Thanks,

Robin

Re: python memory use

Christian Heimes-2
Robin wrote:

> If I try to allocate something too big for the available memory I
> often get a MemoryError exception. However, in other situations,
> Python memory use continues to grow until the machine falls over. I
> was hoping to understand the difference between those cases. From what
> I've read, Python never returns memory to the OS (is this right?), so
> in the second case Python is holding on to memory that it isn't really
> using (for objects that have been destroyed). I guess my question is:
> why doesn't it reuse the memory freed by object deletions instead of
> requesting more - and even when requesting more, why does it continue
> until the machine falls over rather than raising a MemoryError?

Your assumption isn't correct: Python does release memory. For small
objects, Python uses its own memory allocation system, as explained in
http://svn.python.org/projects/python/trunk/Objects/obmalloc.c . For
integers and floats it uses a separate block allocation scheme.
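For instance, here is a minimal sketch (Unix-only; exact numbers vary
by platform and Python version) showing freed blocks being recycled
rather than new memory being requested from the OS:

import resource

def peak_rss():
    # ru_maxrss is the peak resident set size (kilobytes on Linux,
    # bytes on some other Unixes).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

data = [float(i) for i in range(10 ** 6)]
first = peak_rss()
del data  # the floats and the list go back to Python's free lists/arenas

data = [float(i) for i in range(10 ** 6)]
second = peak_rss()

# The two peaks should be nearly identical: the second allocation
# reuses the memory freed by the first instead of asking for more.
print("peak after first allocation:  %d" % first)
print("peak after second allocation: %d" % second)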

Christian


Re: python memory use

Andrew Straw
In reply to this post by Robin-62
Robin wrote:
> Hi,
>
> I am starting to push the limits of the available memory and I'd like
> to understand a bit better how Python handles memory...
>  
This is why I switched to 64-bit Linux and never looked back.
> If I try to allocate something too big for the available memory I
> often get a MemoryError exception. However, in other situations,
> Python memory use continues to grow until the machine falls over. I
> was hoping to understand the difference between those cases.
I don't know what "falls over" means. It could be that you're getting
swap death -- the kernel starts using virtual memory (hard disk) in
place of RAM. This would be characterized by your CPU use dropping to
near zero, your hard disk grinding away, and your swap space use
increasing.

The MemoryError simply means that Python made a request for memory that
the kernel didn't grant.
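A sketch of the distinction (assuming a machine with far less than
8 TB of memory and default Linux overcommit behavior):

try:
    big = [0] * (10 ** 12)  # ~8 TB of pointers in one request
except MemoryError:
    # a single request the kernel won't grant fails cleanly
    print("huge single request refused: MemoryError")

# Many individually modest allocations, by contrast, may each be
# granted (the kernel overcommits optimistically) until the machine
# is deep into swap -- the "falls over" case.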

There's something else you might run into -- the maximum memory size of
a process before the kernel kills that process. On Linux i686, IIRC,
this limit is 3 GB.
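If you'd rather get a clean MemoryError than swap death, you can cap
the address space yourself. A Unix-only sketch (the 2 GB cap here is
arbitrary):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (2 * 1024 ** 3, hard))  # 2 GB cap

try:
    waste = [0] * (10 ** 9)  # ~8 GB of pointers; exceeds the cap
except MemoryError:
    print("refused at the address-space limit instead of swapping")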

I'm not sure why you get different behavior on different runs.

FWIW, with 64-bit Linux the worst that happens to me now is swap death,
which can be forestalled by adding lots of RAM.
>  From what
> I've read, Python never returns memory to the OS (is this right?), so
> in the second case Python is holding on to memory that it isn't really
> using (for objects that have been destroyed). I guess my question is:
> why doesn't it reuse the memory freed by object deletions instead of
> requesting more - and even when requesting more, why does it continue
> until the machine falls over rather than raising a MemoryError?
>  
It's hard to say without knowing what your code does. A first guess is
that you're allocating lots of memory without allowing it to be freed.
Specifically, you may have references to objects which you no longer
need, and you should eliminate those references and allow them to be
garbage collected. In some cases, circular references can be hard for
Python to detect, so you might want to play around with the gc module
and judicious use of the del statement. Note also that IPython keeps
references to past results by default (the history).
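A small sketch of the cycle-plus-del case (the Node class is just
illustrative):

import gc

class Node(object):
    def __init__(self):
        self.other = None
        self.payload = [0.0] * (10 ** 6)  # something big

a, b = Node(), Node()
a.other, b.other = b, a  # a reference cycle

# Dropping our names is not enough for pure reference counting;
# the cycle keeps both payloads alive until the collector runs.
del a, b
print("unreachable objects found: %d" % gc.collect())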

> While investigating this I found this script:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474
> which does wonders for my code. I was wondering if this function
> should be included in Numpy as it seems to provide an important
> feature, or perhaps get an entry on the wiki (in the Cookbook section?)
>  
I don't think it belongs in numpy per se, and I'm not sure of the
necessity of a spot on the scipy cookbook given that it's in the python
cookbook. Perhaps more useful would be starting a page called
"MemoryIssues" on the scipy wiki -- I imagine this subject, as a whole,
is of particular interest for many in the numpy/scipy crowd. Certainly
adding a link and description to that recipe would be useful in that
context. But please, feel free to add to or edit the wiki as you see fit
-- if you think something will be useful, by all means, go ahead and do
it. I think there are enough eyes on the wiki that it's fairly
self-regulating.

-Andrew

Re: python memory use

Muhammad Alkarouri-2
In reply to this post by Robin-62

--- Robin <[hidden email]> wrote:
[...]
> While investigating this I found this script:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511474
> which does wonders for my code. I was wondering if this function
> should be included in Numpy as it seems to provide an important
> feature, or perhaps get an entry on the wiki (in the Cookbook section?)

I am the author of the mentioned recipe, and the reason I wrote it is
similar to your situation. I would add, however, that ideally there
shouldn't be such a problem, but in reality there is. I have no clue why.

As Christian said, Python does release memory. There was a problem
before Python 2.5, as I understand it, but the memory manager was
patched (see http://evanjones.ca/python-memory-part3.html), and I
personally don't use Python <2.5 for that reason. The new manager
helped, but I still face the problem, so I wrote the recipe.
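The core idea, roughly, is to run the memory-hungry function in a
child process, so that everything it allocated goes back to the OS
when the child exits. A minimal sketch of that idea using the
multiprocessing module (Python 2.6+); the run_in_subprocess helper is
illustrative, not the recipe's actual code:

from multiprocessing import Process, Queue

def _worker(q, func, args, kwargs):
    # runs in the child; the result must be picklable
    q.put(func(*args, **kwargs))

def run_in_subprocess(func, *args, **kwargs):
    # execute func in a child process; all memory it touches is
    # returned to the OS when the child exits
    q = Queue()
    p = Process(target=_worker, args=(q, func, args, kwargs))
    p.start()
    result = q.get()  # fetch the result before joining
    p.join()
    return result

def big_computation(n):
    data = [float(i) for i in range(n)]  # transient, memory-hungry
    return sum(data)

if __name__ == "__main__":
    print(run_in_subprocess(big_computation, 10 ** 6))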

--- Andrew wrote:
[...]
> It's hard to say without knowing what your code does. A first guess is
> that you're allocating lots of memory without allowing it to be freed.
> Specifically, you may have references to objects which you no longer
> need, and you should eliminate those references and allow them to be
> garbage collected. In some cases, circular references can be hard for
> Python to detect, so you might want to play around with the gc module
> and judicious use of the del statement. Note also that IPython keeps
> references to past results by default (the history).

Sound advice, especially the part about IPython, which is often
overlooked. I have to say I have tried playing a lot with the gc
module: calling gc.collect, enabling/disabling the collector, and
adjusting thresholds. In practice it helps a little, but not much. In
my experience, numpy code that works only with arrays of numbers is
more likely to hold references or views to arrays you no longer need
than to have circular references. I haven't looked at the internals of
gc, obmalloc, or any other Python code.
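A concrete sketch of the kind of lingering view I mean:

import numpy as np

big = np.zeros((5000, 5000))  # ~200 MB of float64
row = big[0, :]               # a view sharing big's buffer
del big                       # the 200 MB is NOT freed: row keeps it alive
print(row.base.shape)         # (5000, 5000) -- the whole array survives

row = row.copy()              # an independent ~40 kB copy
print(row.base)               # None: the big buffer can now be freed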

What usually happens to me is that the machine starts to use virtual
memory, slowing the whole computation down a lot. I wonder if your
algorithm, the one whose huge allocation causes a MemoryError, can be
modified to avoid that; I have found that possible in some situations.
As an example, for PCA you might find, depending on your matrix size,
the transpose or other algorithms more suitable -- I ended up using
http://folk.uio.no/henninri/pca_module.
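For reference, the transpose trick is roughly this (a sketch, not the
pca_module code; assumes the rows of X are mean-centered samples):

import numpy as np

def pca_transpose_trick(X, k):
    # X is (n, p) with n << p: eigendecompose the small (n, n) matrix
    # X X^T instead of the huge (p, p) covariance matrix X^T X.
    n = X.shape[0]
    small = np.dot(X, X.T)
    evals, evecs = np.linalg.eigh(small)   # ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]    # keep the k largest
    evals, evecs = evals[order], evecs[:, order]
    # If v solves (X X^T) v = lam v, then X^T v / sqrt(lam) is a unit
    # eigenvector of X^T X with the same eigenvalue.
    components = np.dot(X.T, evecs) / np.sqrt(evals)
    return evals / (n - 1), components     # variances, (p, k) axes

X = np.random.randn(50, 10000)             # 50 samples, 10000 variables
X -= X.mean(axis=0)
variances, axes = pca_transpose_trick(X, 5)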

While I am of course partial to the fate of the cookbook recipe, I
also feel that it doesn't directly belong in numpy; it should be
useful to other Pythonistas as well. Maybe it belongs in numpy, in
Python proper somewhere, or in one of the parallel processing
libraries. I agree that a wiki page would be more beneficial, though
I'm not sure what else should be on it.

Regards,

Muhammad Alkarouri

