Re: NumPy-Discussion Digest, Vol 162, Issue 27


Keyvis Damptey
Thanks. This is so embarrassing, but I wasn't able to create a new matrix because I forgot to delete the original massive matrix. I was testing how big it could go in terms of rows/columns before reaching the limit, and forgot to delete the last object before creating a new one.
Sadly, that memory usage was not reflected in the task manager for the VM instance.
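[Editorial sketch, not part of the original mail: the fix described above, at illustrative sizes. Dropping the last reference lets NumPy release the buffer before the next allocation is attempted.]

```python
import gc

import numpy as np

# Small sizes stand in for the 600,000 x 600,000 case: the old array
# must actually be freed before the new allocation is attempted.
a = np.zeros((1000, 1000), dtype=np.float32)  # the "original massive matrix"

del a         # drop the last reference so the buffer can be released
gc.collect()  # usually unnecessary for plain arrays, but makes it prompt

b = np.zeros((1000, 1000), dtype=np.float32)  # now this allocation succeeds
print(b.nbytes)  # 4_000_000 bytes
```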

On Tue, Mar 24, 2020, 6:44 PM <[hidden email]> wrote:
Send NumPy-Discussion mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.python.org/mailman/listinfo/numpy-discussion
or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of NumPy-Discussion digest..."


Today's Topics:

   1. Re: Numpy doesn't use RAM (Sebastian Berg)
   2. Re: Numpy doesn't use RAM (Stanley Seibert)
   3. Re: Numpy doesn't use RAM (Benjamin Root)
   4. Re: Put type annotations in NumPy proper? (Joshua Wilson)


----------------------------------------------------------------------

Message: 1
Date: Tue, 24 Mar 2020 13:15:47 -0500
From: Sebastian Berg <[hidden email]>
To: [hidden email]
Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"

On Tue, 2020-03-24 at 13:59 -0400, Keyvis Damptey wrote:
> Hi Numpy dev community,
>
> I'm keyvis, a statistical data scientist.
>
> I'm currently using numpy in python 3.8.2 64-bit for a clustering
> problem,
> on a machine with 1.9 TB RAM. When I try using np.zeros to create a
> 600,000 by 600,000 matrix of dtype=np.float32 it says
> "Unable to allocate 1.31 TiB for an array with shape (600000, 600000)
> and data type float32"
>

If this error happens, allocating the memory failed. This should be
pretty much a simple `malloc` call in C, so this is the kernel
complaining, not Python/NumPy.

I am not quite sure, but memory fragmentation may play a part, or the
process may simply be out of memory: 1.44 TB is a significant portion of
the total memory, after all.

Not sure what to say, but I think you should probably look into other
solutions, maybe using HDF5, zarr, or memory-mapping (although I am not
sure the last actually helps). It will be tricky to work with arrays of
a size that is close to the available total memory.
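[Editorial sketch, not part of the original mail: a back-of-the-envelope check of the reported allocation size, plus `np.memmap` at a toy scale. Whether memory-mapping helps at 1.31 TiB depends on the access pattern and on having that much disk; the scratch-file name is hypothetical.]

```python
import os
import tempfile

import numpy as np

# 600,000 x 600,000 float32 values:
rows = cols = 600_000
nbytes = rows * cols * np.dtype(np.float32).itemsize
tib = nbytes / 2**40
print(f"{tib:.2f} TiB")  # ~1.31 TiB, matching the error message

# np.memmap backs the array with a file instead of RAM (tiny size here).
path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # hypothetical scratch file
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 1000))
mm[0, :10] = 1.0  # writes go through the page cache to the file
mm.flush()
```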

Maybe someone who works more with such data here can give you tips on
what projects can help you or what solutions to look into.

- Sebastian



> I used psutil to determine how much RAM python thinks it has access
> to, and it returned approximately 1.8 TB.
>
> Is there some way I can fix numpy to create these large arrays?
> Thanks for your time and consideration
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200324/16501583/attachment-0001.sig>

------------------------------

Message: 2
Date: Tue, 24 Mar 2020 13:35:49 -0500
From: Stanley Seibert <[hidden email]>
To: Discussion of Numerical Python <[hidden email]>
Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"

In addition to what Sebastian said about memory fragmentation and OS limits
about memory allocations, I do think it will be hard to work with an array
that close to the memory limit in NumPy regardless.  Almost any operation
will need to make a temporary array and exceed your memory limit.  You
might want to look at Dask Array for a NumPy-like API for working with
chunked arrays that can be staged in and out of memory:

https://docs.dask.org/en/latest/array.html

As a bonus, Dask will also let you make better use of the large number of
CPU cores that you likely have in your 1.9 TB RAM system.  :)
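[Editorial sketch, not part of the original mail: Dask may not be installed everywhere, so this illustrates the chunked-array idea it implements in plain NumPy. Only one row-block is resident at a time, so no full-size array or temporary is ever allocated; the function name is hypothetical.]

```python
import numpy as np

def chunked_sum(shape, chunk_rows, fill=1.0, dtype=np.float32):
    """Sum a conceptually huge fill-valued array one row-block at a time."""
    total = 0.0
    rows, cols = shape
    for start in range(0, rows, chunk_rows):
        stop = min(start + chunk_rows, rows)
        # Only this block is ever held in memory.
        block = np.full((stop - start, cols), fill, dtype=dtype)
        total += float(block.sum())
    return total

print(chunked_sum((10_000, 1_000), chunk_rows=1_000))  # 10000 * 1000 * 1.0
```

Dask Array automates exactly this kind of blocking behind a NumPy-like API, and additionally schedules the per-block work across cores.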

On Tue, Mar 24, 2020 at 1:00 PM Keyvis Damptey <[hidden email]>
wrote:

> [Keyvis Damptey's original message trimmed; it is quoted in full in Message 1 above.]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200324/02cbeb71/attachment-0001.html>

------------------------------

Message: 3
Date: Tue, 24 Mar 2020 14:36:45 -0400
From: Benjamin Root <[hidden email]>
To: Discussion of Numerical Python <[hidden email]>
Subject: Re: [Numpy-discussion] Numpy doesn't use RAM
Message-ID:
        <CANNq6Fk2vczBWgPPJmbxmSijViwaR=[hidden email]>
Content-Type: text/plain; charset="utf-8"

Another point about an array that takes up that large a fraction of the
available memory: it severely restricts what you can do with it. Since
you are above 50% of the available memory, you won't be able to create
another array holding the result of any computation with it. So you are
restricted to querying (which you could do without having everything
in memory) or to in-place operations.
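[Editorial sketch, not part of the original mail: the in-place distinction at a toy size. `a = a * 2` materializes a second full-size array before rebinding, while `a *= 2` writes into the existing buffer.]

```python
import numpy as np

a = np.ones((1000, 1000), dtype=np.float32)
buf_before = a.__array_interface__["data"][0]  # address of the data buffer

a *= 2  # in-place: no second full-size array is created
assert a.__array_interface__["data"][0] == buf_before

a = a * 2  # out-of-place: a new full-size buffer is allocated first
# (with a 1.31 TiB array, this second buffer is what would fail to allocate)
```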

Dask arrays might be what you are really looking for.

Ben Root

On Tue, Mar 24, 2020 at 2:18 PM Sebastian Berg <[hidden email]>
wrote:

> [Sebastian Berg's reply trimmed; it appears in full as Message 1 above,
> including the quoted original message.]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200324/12a718d2/attachment-0001.html>

------------------------------

Message: 4
Date: Tue, 24 Mar 2020 15:42:27 -0700
From: Joshua Wilson <[hidden email]>
To: Discussion of Numerical Python <[hidden email]>
Subject: Re: [Numpy-discussion] Put type annotations in NumPy proper?
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="UTF-8"

> That is, is this an all-or-nothing thing where as soon as we start, numpy-stubs becomes unusable?

Until NumPy is made PEP 561 compatible by adding a `py.typed` file,
type checkers will ignore the types in the repo, so in theory you can
avoid the all or nothing. In practice it's maybe trickier because
currently people can use the stubs, but they won't be able to use the
types in the repo until the PEP 561 switch is flipped. So e.g.
currently SciPy pulls the stubs from `numpy-stubs` master, allowing
for a short

find a place where NumPy stubs are lacking -> improve stubs -> improve SciPy types

loop. If all development moves into the main repo then SciPy is
blocked on it becoming PEP 561 compatible before moving forward. But,
you could complain that I put the cart before the horse with
introducing typing in the SciPy repo before the NumPy types were more
resolved, and that's probably a fair complaint.
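[Editorial sketch, not part of the original mail, and not actual NumPy source: what "types in the repo" means in practice. Inline annotations live next to the implementation; per PEP 561, type checkers only honor them for an installed package once its distribution ships an (empty) `py.typed` marker file.]

```python
from __future__ import annotations

from typing import Sequence

def scale(values: Sequence[float], factor: float) -> list[float]:
    """Annotations document the contract alongside the implementation."""
    return [v * factor for v in values]

# At runtime the annotations are inert but introspectable; a checker
# such as mypy reads them directly from the source, instead of from a
# separate .pyi stub shipped as numpy-stubs.
print(scale.__annotations__["return"])
```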

> Anyone interested in taking the lead on this?

Not that I am a core developer or anything, but I am interested in
helping to improve typing in NumPy.

On Tue, Mar 24, 2020 at 11:15 AM Eric Wieser
<[hidden email]> wrote:
>
> >  Putting
> > aside ndarray, as more challenging, even annotations for numpy functions
> > and method parameters with built-in types would help, as a start.
>
> This is a good idea in principle, but one thing concerns me.
>
> If we add type annotations to numpy, does it become an error to have numpy-stubs installed?
> That is, is this an all-or-nothing thing where as soon as we start, numpy-stubs becomes unusable?
>
> Eric
>
> On Tue, 24 Mar 2020 at 17:28, Roman Yurchak <[hidden email]> wrote:
>>
>> Thanks for re-starting this discussion, Stephan! I think there is
>> definitely significant interest in this topic:
>> https://github.com/numpy/numpy/issues/7370 is the issue with the largest
>> number of user likes in the issue tracker (FWIW).
>>
>> Having them in numpy, as opposed to a separate numpy-stubs repository
>> would indeed be ideal from a user perspective. When looking into it in
>> the past, I was never sure how well in sync numpy-stubs was. Putting
>> aside ndarray, as more challenging, even annotations for numpy functions
>> and method parameters with built-in types would help, as a start.
>>
>> To add to the previously listed projects that would benefit from this,
>> we are currently considering to start using some (minimal) type
>> annotations in scikit-learn.
>>
>> --
>> Roman Yurchak
>>
>> On 24/03/2020 18:00, Stephan Hoyer wrote:
>> > When we started numpy-stubs [1] a few years ago, putting type
>> > annotations in NumPy itself seemed premature. We still supported Python
>> > 2, which meant that we would need to use awkward comments for type
>> > annotations.
>> >
>> > Over the past few years, using type annotations has become increasingly
>> > popular, even in the scientific Python stack. For example, off-hand I
>> > know that at least SciPy, pandas and xarray have at least part of their
>> > APIs type annotated. Even without annotations for shapes or dtypes, it
>> > would be valuable to have near complete annotations for NumPy, the
>> > project at the bottom of the scientific stack.
>> >
>> > Unfortunately, numpy-stubs never really took off. I can think of a few
>> > reasons for that:
>> > 1. Missing high level guidance on how to write type annotations,
>> > particularly for how (or if) to annotate particularly dynamic parts of
>> > NumPy (e.g., consider __array_function__), and whether we should
>> > prioritize strictness or faithfulness [2].
>> > 2. We didn't have a good experience for new contributors. Due to the
>> > relatively low level of interest in the project, when a contributor
>> > would occasionally drop in, I often didn't even notice their PR for a
>> > few weeks.
>> > 3. Developing type annotations separately from the main codebase makes
>> > them a little harder to keep in sync. This means that type annotations
>> > couldn't serve their typical purpose of self-documenting code. Part of
>> > this may be necessary for NumPy (due to our use of C extensions), but
>> > large parts of NumPy's user facing APIs are written in Python. We no
>> > longer support Python 2, so at least we no longer need to worry about
>> > putting annotations in comments.
>> >
>> > We eventually could probably use a formal NEP (or several) on how we
>> > want to use type annotations in NumPy, but I think a good first step
>> > would be to think about how to start moving the annotations from
>> > numpy-stubs into numpy proper.
>> >
>> > Any thoughts? Anyone interested in taking the lead on this?
>> >
>> > Cheers,
>> > Stephan
>> >
>> > [1] https://github.com/numpy/numpy-stubs
>> > [2] https://github.com/numpy/numpy-stubs/issues/12
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > [hidden email]
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [hidden email]
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> [hidden email]
> https://mail.python.org/mailman/listinfo/numpy-discussion


------------------------------

Subject: Digest Footer

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion


------------------------------

End of NumPy-Discussion Digest, Vol 162, Issue 27
*************************************************
