Hi,
While working again on the fftpack module, to clean things up and speed some backends (in particular fftw3, which is really sub-optimal right now), I remembered how much unaligned data pointer in numpy arrays hurt performances. So I would like to relaunch the discussion on aligned allocators and default alignement for numpy arrays : http://www.mail-archive.com/numpy-discussion@.../msg04005.html Basically, what I have in mind is, in a first step (for numpy 1.2): - define functions to allocate on a given alignement - make PyMemData_NEW 16 byte aligned by default (to be compatible with SSE and co). The problem was, and still is, realloc. It is not possible to implement realloc with malloc/free (in a portable way), and as such, it is not possible to have an aligned realloc. In numpy, we can always replace realloc by malloc/free, because we know the size of the old block: would deprecating PyMemData_RENEW and replacing them by PyMemeData_NEW/PyMemData_FREE be possible, such as to make all numpy arrays follow a default alignement ? There are only a few of them in numpy (6 of them), 0 in scipy, and I guess extensions never really used them ? cheers, David _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
On Mon, May 5, 2008 at 5:44 AM, David Cournapeau <[hidden email]> wrote: Hi, I don't think you would want to do this in the core of PyArray_FromIter; presumably realloc can sometimes reuse the existing pointer and save on allocating a new chunk of memory. Since there are lots of allocations in fromiter, this could potentially be a big performance hit. (At least I think so, realloc has always been kinda voodoo to me). One could use PyMemData_NEW/PyMemData_FREE in the final allocation to make sure that the data is alligned, we allready do a realloc there to dump any extra space. Or, possibly better, one could choose which allocation strategy to use here depending on whether the data was alligned or not.
-- . __ . |-\ . . [hidden email] _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
In reply to this post by cdavid
On Mon, May 5, 2008 at 7:44 AM, David Cournapeau
<[hidden email]> wrote: > In numpy, we can always replace realloc by malloc/free, because we know > the size of the old block: would deprecating PyMemData_RENEW and > replacing them by PyMemeData_NEW/PyMemData_FREE be possible, such as to > make all numpy arrays follow a default alignement ? There are only a few > of them in numpy (6 of them), 0 in scipy, and I guess extensions never > really used them ? I am in favor of at least trying this out. We will have to have a set of benchmarks to make sure we haven't hurt the current uses of PyMemData_RENEW which Tim points out. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
In reply to this post by Tim Hochberg
On Tue, May 6, 2008 at 1:59 AM, Timothy Hochberg <[hidden email]> wrote:
> I don't think you would want to do this in the core of PyArray_FromIter; > presumably realloc can sometimes reuse the existing pointer and save on > allocating a new chunk of memory. Since there are lots of allocations in > fromiter, this could potentially be a big performance hit. Looking at PyArray_FromIter, there is already a log(n) behavior for allocation, so I am not sure it would hurt so much: I would guess that realloc often need to allocate a new block if the size is the double of the former pointer. That's how realloc seems to work on FreeBSD, at least: http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/stdlib/malloc.c?rev=1.171;content-type=text%2Fplain > (At least I think > so, realloc has always been kinda voodoo to me). Once malloc is implemented, I would guess realloc to be simple, no ? For each block of memory in the malloc chain, just check whereas there is enough size to reuse the same block and adjacent block if any, otherwise, free + malloc + copy, non ? One could use > PyMemData_NEW/PyMemData_FREE in the final allocation to make sure that the > data is alligned, we allready do a realloc there to dump any extra space. I guess that would be the easiest: directly use realloc in the loop, and use PyDataMem_NEW/FREE at the end if necessary. Using realloc is no problem if the buffer is always freed in the same function. > Or, possibly better, one could choose which allocation strategy to use here > depending on whether the data was alligned or not. I mentioned step 1, because there would be a step 2 (but maybe only numpy 2): extending the C api functions to create arrays such as asking for an explicit alignment is possible. David _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
In reply to this post by Robert Kern-2
2008/5/5 Robert Kern <[hidden email]>:
> On Mon, May 5, 2008 at 7:44 AM, David Cournapeau > <[hidden email]> wrote: > > > In numpy, we can always replace realloc by malloc/free, because we know > > the size of the old block: would deprecating PyMemData_RENEW and > > replacing them by PyMemeData_NEW/PyMemData_FREE be possible, such as to > > make all numpy arrays follow a default alignement ? There are only a few > > of them in numpy (6 of them), 0 in scipy, and I guess extensions never > > really used them ? > > I am in favor of at least trying this out. We will have to have a set > of benchmarks to make sure we haven't hurt the current uses of > PyMemData_RENEW which Tim points out. realloc() may not be a performance win anyway. Allocating new memory is quite fast, copying data is quite fast, and in-place realloc() tends to cause memory fragmentation - take an arena full of 1024-byte blocks and suddenly make one of them 256 bytes long and you haven't gained anything at all in most malloc() implementations. So yes, benchmarks. Anne _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
In reply to this post by Robert Kern-2
On Tue, May 6, 2008 at 2:11 AM, Robert Kern <[hidden email]> wrote:
> > I am in favor of at least trying this out. We will have to have a set > of benchmarks to make sure we haven't hurt the current uses of > PyMemData_RENEW which Tim points out. What would be a good stress test for PyArray_FromIter ? I think using array iterator was the main hotspot for reading large arff files in scipy.io.arff, would that be enough ? cheers, David _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
On Mon, May 5, 2008 at 1:30 PM, David Cournapeau <[hidden email]> wrote:
> On Tue, May 6, 2008 at 2:11 AM, Robert Kern <[hidden email]> wrote: > > > > I am in favor of at least trying this out. We will have to have a set > > of benchmarks to make sure we haven't hurt the current uses of > > PyMemData_RENEW which Tim points out. > > What would be a good stress test for PyArray_FromIter ? I think using > array iterator was the main hotspot for reading large arff files in > scipy.io.arff, would that be enough ? Since there are only 6 places where PyMemData_RENEW is used, all 6 uses should be benchmarked. I would prefer a more targeted benchmark so we know exactly what we are measuring. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
Robert Kern wrote:
> > Since there are only 6 places where PyMemData_RENEW is used, all 6 > uses should be benchmarked. I would prefer a more targeted benchmark > so we know exactly what we are measuring. > Ok, I started a new branch for aligned allocator: http://projects.scipy.org/scipy/numpy/browser/branches/aligned_alloca For now, I've only commited: - a new bunch of tests for all functions using PyDataMem_RENEW (in numpy/core/tests/test_renew.py) - If NOUSE_PYDATAMEM_RENEW is defined, PyDataMEM_RENEW is not used: instead, SYS_REALLOC is used in loops (this is defined to realloc), and when outside a loop, _fake_realloc is used, which does not use realloc, but use PyDataMem_NEW/FREE. I got a small slowdown, but I am not sure it is really relevant: the figures change almost as much between several runs as between runs of different implementations. I am not convinced about the quality of my tests either. cheers, David _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
In reply to this post by cdavid
2008/5/5 David Cournapeau <[hidden email]>:
> Basically, what I have in mind is, in a first step (for numpy 1.2): > - define functions to allocate on a given alignement > - make PyMemData_NEW 16 byte aligned by default (to be compatible > with SSE and co). > > The problem was, and still is, realloc. It is not possible to implement > realloc with malloc/free (in a portable way), and as such, it is not > possible to have an aligned realloc. How much does this matter? I mean, what if you simply left all the reallocs as-is? The arrays that resulted from reallocs would not typically be aligned, but we cannot in any case expect all arrays to be aligned. Or, actually, depending on how you manage the alignment, it may be possible to write a portable aligned realloc. As I understand it, a portable aligned malloc allocates extra memory, then adds something to the pointer to make the memory arena aligned. free() must then recover the original pointer and call free() on it - though this can presumably be avoided if garbage collection is smart enough. realloc() also needs to recover the pointer to call the underlying realloc(). One answer is to ensure that at least one byte of padding is always done at the beginning, so that the number of pad bytes used on an aligned pointer p is accessible as ((uint8*)p)[-1]. Other schemes are available; I have a peculiar affection for the pure-python solution of constructing a view. In any of them, it seems to me, if free() can recover the original pointer, so can realloc()... I don't think any numpy function can assume that its input arrays are aligned, since users may like to pass, say, mmap()ed hunks of a file, over which numpy has no control of the alignment. In this case, then, how much of a problem is it if a few stray realloc()s produce non-aligned arrays? Anne _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
Anne Archibald wrote:
> > How much does this matter? I mean, what if you simply left all the > reallocs as-is? The arrays that resulted from reallocs would not > typically be aligned, but we cannot in any case expect all arrays to > be aligned. The problem would be the interaction between the aligned allocator and realloc. If we do it by "ourselves", then it is no problem, but we cannot do it if we rely on posix_memalign/memalign. I was not precise enough when I mentioned portability: I meant that it was not possible to implement realloc in terms of malloc/free portably, and as such, it is not possible to implement an aligned realloc with memalign/free. Also, I have not found any indication that the pointer given by posix_memalign can be fed to realloc: depending on the implementation, it cannot be freed... Maybe the solution is to never use posix_memalign... > As I understand it, a > portable aligned malloc allocates extra memory, then adds something to > the pointer to make the memory arena aligned. Yes, that's how it works in fftw, and the implementation I have for numpy is derived from fftw code. > I don't think any numpy function can assume that its input arrays are > aligned, since users may like to pass, say, mmap()ed hunks of a file, > over which numpy has no control of the alignment. In this case, then, > how much of a problem is it if a few stray realloc()s produce > non-aligned arrays? There are a few realloc, but they are used in core routines (PyArray_FromIter, PyArray_Resize). The point is to get aligned arrays most of the time, and avoiding losing alignement unless it is too complicated to do otherwise. cheers, David _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion |
Free forum by Nabble | Edit this page |