# Multiple Boolean Operations

26 messages
12
Open this post in threaded view
|

## Multiple Boolean Operations

 Hi All,     I am building some 3D grids for visualization starting from a much bigger grid. I build these grids by satisfying certain conditions on x, y, z coordinates of their cells: up to now I was using VTK to perform this operation, but VTK is slow as a turtle, so I thought to use numpy to get the cells I am interested in. Basically, for every cell I have the coordinates of its center point (centroids), named xCent, yCent and zCent. These values are stored in numpy arrays (i.e., if I have 10,000 cells, I have 3 vectors xCent, yCent and zCent with 10,000 values in them). What I'd like to do is: # Filter cells which do not satisfy Z requirements: zReq = zMin <= zCent <= zMax # After that, filter cells which do not satisfy Y requirements, # but apply this filter only on cells who satisfy the above condition: yReq = yMin <= yCent <= yMax # After that, filter cells which do not satisfy X requirements, # but apply this filter only on cells who satisfy the 2 above conditions: xReq = xMin <= xCent <= xMax I'd like to end up with a vector of indices which tells me which are the cells in the original grid that satisfy all 3 conditions. I know that something like this: zReq = zMin <= zCent <= zMax Can not be done directly in numpy, as the first statement executed returns a vector of boolean. Also, if I do something like: zReq1 = numpy.nonzero(zCent <= zMax) zReq2 = numpy.nonzero(zCent[zReq1] >= zMin) I lose the original indices of the grid, as in the second statement zCent[zReq1] has no more the size of the original grid but it has already been filtered out. Is there anything I could try in numpy to get what I am looking for? Sorry if the description is not very clear :-D Thank you very much for your suggestions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/_______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 Hi Andrea 2008/5/22 Andrea Gavana <[hidden email]>: >    I am building some 3D grids for visualization starting from a much > bigger grid. I build these grids by satisfying certain conditions on > x, y, z coordinates of their cells: up to now I was using VTK to > perform this operation, but VTK is slow as a turtle, so I thought to > use numpy to get the cells I am interested in. > Basically, for every cell I have the coordinates of its center point > (centroids), named xCent, yCent and zCent. These values are stored in > numpy arrays (i.e., if I have 10,000 cells, I have 3 vectors xCent, > yCent and zCent with 10,000 values in them). What I'd like to do is: You clearly have a large dataset, otherwise speed wouldn't have been a concern to you.  You can do your operation in one pass over the data, and I'd suggest you try doing that with Cython or Ctypes.  If you need an example on how to access data using those methods, let me know. Of course, it *can* be done using NumPy (maybe not in one pass), but thinking in terms of for-loops is sometimes easier, and immediately takes you to a highly optimised execution time. Cheers Stéfan _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 Hi Stefan & All, On Thu, May 22, 2008 at 12:29 PM, Stéfan van der Walt wrote: > Hi Andrea > > 2008/5/22 Andrea Gavana <[hidden email]>: >>    I am building some 3D grids for visualization starting from a much >> bigger grid. I build these grids by satisfying certain conditions on >> x, y, z coordinates of their cells: up to now I was using VTK to >> perform this operation, but VTK is slow as a turtle, so I thought to >> use numpy to get the cells I am interested in. >> Basically, for every cell I have the coordinates of its center point >> (centroids), named xCent, yCent and zCent. These values are stored in >> numpy arrays (i.e., if I have 10,000 cells, I have 3 vectors xCent, >> yCent and zCent with 10,000 values in them). What I'd like to do is: > > You clearly have a large dataset, otherwise speed wouldn't have been a > concern to you.  You can do your operation in one pass over the data, > and I'd suggest you try doing that with Cython or Ctypes.  If you need > an example on how to access data using those methods, let me know. > > Of course, it *can* be done using NumPy (maybe not in one pass), but > thinking in terms of for-loops is sometimes easier, and immediately > takes you to a highly optimised execution time. First of all, thank you for your answer. I know next to nothing about Cython and very little about Ctypes, but it would be nice to have an example on how to use them to speed up the operations. Actually, I don't really know if my dataset is "large", as I work normally with xCent, yCent and zCent vectors of about 100,000-300,000 elements in them. However, all the other operations I do with numpy on these vectors are pretty fast (reshaping, re-casting, min(), max() and so on). So I believe that also a pure numpy solution might perform well enough for my needs: but I am really no expert in numpy, so please forgive any mistake I'm doing :-D. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/_______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Andrea Gavana A Thursday 22 May 2008, Andrea Gavana escrigué: > Hi All, > >     I am building some 3D grids for visualization starting from a > much bigger grid. I build these grids by satisfying certain > conditions on x, y, z coordinates of their cells: up to now I was > using VTK to perform this operation, but VTK is slow as a turtle, so > I thought to use numpy to get the cells I am interested in. > Basically, for every cell I have the coordinates of its center point > (centroids), named xCent, yCent and zCent. These values are stored in > numpy arrays (i.e., if I have 10,000 cells, I have 3 vectors xCent, > yCent and zCent with 10,000 values in them). What I'd like to do is: > > # Filter cells which do not satisfy Z requirements: > zReq = zMin <= zCent <= zMax > > # After that, filter cells which do not satisfy Y requirements, > # but apply this filter only on cells who satisfy the above > condition: > > yReq = yMin <= yCent <= yMax > > # After that, filter cells which do not satisfy X requirements, > # but apply this filter only on cells who satisfy the 2 above > conditions: > > xReq = xMin <= xCent <= xMax > > I'd like to end up with a vector of indices which tells me which are > the cells in the original grid that satisfy all 3 conditions. I know > that something like this: > > zReq = zMin <= zCent <= zMax > > Can not be done directly in numpy, as the first statement executed > returns a vector of boolean. Also, if I do something like: > > zReq1 = numpy.nonzero(zCent <= zMax) > zReq2 = numpy.nonzero(zCent[zReq1] >= zMin) > > I lose the original indices of the grid, as in the second statement > zCent[zReq1] has no more the size of the original grid but it has > already been filtered out. > > Is there anything I could try in numpy to get what I am looking for? > Sorry if the description is not very clear :-D > > Thank you very much for your suggestions. I don't know if this is what you want, but you can get the boolean arrays separately, do the intersection and finally get the interesting values (by using fancy indexing) or coordinates (by using .nonzero()).   Here it is an example: In [105]: a = numpy.arange(10,20) In [106]: c1=(a>=13)&(a<=17) In [107]: c2=(a>=14)&(a<=18) In [109]: all=c1&c2 In [110]: a[all] Out[110]: array([14, 15, 16, 17])   # the values In [111]: all.nonzero() Out[111]: (array([4, 5, 6, 7]),)    # the coordinates Hope that helps, -- Francesc Alted _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Andrea Gavana On Thu, 22 May 2008, Andrea Gavana apparently wrote: > # Filter cells which do not satisfy Z requirements: > zReq = zMin <= zCent <= zMax This seems to raise a question: should numpy arrays support this standard Python idiom? Cheers, Alan Isaac _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Andrea Gavana 2008/5/22 Andrea Gavana <[hidden email]>: > Hi All, > >    I am building some 3D grids for visualization starting from a much > bigger grid. I build these grids by satisfying certain conditions on > x, y, z coordinates of their cells: up to now I was using VTK to > perform this operation, but VTK is slow as a turtle, so I thought to > use numpy to get the cells I am interested in. > Basically, for every cell I have the coordinates of its center point > (centroids), named xCent, yCent and zCent. These values are stored in > numpy arrays (i.e., if I have 10,000 cells, I have 3 vectors xCent, > yCent and zCent with 10,000 values in them). What I'd like to do is: > > # Filter cells which do not satisfy Z requirements: > zReq = zMin <= zCent <= zMax > > # After that, filter cells which do not satisfy Y requirements, > # but apply this filter only on cells who satisfy the above condition: > > yReq = yMin <= yCent <= yMax > > # After that, filter cells which do not satisfy X requirements, > # but apply this filter only on cells who satisfy the 2 above conditions: > > xReq = xMin <= xCent <= xMax > > I'd like to end up with a vector of indices which tells me which are > the cells in the original grid that satisfy all 3 conditions. I know > that something like this: > > zReq = zMin <= zCent <= zMax > > Can not be done directly in numpy, as the first statement executed > returns a vector of boolean. Also, if I do something like: > > zReq1 = numpy.nonzero(zCent <= zMax) > zReq2 = numpy.nonzero(zCent[zReq1] >= zMin) > > I lose the original indices of the grid, as in the second statement > zCent[zReq1] has no more the size of the original grid but it has > already been filtered out. > > Is there anything I could try in numpy to get what I am looking for? > Sorry if the description is not very clear :-D > > Thank you very much for your suggestions. How about (as a pure numpy solution): valid = (z >= zMin) & (z <= zMax) valid[valid] &= (y[valid] >= yMin) & (y[valid] <= yMax) valid[valid] &= (x[valid] >= xMin) & (x[valid] <= xMax) inds = valid.nonzero() ? -- AJC McMorland, PhD candidate Physiology, University of Auckland (Nearly) post-doctoral research fellow Neurobiology, University of Pittsburgh _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Francesc Alted Hi Francesc & All, On Thu, May 22, 2008 at 1:04 PM, Francesc Alted wrote: > I don't know if this is what you want, but you can get the boolean > arrays separately, do the intersection and finally get the interesting > values (by using fancy indexing) or coordinates (by using .nonzero()). > Here it is an example: > > In [105]: a = numpy.arange(10,20) > > In [106]: c1=(a>=13)&(a<=17) > > In [107]: c2=(a>=14)&(a<=18) > > In [109]: all=c1&c2 > > In [110]: a[all] > Out[110]: array([14, 15, 16, 17])   # the values > > In [111]: all.nonzero() > Out[111]: (array([4, 5, 6, 7]),)    # the coordinates Thank you for this suggestion! I had forgotten  that this worked in numpy :-( . I have written a couple of small functions to test your method and my method (hopefully I did it correctly for both). On my computer (Toshiba Notebook 2.00 GHz, Windows XP SP2, 1GB Ram, Python 2.5, numpy 1.0.3.1), your solution is about 30 times faster than mine (implemented when I didn't know about multiple boolean operations in numpy). This is my code: # Begin Code import numpy from timeit import Timer # Number of cells in my original grid nCells = 150000 # Define some constraints for X, Y, Z xMin, xMax = 250.0, 700.0 yMin, yMax = 1000.0, 1900.0 zMin, zMax = 120.0, 300.0 # Generate random centroids for the cells xCent = 1000.0*numpy.random.rand(nCells) yCent = 2500.0*numpy.random.rand(nCells) zCent = 400.0*numpy.random.rand(nCells) def MultipleBoolean1():     """ Andrea's solution, slow :-( ."""     xReq_1 = numpy.nonzero(xCent >= xMin)     xReq_2 = numpy.nonzero(xCent <= xMax)     yReq_1 = numpy.nonzero(yCent >= yMin)     yReq_2 = numpy.nonzero(yCent <= yMax)     zReq_1 = numpy.nonzero(zCent >= zMin)     zReq_2 = numpy.nonzero(zCent <= zMax)     xReq = numpy.intersect1d_nu(xReq_1, xReq_2)     yReq = numpy.intersect1d_nu(yReq_1, yReq_2)     zReq = numpy.intersect1d_nu(zReq_1, zReq_2)     xyReq = numpy.intersect1d_nu(xReq, yReq)     xyzReq = numpy.intersect1d_nu(xyReq, zReq) def MultipleBoolean2():     """ Francesc's's solution, Much faster :-) ."""     xyzReq = (xCent >= xMin) & (xCent <= xMax) &  \              (yCent >= yMin) & (yCent <= yMax) &  \              (zCent >= zMin) & (zCent <= zMax)     xyzReq = numpy.nonzero(xyzReq)[0] if __name__ == "__main__":     trial = 10     t = Timer("MultipleBoolean1()", "from __main__ import MultipleBoolean1")     print "\n\nAndrea's Solution: %0.8g Seconds/Trial"%(t.timeit(number=trial)/trial)     t = Timer("MultipleBoolean2()", "from __main__ import MultipleBoolean2")     print "Francesc's Solution: %0.8g Seconds/Trial\n"%(t.timeit(number=trial)/trial) # End Code And I get this timing on my PC: Andrea's Solution: 0.34946193 Seconds/Trial Francesc's Solution: 0.011288139 Seconds/Trial If I implemented everything correctly, this is an amazing improvement. Thank you to everyone who provided suggestions, and thanks to the list :-D Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/_______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Alan G Isaac Alan G Isaac wrote: > On Thu, 22 May 2008, Andrea Gavana apparently wrote: >   >> # Filter cells which do not satisfy Z requirements: >> zReq = zMin <= zCent <= zMax >>     > > This seems to raise a question: > should numpy arrays support this standard Python idiom? >   It would be nice, but alas it requires a significant change to Python first to give us the hooks to modify.   (We need the 'and' and 'or' operations to return "vectors" instead of just numbers as they do now).     There is a PEP to allow this, but it has not received much TLC as of late.   The difficulty in the implementation is supporting "short-circuited" evaluation. -Travis _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Andrea Gavana Hi Andrea 2008/5/22 Andrea Gavana <[hidden email]>: >> You clearly have a large dataset, otherwise speed wouldn't have been a >> concern to you.  You can do your operation in one pass over the data, >> and I'd suggest you try doing that with Cython or Ctypes.  If you need >> an example on how to access data using those methods, let me know. >> >> Of course, it *can* be done using NumPy (maybe not in one pass), but >> thinking in terms of for-loops is sometimes easier, and immediately >> takes you to a highly optimised execution time. > > First of all, thank you for your answer. I know next to nothing about > Cython and very little about Ctypes, but it would be nice to have an > example on how to use them to speed up the operations. Actually, I > don't really know if my dataset is "large", as I work normally with > xCent, yCent and zCent vectors of about 100,000-300,000 elements in > them. Just to clarify things in my mind: is VTK *that* slow?  I find that surprising, since it is written in C or C++. Regards Stéfan _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 On Thu, May 22, 2008 at 12:26 PM, Stéfan van der Walt <[hidden email]> wrote: > Just to clarify things in my mind: is VTK *that* slow?  I find that > surprising, since it is written in C or C++. Performance can depend more on the design of the code than the implementation language. There are several places in VTK which are slower than they strictly could be because VTK exposes data primarily through abstract interfaces and only sometimes expose underlying data structure for faster processing. Quite sensibly, they implement the general form first. It's much the same with parts of numpy. The iterator abstraction lets you work on arbitrarily strided arrays, but for contiguous arrays, just using the pointer lets you, and the compiler, optimize your code more. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."  -- Umberto Eco _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 Hi All, On Thu, May 22, 2008 at 7:46 PM, Robert Kern  wrote: > On Thu, May 22, 2008 at 12:26 PM, Stéfan van der Walt <[hidden email]> wrote: >> Just to clarify things in my mind: is VTK *that* slow?  I find that >> surprising, since it is written in C or C++. > > Performance can depend more on the design of the code than the > implementation language. There are several places in VTK which are > slower than they strictly could be because VTK exposes data primarily > through abstract interfaces and only sometimes expose underlying data > structure for faster processing. Quite sensibly, they implement the > general form first. Yes, Robert is perfectly right. VTK is quite handy in most of the situations, but in this case I had to recursively apply 3 thresholds (each one for X, Y and Z respectively) and the threshold construction (initialization) and its execution were much slower than my (sloppy) numpy result. Compared to the solution Francesc posted, the VTK approach simply disappears. By the way, about the solution Francesc posted: xyzReq = (xCent >= xMin) & (xCent <= xMax) &  \             (yCent >= yMin) & (yCent <= yMax) &  \             (zCent >= zMin) & (zCent <= zMax) xyzReq = numpy.nonzero(xyzReq)[0] Do you think is there any chance that a C extension (or something similar) could be faster? Or something else using weave? I understand that this solution is already highly optimized as it uses the power of numpy with the logic operations in Python, but I was wondering if I can make it any faster: on my PC, the algorithm runs in 0.01 seconds, more or less, for 150,000 cells, but today I encountered a case in which I had 10800 sub-grids... 10800*0.01 is close to 2 minutes :-( Otherwise, I will try and implement it in Fortran and wrap it with f2py, assuming I am able to do it correctly and the overhead of calling an external extension is not killing the execution time. Thank you very much for your sugestions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/_______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 Andrea Gavana wrote: > By the way, about the solution Francesc posted: > > xyzReq = (xCent >= xMin) & (xCent <= xMax) &  \ >             (yCent >= yMin) & (yCent <= yMax) &  \ >             (zCent >= zMin) & (zCent <= zMax) > > xyzReq = numpy.nonzero(xyzReq)[0] > > Do you think is there any chance that a C extension (or something > similar) could be faster? yep -- if I've be got this right, the above creates 7 temporary arrays. creating that many and pushing the data in and out of memory can be pretty slow for large arrays. In C, C++, Cython or Fortran, you can just do one loop, and one output array. It should be much faster for the big arrays. > Otherwise, I will try and implement it in Fortran and wrap it with > f2py, assuming I am able to do it correctly and the overhead of > calling an external extension is not killing the execution time. nope, that's one function call for the whole thing, negligible. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R            (206) 526-6959   voice 7600 Sand Point Way NE   (206) 526-6329   fax Seattle, WA  98115       (206) 526-6317   main reception [hidden email] _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Andrea Gavana On Thu, May 22, 2008 at 2:16 PM, Andrea Gavana <[hidden email]> wrote: > By the way, about the solution Francesc posted: > > xyzReq = (xCent >= xMin) & (xCent <= xMax) &  \ >            (yCent >= yMin) & (yCent <= yMax) &  \ >            (zCent >= zMin) & (zCent <= zMax) > You could implement this with inplace operations to save memory: xyzReq = (xCent >= xMin) xyzReq &= (xCent <= xMax) xyzReq &= (yCent >= yMin) xyzReq &= (yCent <= yMax) xyzReq &= (zCent >= zMin) xyzReq &= (zCent <= zMax) > Do you think is there any chance that a C extension (or something > similar) could be faster? Or something else using weave? I understand > that this solution is already highly optimized as it uses the power of > numpy with the logic operations in Python, but I was wondering if I > can make it any faster A C implementation would certainly be faster, perhaps 5x faster, due to short-circuiting the AND operations and the fact that you'd only pass over the data once. OTOH I'd be very surprised if this is the slowest part of your application. -- Nathan Bell [hidden email] http://graphics.cs.uiuc.edu/~wnbell/_______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

Open this post in threaded view
|

## Re: Multiple Boolean Operations

 Stéfan van der Walt wrote: > I wrote a quick proof of concept (no guarantees). Thanks for the example -- I like how Cython understands ndarrays! It looks like this code would break if x,y,and z are not C-contiguous -- should there be a check for that? -Chris > here (download using bzr, http://bazaar-vcs.org, or just grab the > files with your web browser): > > https://code.launchpad.net/~stefanv/+junk/xyz-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R            (206) 526-6959   voice 7600 Sand Point Way NE   (206) 526-6329   fax Seattle, WA  98115       (206) 526-6317   main reception [hidden email] _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 In reply to this post by Chris Barker - NOAA Federal Hi Chris and All, On Thu, May 22, 2008 at 8:40 PM, Christopher Barker wrote: > Andrea Gavana wrote: >> By the way, about the solution Francesc posted: >> >> xyzReq = (xCent >= xMin) & (xCent <= xMax) &  \ >>             (yCent >= yMin) & (yCent <= yMax) &  \ >>             (zCent >= zMin) & (zCent <= zMax) >> >> xyzReq = numpy.nonzero(xyzReq)[0] >> >> Do you think is there any chance that a C extension (or something >> similar) could be faster? > > yep -- if I've be got this right, the above creates 7 temporary arrays. > creating that many and pushing the data in and out of memory can be > pretty slow for large arrays. > > In C, C++, Cython or Fortran, you can just do one loop, and one output > array. It should be much faster for the big arrays. Well, I have implemented it in 2 ways in Fortran, and actually the Fortran solutions are slower than the numpy one (2 and 3 times slower respectively). I attach the source code of the timing code and the 5 implementations I have at the moment (I have included Nathan's implementation, which is as fast as Francesc's one but it has the advantage of saving memory). The timing I get on my home PC are: Andrea's Solution: 0.42807561 Seconds/Trial Francesc's Solution: 0.018297884 Seconds/Trial Fortran Solution 1: 0.035862072 Seconds/Trial Fortran Solution 2: 0.029822338 Seconds/Trial Nathan's Solution: 0.018930507 Seconds/Trial Maybe my fortran coding is sloppy but I don't really know fortran so well to implement it better... Thank you so much to everybody for your suggestions so far :-D Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/_______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion MultipleBoolean.py (3K) Download Attachment MultipleBoolean3.f90 (780 bytes) Download Attachment MultipleBoolean4.f90 (866 bytes) Download Attachment
Open this post in threaded view
|

## Re: Multiple Boolean Operations

Open this post in threaded view
|

## Re: Multiple Boolean Operations

 Hi Andrea 2008/5/23 Andrea Gavana <[hidden email]>: > Thank you very much for this! I am going to try it and time it, > comparing it with the other implementations. I think I need to study a > bit your code as I know almost nothing about Cython :-D That won't be necessary -- the Fortran-implementation is guaranteed to win! Just to make sure, I timed it anyway (on somewhat larger arrays): Francesc's Solution: 0.062797403 Seconds/Trial Fortran Solution 1: 0.050316906 Seconds/Trial Fortran Solution 2: 0.052595496 Seconds/Trial Nathan's Solution: 0.055562282 Seconds/Trial Cython Solution: 0.06250751 Seconds/Trial Nathan's version runs over the data 6 times, and still does better than the Pyrex version.  I don't know why! But, hey, this algorithm is parallelisable!  Wait, no, it's bedtime. Regards Stéfan _______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion
Open this post in threaded view
|

## Re: Multiple Boolean Operations

 Hi Stefan & All, On Fri, May 23, 2008 at 1:02 AM, Stéfan van der Walt wrote: > Hi Andrea > > 2008/5/23 Andrea Gavana <[hidden email]>: >> Thank you very much for this! I am going to try it and time it, >> comparing it with the other implementations. I think I need to study a >> bit your code as I know almost nothing about Cython :-D > > That won't be necessary -- the Fortran-implementation is guaranteed to win! > > Just to make sure, I timed it anyway (on somewhat larger arrays): > > Francesc's Solution: 0.062797403 Seconds/Trial > Fortran Solution 1: 0.050316906 Seconds/Trial > Fortran Solution 2: 0.052595496 Seconds/Trial > Nathan's Solution: 0.055562282 Seconds/Trial > Cython Solution: 0.06250751 Seconds/Trial > > Nathan's version runs over the data 6 times, and still does better > than the Pyrex version.  I don't know why! > > But, hey, this algorithm is parallelisable!  Wait, no, it's bedtime. Thank you so much for testing, and thanks to the list for the kind help and suggestions. In any case, after all these different implementations, I think the only way to make it faster is to progressively reduce the size of the vectors on which I make the inequality tests, while somehow keeping the original vectors indices. Let me explain with a example: # step1 will be a vector with nCells elements, filled with True # or False values step1 = xCent >= xMin # step2 will be a vector with nonzero(step1) cells, filled # with True or False values step2 = xCent[step1] <= xMax # step3 will be a vector with nonzero(step2) cells, filled # with True or False values step3 = yCent[step2] >= yMin And so on. The probelm with this approach is that I lose the original indices for which I want all the inequality tests to succeed: for example, consider the following: >>> xMin, xMax = 3, 7 >>> xCent = numpy.arange(10) >>> step1 = xCent >= xMin >>> numpy.nonzero(step1)[0] array([3, 4, 5, 6, 7, 8, 9]) >>> step2 = xCent[step1] <= xMax >>> numpy.nonzero(step2)[0] array([0, 1, 2, 3, 4]) Which are no more the indices of the original vector as I shrunk down xCent to xCent[step1], and the real indices should be: >>> realStep = (xCent >= xMin)  & (xCent <= xMax) >>> numpy.nonzero(realStep)[0] array([0, 1, 2, 3, 4, 5, 6, 7]) So, now the question is. If I iteratively shrink down the vectors as I did before, is there any way to get back the original indices for which all the conditions are satisfied? Sorry if this looks like a dummy/noob question, is just I am not sure on how to implement it. Thank you very much for your help.  Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/_______________________________________________ Numpy-discussion mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/numpy-discussion