

How can I make this function faster? It removes the ith row and
column from an array.
def cut(x, i):
    idx = range(x.shape[0])
    idx.remove(i)
    y = x[idx,:]
    y = y[:,idx]
    return y
>> import numpy as np
>> x = np.random.rand(500,500)
>> timeit cut(x, 100)
100 loops, best of 3: 16.8 ms per loop
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpydiscussion
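For reference, the same row/column removal can also be expressed with NumPy's own np.delete; a minimal sketch (not from the thread itself — np.delete also copies, so it is a convenience rather than a guaranteed speedup):

```python
import numpy as np

def cut_delete(x, i):
    # Remove row i, then column i; each np.delete returns a new array,
    # just as the fancy-indexing cut() does.
    y = np.delete(x, i, axis=0)
    return np.delete(y, i, axis=1)

x = np.random.rand(5, 5)
y = cut_delete(x, 2)
assert y.shape == (4, 4)
```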


On Fri, May 2, 2008 at 5:47 PM, Keith Goodman < [hidden email]> wrote:
>
> On Fri, May 2, 2008 at 5:38 PM, Charles R Harris
> < [hidden email]> wrote:
> > On Fri, May 2, 2008 at 6:24 PM, Keith Goodman < [hidden email]> wrote:
> > > How can I make this function faster? It removes the ith row and
> > > column from an array.
> > >
> >
> > Why do you want to do that?
>
> Single linkage clustering; x is the distance matrix.
Here's the full code if you are interested. I haven't used it yet other
than running test and test2, so it may be full of bugs.

import time
import numpy as np

class Cluster:
    "Single linkage hierarchical clustering"

    def __init__(self, dist, label=None, debug=False):
        """
        dist    Distance matrix, NxN numpy array
        label   Labels of each row of the distance matrix, list of N items,
                default is range(N)
        """
        assert dist.shape[0] == dist.shape[1], 'dist must be square (nxn)'
        assert (np.abs(dist - dist.T) < 1e-8).all(), 'dist must be symmetric'
        if label is None:
            label = range(dist.shape[0])
        assert dist.shape[0] == len(label), 'dist and label must match in size'
        self.c = [[[z] for z in label]]
        self.label = label
        self.dist = dist
        self.debug = debug

    def run(self):
        for level in xrange(len(self.label) - 1):
            i, j = self.min_dist()
            self.join(i, j)

    def join(self, i, j):
        assert i != j, 'Cannot combine a cluster with itself'
        # Join labels
        new = list(self.c[-1])
        new[i] = new[i] + new[j]
        new.pop(j)
        self.c.append(new)
        # Join distance matrix
        self.dist[:,i] = self.dist[:,[i,j]].min(1)
        self.dist[i,:] = self.dist[:,i]
        idx = range(self.dist.shape[1])
        idx.remove(j)
        self.dist = self.dist[:,idx]
        self.dist = self.dist[idx,:]
        # Debug output
        if self.debug:
            print
            print len(self.c) - 1
            print 'Clusters'
            print self.c[-1]
            print 'Distance'
            print self.dist

    def min_dist(self):
        dist = self.dist + 1e10 * np.eye(self.dist.shape[0])
        i, j = np.where(dist == dist.min())
        return i[0], j[0]

def test():
    # Example from
    # home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
    label = ['BA', 'FI', 'MI', 'NA', 'RM', 'TO']
    dist = np.array([[  0, 662, 877, 255, 412, 996],
                     [662,   0, 295, 468, 268, 400],
                     [877, 295,   0, 754, 564, 138],
                     [255, 468, 754,   0, 219, 869],
                     [412, 268, 564, 219,   0, 669],
                     [996, 400, 138, 869, 669,   0]])
    clust = Cluster(dist, label, debug=True)
    clust.run()

def test2(n):
    x = np.random.rand(n,n)
    x = x + x.T
    c = Cluster(x)
    t1 = time.time()
    c.run()
    t2 = time.time()
    print 'n = %d took %0.2f seconds' % (n, t2 - t1)


On Fri, May 2, 2008 at 7:24 PM, Keith Goodman < [hidden email]> wrote:
> How can I make this function faster? It removes the ith row and
> column from an array.
>
> def cut(x, i):
>     idx = range(x.shape[0])
>     idx.remove(i)
>     y = x[idx,:]
>     y = y[:,idx]
>     return y
>
> >> import numpy as np
> >> x = np.random.rand(500,500)
> >> timeit cut(x, 100)
> 100 loops, best of 3: 16.8 ms per loop
I can get a ~20% improvement with the following:
In [8]: %timeit cut(x, 100)
10 loops, best of 3: 21.6 ms per loop
In [9]: def mycut(x, i):
   ...:     A = x[:i,:i]
   ...:     B = x[:i,i+1:]
   ...:     C = x[i+1:,:i]
   ...:     D = x[i+1:,i+1:]
   ...:     return hstack([vstack([A,C]),vstack([B,D])])
   ...:
In [10]: %timeit mycut(x, 100)
10 loops, best of 3: 17.3 ms per loop
The hstack(vstack, vstack) seems to be somewhat better than
vstack(hstack, hstack), at least for these sizes.

Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 Umberto Eco
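As a quick sanity check on the two functions above (a sketch added for reference, adapted to modern Python where range() is no longer a list), cut() and mycut() produce identical results:

```python
import numpy as np

def cut(x, i):
    # Original approach: fancy-index with the surviving row/column indices.
    idx = list(range(x.shape[0]))   # list(...) needed on Python 3
    idx.remove(i)
    y = x[idx, :]
    return y[:, idx]

def mycut(x, i):
    # Block approach: slice the four quadrants around row/column i and stack.
    A = x[:i, :i]
    B = x[:i, i+1:]
    C = x[i+1:, :i]
    D = x[i+1:, i+1:]
    return np.hstack([np.vstack([A, C]), np.vstack([B, D])])

x = np.random.rand(20, 20)
assert np.array_equal(cut(x, 7), mycut(x, 7))
```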


On Fri, May 2, 2008 at 6:05 PM, Robert Kern < [hidden email]> wrote:
> On Fri, May 2, 2008 at 7:24 PM, Keith Goodman < [hidden email]> wrote:
>
>
> > How can I make this function faster? It removes the ith row and
> > column from an array.
> >
> > def cut(x, i):
> >     idx = range(x.shape[0])
> >     idx.remove(i)
> >     y = x[idx,:]
> >     y = y[:,idx]
> >     return y
> >
> > >> import numpy as np
> > >> x = np.random.rand(500,500)
> > >> timeit cut(x, 100)
> > 100 loops, best of 3: 16.8 ms per loop
>
> I can get a ~20% improvement with the following:
>
> In [8]: %timeit cut(x, 100)
> 10 loops, best of 3: 21.6 ms per loop
>
> In [9]: def mycut(x, i):
>    ...:     A = x[:i,:i]
>    ...:     B = x[:i,i+1:]
>    ...:     C = x[i+1:,:i]
>    ...:     D = x[i+1:,i+1:]
>    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>    ...:
>
> In [10]: %timeit mycut(x, 100)
> 10 loops, best of 3: 17.3 ms per loop
>
> The hstack(vstack, vstack) seems to be somewhat better than
> vstack(hstack, hstack), at least for these sizes.
Wow. I never would have come up with that. And I probably never will.
Original code:
>> cluster.test2(500)
n = 500 took 5.28 seconds
Your improvement:
>> cluster.test2(500)
n = 500 took 3.52 seconds
Much more than a 20% improvement when used in the larger program. Thank you.


On Fri, May 2, 2008 at 7:16 PM, Keith Goodman < [hidden email]> wrote:
On Fri, May 2, 2008 at 6:05 PM, Robert Kern < [hidden email]> wrote:
> On Fri, May 2, 2008 at 7:24 PM, Keith Goodman < [hidden email]> wrote:
>
>
> > How can I make this function faster? It removes the ith row and
> > column from an array.
> >
> > def cut(x, i):
> >     idx = range(x.shape[0])
> >     idx.remove(i)
> >     y = x[idx,:]
> >     y = y[:,idx]
> >     return y
> >
> > >> import numpy as np
> > >> x = np.random.rand(500,500)
> > >> timeit cut(x, 100)
> > 100 loops, best of 3: 16.8 ms per loop
>
> I can get a ~20% improvement with the following:
>
> In [8]: %timeit cut(x, 100)
> 10 loops, best of 3: 21.6 ms per loop
>
> In [9]: def mycut(x, i):
>    ...:     A = x[:i,:i]
>    ...:     B = x[:i,i+1:]
>    ...:     C = x[i+1:,:i]
>    ...:     D = x[i+1:,i+1:]
>    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>    ...:
>
> In [10]: %timeit mycut(x, 100)
> 10 loops, best of 3: 17.3 ms per loop
>
> The hstack(vstack, vstack) seems to be somewhat better than
> vstack(hstack, hstack), at least for these sizes.
Wow. I never would have come up with that. And I probably never will.
Original code:
>> cluster.test2(500)
n = 500 took 5.28 seconds
Your improvement:
>> cluster.test2(500)
n = 500 took 3.52 seconds
Much more than a 20% improvement when used in the larger program. Thank you.

Isn't the lengthy part finding the distance between clusters? I can think of
several ways to do that, but I think you will get a real speedup by doing
that in c or c++. I have a module made in boost python that holds clusters
and returns a list of lists containing their elements. Clusters are joined
by joining any two elements, one from each. It wouldn't take much to add a
distance function, but you could use the list of indices in each cluster to
pull a subset out of the distance matrix and then find the minimum function
in that. This also reminds me of Huffman codes.
Chuck


On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
< [hidden email]> wrote:
> Isn't the lengthy part finding the distance between clusters? I can think
> of several ways to do that, but I think you will get a real speedup by doing
> that in c or c++. I have a module made in boost python that holds clusters
> and returns a list of lists containing their elements. Clusters are joined
> by joining any two elements, one from each. It wouldn't take much to add a
> distance function, but you could use the list of indices in each cluster to
> pull a subset out of the distance matrix and then find the minimum function
> in that. This also reminds me of Huffman codes.
You're right. Finding the distance is slow. Is there any way to speed
up the function below? It returns the row and column indices of the
min value of the NxN array x.
def dist(x):
    x = x + 1e10 * np.eye(x.shape[0])
    i, j = np.where(x == x.min())
    return i[0], j[0]
>> x = np.random.rand(500,500)
>> timeit dist(x)
100 loops, best of 3: 14.1 ms per loop
If the clustering gives me useful results, I'll ask you about your
boost code. I'll also take a look at Damian Eads's scipy-cluster.


On Fri, May 2, 2008 at 8:02 PM, Keith Goodman < [hidden email]> wrote:
On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
> Isn't the lengthy part finding the distance between clusters? I can think
> of several ways to do that, but I think you will get a real speedup by doing
> that in c or c++. I have a module made in boost python that holds clusters
> and returns a list of lists containing their elements. Clusters are joined
> by joining any two elements, one from each. It wouldn't take much to add a
> distance function, but you could use the list of indices in each cluster to
> pull a subset out of the distance matrix and then find the minimum function
> in that. This also reminds me of Huffman codes.
You're right. Finding the distance is slow. Is there any way to speed
up the function below? It returns the row and column indices of the
min value of the NxN array x.
def dist(x):
    x = x + 1e10 * np.eye(x.shape[0])
    i, j = np.where(x == x.min())
    return i[0], j[0]

x += diag(ones(x.shape[0])*1e10) would be faster.

i = x.argmin()
j = i % x.shape[0]
i = i / x.shape[0]

But I wouldn't worry about speed yet if you are just trying things out.
Chuck


On Fri, May 2, 2008 at 9:02 PM, Keith Goodman < [hidden email]> wrote:
> On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
>
> < [hidden email]> wrote:
>
> > Isn't the lengthy part finding the distance between clusters? I can think
> > of several ways to do that, but I think you will get a real speedup by doing
> > that in c or c++. I have a module made in boost python that holds clusters
> > and returns a list of lists containing their elements. Clusters are joined
> > by joining any two elements, one from each. It wouldn't take much to add a
> > distance function, but you could use the list of indices in each cluster to
> > pull a subset out of the distance matrix and then find the minimum function
> > in that. This also reminds me of Huffman codes.
>
> You're right. Finding the distance is slow. Is there any way to speed
> up the function below? It returns the row and column indices of the
> min value of the NxN array x.
>
> def dist(x):
>     x = x + 1e10 * np.eye(x.shape[0])
>     i, j = np.where(x == x.min())
>     return i[0], j[0]

Assuming x is contiguous and you can modify x in-place:

In [1]: from numpy import *

In [2]: def dist(x):
   ...:     x = x + 1e10 * eye(x.shape[0])
   ...:     i, j = where(x == x.min())
   ...:     return i[0], j[0]
   ...:

In [3]: def symmdist(N):
   ...:     x = random.rand(N, N)
   ...:     x = x + x.T
   ...:     x.flat[::N+1] = 0
   ...:     return x
   ...:

In [4]: symmdist(5)
Out[4]:
array([[ 0.        ,  0.87508654,  1.11691704,  0.80366071,  0.57966808],
       [ 0.87508654,  0.        ,  1.5521685 ,  1.74010886,  0.52156877],
       [ 1.11691704,  1.5521685 ,  0.        ,  1.22725396,  1.04101992],
       [ 0.80366071,  1.74010886,  1.22725396,  0.        ,  1.94577965],
       [ 0.57966808,  0.52156877,  1.04101992,  1.94577965,  0.        ]])

In [5]: def kerndist(x):
   ...:     N = x.shape[0]
   ...:     x.flat[::N+1] = x.max()
   ...:     ij = argmin(x.flat)
   ...:     i, j = divmod(ij, N)
   ...:     return i, j
   ...:
In [10]: x = symmdist(500)
In [15]: %timeit dist(x)
10 loops, best of 3: 19.9 ms per loop
In [16]: %timeit kerndist(x)
100 loops, best of 3: 4.38 ms per loop

Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 Umberto Eco
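A quick check (added for reference, not from the thread) that the argmin/divmod search agrees with the where() search: both should return the indices of the off-diagonal minimum. kerndist overwrites the diagonal in place, so it gets a copy here.

```python
import numpy as np

def dist(x):
    # Mask the diagonal by adding a huge value, then scan the whole array.
    y = x + 1e10 * np.eye(x.shape[0])
    i, j = np.where(y == y.min())
    return i[0], j[0]

def kerndist(x):
    # Overwrite the diagonal in place, then use a single flat argmin.
    N = x.shape[0]
    x.flat[::N+1] = x.max()
    return divmod(int(np.argmin(x)), N)

rng = np.random.RandomState(0)
x = rng.rand(50, 50)
x = x + x.T
x.flat[::51] = 0          # symmetric "distance" matrix with a zero diagonal

assert tuple(int(v) for v in dist(x)) == kerndist(x.copy())
```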


On Fri, May 2, 2008 at 8:02 PM, Keith Goodman < [hidden email]> wrote:
On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
> Isn't the lengthy part finding the distance between clusters? I can think
> of several ways to do that, but I think you will get a real speedup by doing
> that in c or c++. I have a module made in boost python that holds clusters
> and returns a list of lists containing their elements. Clusters are joined
> by joining any two elements, one from each. It wouldn't take much to add a
> distance function, but you could use the list of indices in each cluster to
> pull a subset out of the distance matrix and then find the minimum function
> in that. This also reminds me of Huffman codes.
You're right. Finding the distance is slow. Is there any way to speed
up the function below? It returns the row and column indices of the
min value of the NxN array x.
def dist(x):
    x = x + 1e10 * np.eye(x.shape[0])
    i, j = np.where(x == x.min())
    return i[0], j[0]

>> x = np.random.rand(500,500)
>> timeit dist(x)
100 loops, best of 3: 14.1 ms per loop

If the clustering gives me useful results, I'll ask you about your
boost code. I'll also take a look at Damian Eads's scipy-cluster.

That package looks nice. I think your time would be better spent learning
how to use it than in rolling your own routines.
Chuck


On Fri, May 2, 2008 at 7:25 PM, Robert Kern < [hidden email]> wrote:
> On Fri, May 2, 2008 at 9:02 PM, Keith Goodman < [hidden email]> wrote:
> > You're right. Finding the distance is slow. Is there any way to speed
> > up the function below? It returns the row and column indices of the
> > min value of the NxN array x.
> >
> > def dist(x):
> >     x = x + 1e10 * np.eye(x.shape[0])
> >     i, j = np.where(x == x.min())
> >     return i[0], j[0]
>
> Assuming x is contiguous and you can modify x in-place:
>
> In [1]: from numpy import *
>
> In [2]: def dist(x):
>    ...:     x = x + 1e10 * eye(x.shape[0])
>    ...:     i, j = where(x == x.min())
>    ...:     return i[0], j[0]
>    ...:
>
> In [3]: def symmdist(N):
>    ...:     x = random.rand(N, N)
>    ...:     x = x + x.T
>    ...:     x.flat[::N+1] = 0
>    ...:     return x
>    ...:
>
> In [4]: symmdist(5)
> Out[4]:
> array([[ 0.        ,  0.87508654,  1.11691704,  0.80366071,  0.57966808],
>        [ 0.87508654,  0.        ,  1.5521685 ,  1.74010886,  0.52156877],
>        [ 1.11691704,  1.5521685 ,  0.        ,  1.22725396,  1.04101992],
>        [ 0.80366071,  1.74010886,  1.22725396,  0.        ,  1.94577965],
>        [ 0.57966808,  0.52156877,  1.04101992,  1.94577965,  0.        ]])
>
> In [5]: def kerndist(x):
>    ...:     N = x.shape[0]
>    ...:     x.flat[::N+1] = x.max()
>    ...:     ij = argmin(x.flat)
>    ...:     i, j = divmod(ij, N)
>    ...:     return i, j
>    ...:
>
> In [10]: x = symmdist(500)
>
> In [15]: %timeit dist(x)
> 10 loops, best of 3: 19.9 ms per loop
>
> In [16]: %timeit kerndist(x)
> 100 loops, best of 3: 4.38 ms per loop
This change and your previous one cut the run time from 5.28 to 2.23
seconds. Thank you.


On Fri, May 2, 2008 at 7:36 PM, Charles R Harris
< [hidden email]> wrote:
> On Fri, May 2, 2008 at 8:02 PM, Keith Goodman < [hidden email]> wrote:
> > If the clustering gives me useful results, I'll ask you about your
> > boost code. I'll also take a look at Damian Eads's scipy-cluster.
>
> That package looks nice. I think your time would be better spent learning
> how to use it than in rolling your own routines.
Yeah, but this is more fun.


On Fri, May 2, 2008 at 7:25 PM, Robert Kern < [hidden email]> wrote:
> In [5]: def kerndist(x):
>    ...:     N = x.shape[0]
>    ...:     x.flat[::N+1] = x.max()
>    ...:     ij = argmin(x.flat)
>    ...:     i, j = divmod(ij, N)
>    ...:     return i, j
I replaced
ij = argmin(x.flat)
with
x.argmin()
(they're the same in this context, right?) for a slight speed up. Now
I'm down to 1.9 seconds.
>> timeit argmin(x.flat)
10000 loops, best of 3: 24.8 µs per loop
>> timeit argmin(x)
100000 loops, best of 3: 9.19 µs per loop
>> timeit x.argmin()
100000 loops, best of 3: 8.06 µs per loop
OK, enough way-too-early optimization. But I'll definitely add i, j =
divmod(x.argmin(), x.shape[0]) to my toolbelt.


On Fri, May 2, 2008 at 10:48 PM, Keith Goodman < [hidden email]> wrote:
> On Fri, May 2, 2008 at 7:25 PM, Robert Kern < [hidden email]> wrote:
> > In [5]: def kerndist(x):
> >    ...:     N = x.shape[0]
> >    ...:     x.flat[::N+1] = x.max()
> >    ...:     ij = argmin(x.flat)
> >    ...:     i, j = divmod(ij, N)
> >    ...:     return i, j
>
> I replaced
>
> ij = argmin(x.flat)
>
> with
>
> x.argmin()
>
> (they're the same in this context, right?) for a slight speed up. Now
> I'm down to 1.9 seconds.
Yes, they're the same. I forgot that .argmin() flattened by default.

Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 Umberto Eco


You know, for linkage clustering and BHC, I've found it a lot easier
to work with an intermediate 1-D map of indices and never resize the
distance matrix. I then just remove one element from this map at each
iteration, which is a LOT faster than removing a column and a row from
a matrix. If idxmap were the map, you would be working with
X[idxmap[i], idxmap[j]] instead of X[i, j]. Also, you could even use a
Python list in this case, as lists are a lot better for deletions than
an array.
Hoyt
On Fri, May 2, 2008 at 5:47 PM, Keith Goodman < [hidden email]> wrote:
>
> On Fri, May 2, 2008 at 5:38 PM, Charles R Harris
> < [hidden email]> wrote:
> > On Fri, May 2, 2008 at 6:24 PM, Keith Goodman < [hidden email]> wrote:
> > > How can I make this function faster? It removes the ith row and
> > > column from an array.
> > >
> >
> > Why do you want to do that?
>
> Single linkage clustering; x is the distance matrix.

+++++++++++++++++++++++++++++++++++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
[hidden email]
+++++++++++++++++++++++++++++++++++


On Fri, May 2, 2008 at 11:51 PM, Hoyt Koepke < [hidden email]> wrote:
> You know, for linkage clustering and BHC, I've found it a lot easier
> to work with an intermediate 1d map of indices and never resize the
> distance matrix. I then just remove one element from this map at each
> iteration, which is a LOT faster than removing a column and a row from
> a matrix. if idxmap were the map, you would be working with
> X[idxmap[i], idxmap[j] ] instead of X[i, j]. Also, you could even use
> a python list in this case, as they are a lot better for deletions
> than an array.
I thought of replacing the row and column with 1e10 instead of
deleting it. But your idea is better. If I use lists then I bet Psyco
will speed things up. Thanks for the tip.


Robert Kern wrote:
> I can get a ~20% improvement with the following:
> In [9]: def mycut(x, i):
>    ...:     A = x[:i,:i]
>    ...:     B = x[:i,i+1:]
>    ...:     C = x[i+1:,:i]
>    ...:     D = x[i+1:,i+1:]
>    ...:     return hstack([vstack([A,C]),vstack([B,D])])
Might it be a touch faster to build the final array first, then fill it:
def mycut(x, i):
    r, c = x.shape
    out = np.empty((r-1, c-1), dtype=x.dtype)
    out[:i,:i] = x[:i,:i]
    out[:i,i:] = x[:i,i+1:]
    out[i:,:i] = x[i+1:,:i]
    out[i:,i+1:] = x[i+1:,i+1:]
    return out
totally untested.
That should save the creation of two temporaries.
Chris

Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT          (206) 526-6959 voice
7600 Sand Point Way NE    (206) 526-6329 fax
Seattle, WA 98115         (206) 526-6317 main reception


On Sat, May 3, 2008 at 7:05 PM, Christopher Barker
< [hidden email]> wrote:
> Robert Kern wrote:
> > I can get a ~20% improvement with the following:
>
> > In [9]: def mycut(x, i):
> >    ...:     A = x[:i,:i]
> >    ...:     B = x[:i,i+1:]
> >    ...:     C = x[i+1:,:i]
> >    ...:     D = x[i+1:,i+1:]
> >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>
> Might it be a touch faster to build the final array first, then fill it:
>
> def mycut(x, i):
>     r, c = x.shape
>     out = np.empty((r-1, c-1), dtype=x.dtype)
>     out[:i,:i] = x[:i,:i]
>     out[:i,i:] = x[:i,i+1:]
>     out[i:,:i] = x[i+1:,:i]
>     out[i:,i+1:] = x[i+1:,i+1:]
>     return out
>
> totally untested.
>
> That should save the creation of two temporaries.
After fixing the last statement to "out[i:,i:] = ...", yes, much
faster. About a factor of 5 with N=500.
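With that correction applied, the full preallocating version reads as follows (a consolidation of Chris's function with Robert's fix, added for reference; the name mycut2 matches the later timing messages):

```python
import numpy as np

def mycut2(x, i):
    # Allocate the output once, then copy the four quadrants into it,
    # skipping row i and column i. Saves the temporaries that
    # hstack/vstack would create.
    r, c = x.shape
    out = np.empty((r - 1, c - 1), dtype=x.dtype)
    out[:i, :i] = x[:i, :i]
    out[:i, i:] = x[:i, i+1:]
    out[i:, :i] = x[i+1:, :i]
    out[i:, i:] = x[i+1:, i+1:]   # the fixed last statement
    return out

x = np.random.rand(10, 10)
idx = [k for k in range(10) if k != 4]
assert np.array_equal(mycut2(x, 4), x[np.ix_(idx, idx)])
```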

Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 Umberto Eco


On Sat, May 3, 2008 at 5:05 PM, Christopher Barker
< [hidden email]> wrote:
> Robert Kern wrote:
> > I can get a ~20% improvement with the following:
>
>
> > In [9]: def mycut(x, i):
> >    ...:     A = x[:i,:i]
> >    ...:     B = x[:i,i+1:]
> >    ...:     C = x[i+1:,:i]
> >    ...:     D = x[i+1:,i+1:]
> >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>
> Might it be a touch faster to build the final array first, then fill it:
>
> def mycut(x, i):
>     r, c = x.shape
>     out = np.empty((r-1, c-1), dtype=x.dtype)
>     out[:i,:i] = x[:i,:i]
>     out[:i,i:] = x[:i,i+1:]
>     out[i:,:i] = x[i+1:,:i]
>     out[i:,i+1:] = x[i+1:,i+1:]
>     return out
>
> totally untested.
>
> That should save the creation of two temporaries.
Initializing the array makes sense. And it is super fast:
>> timeit mycut(x, 6)
100 loops, best of 3: 7.48 ms per loop
>> timeit mycut2(x, 6)
1000 loops, best of 3: 1.5 ms per loop
The time it takes to cluster went from about 1.9 seconds to 0.7
seconds! Thank you.
When I run the single linkage clustering on my data I get one big
cluster and a bunch of tiny clusters. So I need to try a different
linkage method. Average linkage sounds good, but it sounds hard to
code.

