Hi,
In my script, I need to compare big NumPy arrays (2D or 3D), and return a list of all cells with difference bigger than a defined threshold.
The comparison itself can easily be done with the "allclose" function, like this:
Threshold = 0.1
if np.allclose(Arr1, Arr2, Threshold, equal_nan=True):
    print('Same')
But this comparison does not tell me which cells are not the same.
The easiest (yet naive) way to find out which cells differ is a simple nested for loop like this one:
def CheckWhichCellsAreNotEqualInArrays(Arr1, Arr2, Threshold):
    if not Arr1.shape == Arr2.shape:
        return ['Arrays size not the same']
    Dimensions = Arr1.shape
    Diff = []
    for i in range(Dimensions[0]):
        for j in range(Dimensions[1]):
            if not np.allclose(Arr1[i][j], Arr2[i][j], Threshold, equal_nan=True):
                Diff.append(',' + str(i) + ',' + str(j) + ',' + str(Arr1[i,j]) + ','
                            + str(Arr2[i,j]) + ',' + str(Threshold) + ',Fail\n')
    return Diff
(and same for 3D arrays - with 1 more for loop)
This approach is very slow when the arrays are big and full of non-equal cells.
Is there a fast, straightforward way to get a list of the unequal cells when the arrays are not the same? Maybe some built-in function in NumPy itself?
Thanks!
Nissim
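
A note on the allclose call in the question: the third positional argument of np.allclose (and np.isclose) is rtol, a relative tolerance, and the elementwise test is |a - b| <= atol + rtol * |b|. So passing Threshold there does not give a fixed absolute threshold. A minimal sketch of the difference, with made-up values (the arrays and variable names below are illustrative only):

```python
import numpy as np

a = np.array([100.0, 1.0])
b = np.array([105.0, 1.05])

# Relative tolerance: the allowed difference scales with |b|.
rel = np.isclose(a, b, rtol=0.1, atol=0.0)

# Absolute threshold: switch the relative part off and use atol instead.
abs_ = np.isclose(a, b, rtol=0.0, atol=0.1)

print(rel)   # first pair differs by 5, which is within 10% of 105
print(abs_)  # first pair differs by 5, which is more than 0.1
```

If an absolute threshold is what you have in mind, passing it by keyword as atol (with rtol=0) is probably closer to the intent.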
_______________________________________________ NumPy-Discussion mailing list [hidden email] https://mail.python.org/mailman/listinfo/numpy-discussion |
I would do something like:

    diff_is_large = numpy.abs(array1 - array2) > threshold
    index_at_large_diff = numpy.nonzero(diff_is_large)
    array1[index_at_large_diff].tolist()

On Wed, May 17, 2017 at 9:50 AM, Nissim Derdiger <[hidden email]> wrote:
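
A slightly fuller sketch of the same nonzero idea, assuming you also want the behaviour of equal_nan=True from the question: a NaN paired with a NaN counts as equal, while a NaN paired with a number counts as a mismatch. The helper name large_diff_indices and the sample arrays are made up for illustration.

```python
import numpy as np

def large_diff_indices(arr1, arr2, threshold):
    """Hypothetical helper: indices of cells where |arr1 - arr2| > threshold."""
    diff = np.abs(arr1 - arr2)
    # Comparisons with NaN are always False, so a NaN on only one side
    # would be silently skipped; flag those cells explicitly.
    one_sided_nan = np.isnan(arr1) ^ np.isnan(arr2)
    mask = (diff > threshold) | one_sided_nan
    return np.nonzero(mask)

arr1 = np.array([[1.0, 2.0, np.nan],
                 [4.0, np.nan, 6.0]])
arr2 = np.array([[1.05, 2.5, np.nan],
                 [4.0, 5.0, 6.2]])

rows, cols = large_diff_indices(arr1, arr2, 0.1)
print(list(zip(rows.tolist(), cols.tolist())))
```

The returned tuple can be used directly for fancy indexing, e.g. arr1[rows, cols] and arr2[rows, cols], which works the same way for 3D arrays with a third index array.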
In reply to this post by Nissim Derdiger
On Wed, May 17, 2017 at 9:50 AM, Nissim Derdiger <[hidden email]> wrote:
    for i, j in zip(i_idx, j_idx):
        print("{0}, {1}, {2}, {3}, {4}, Fail".format(i, j, Arr1[i, j], Arr2[i, j], Threshold))

Robert Kern
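
The archived message does not show where i_idx and j_idx come from. One plausible way to obtain them, offered as an assumption rather than as what the reply originally contained, is to negate np.isclose with the same arguments the question passes to allclose and feed the result to np.nonzero. The sample arrays are made up.

```python
import numpy as np

Arr1 = np.array([[1.0, 2.0], [3.0, np.nan]])
Arr2 = np.array([[1.0, 2.5], [3.0, np.nan]])
Threshold = 0.1

# Assumption: same arguments as the allclose call in the question,
# so Threshold is used as rtol and NaN pairs count as equal.
mismatch = ~np.isclose(Arr1, Arr2, Threshold, equal_nan=True)

# One index array per dimension; for 2D input that is rows and columns.
i_idx, j_idx = np.nonzero(mismatch)
print(i_idx, j_idx)
```

For a 3D array, np.nonzero returns three index arrays, so the loop would unpack i, j, k instead.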
In reply to this post by Nissim Derdiger
On 5/17/17 10:50 AM, Nissim Derdiger wrote:
> Hi,
> In my script, I need to compare big NumPy arrays (2D or 3D), and return
> a list of all cells with difference bigger than a defined threshold.
> The compare itself can be done easily done with "allclose" function,
> like that:
> Threshold = 0.1
> if (np.allclose(Arr1, Arr2, Threshold, equal_nan=True)):
>     Print('Same')
> But this compare does not return *_which_* cells are not the same.
>
> The easiest (yet naive) way to know which cells are not the same is to
> use a simple for loops code like this one:
> def CheckWhichCellsAreNotEqualInArrays(Arr1,Arr2,Threshold):
>     if not Arr1.shape == Arr2.shape:
>         return ['Arrays size not the same']

I think you have been exposed to too much Matlab :-) Why the [] around
the string? The pythonic way to react to unexpected conditions is to
raise an exception:

    raise ValueError('arrays size not the same')

>     Dimensions = Arr1.shape
>     Diff = []
>     for i in range(Dimensions [0]):
>         for j in range(Dimensions [1]):
>             if not np.allclose(Arr1[i][j], Arr2[i][j], Threshold,
>                                equal_nan=True):
>                 Diff.append(',' + str(i) + ',' + str(j) + ',' +
>                             str(Arr1[i,j]) + ','
>                             + str(Arr2[i,j]) + ',' + str(Threshold) + ',Fail\n')

Here you are also doing something very unusual. Why do you concatenate
all those strings? It would be more efficient to return the indexes of
the array elements matching the condition and print them out in a
second step.

>     return Diff
> (and same for 3D arrays - with 1 more for loop)
> This way is very slow when the Arrays are big and full of none-equal cells.
>
> Is there a fast straight forward way in case they are not the same - to
> get a list of the uneven cells? maybe some built-in function in the
> NumPy itself?

    a = np.random.randn(100, 100)
    b = np.random.randn(100, 100)
    ids = np.nonzero(np.abs(a - b) > threshold)

gives you a tuple of the indexes of the array element pairs satisfying
your condition. If you want to print them:

    matcha = a[ids]
    matchb = b[ids]
    idt = np.vstack(ids).T
    for i, ai, bi in zip(idt, matcha, matchb):
        c = ','.join(str(x) for x in i)
        print('{},{},{},{},Fail'.format(c, ai, bi, threshold))

works for 2D and 3D (or nD) arrays.

However, if you have many elements matching your condition this is
going to be slow and not very useful to look at. Maybe you can think
about a different way to visualize this result.

Cheers,
Dan
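
Putting the suggestions from this reply together, a minimal sketch might look like the following. It raises ValueError on a shape mismatch, finds the indices in one vectorized step, and formats them in a separate second step. The function names find_mismatches and format_mismatches are hypothetical, np.argwhere is swapped in for the vstack/transpose step, and NaN handling is deliberately left out.

```python
import numpy as np

def find_mismatches(a, b, threshold):
    """Hypothetical helper: (N, ndim) array of indices where |a - b| > threshold."""
    if a.shape != b.shape:
        raise ValueError('arrays size not the same')
    # argwhere returns one row of indices per matching cell, for any ndim.
    return np.argwhere(np.abs(a - b) > threshold)

def format_mismatches(a, b, idx, threshold):
    """Hypothetical helper: one CSV-style line per mismatching cell."""
    lines = []
    for index in idx:
        key = tuple(index)
        cells = ','.join(str(int(x)) for x in index)
        lines.append('{},{},{},{},Fail'.format(
            cells, float(a[key]), float(b[key]), threshold))
    return lines

# Made-up 3D example with a single differing cell.
a = np.zeros((2, 2, 2))
b = np.zeros((2, 2, 2))
b[1, 0, 1] = 0.5

idx = find_mismatches(a, b, 0.1)
lines = format_mismatches(a, b, idx, 0.1)
print('\n'.join(lines))
```

Keeping the index search separate from the formatting means the slow Python loop only runs over the mismatching cells, and can be skipped entirely if all you need is a count such as len(idx).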