Compare NumPy arrays with threshold and return the differences

Compare NumPy arrays with threshold and return the differences

Nissim Derdiger
Hi,
In my script, I need to compare large NumPy arrays (2D or 3D) and return a list of all cells whose difference is bigger than a defined threshold.
The comparison itself can easily be done with the "allclose" function, like this:
Threshold = 0.1
if np.allclose(Arr1, Arr2, Threshold, equal_nan=True):
    print('Same')
But this comparison does not return which cells are not the same.
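
For reference, `np.allclose` reduces the element-wise test `|a - b| <= atol + rtol * |b|` to a single boolean, and its third positional argument is the relative tolerance `rtol`. The sketch below uses made-up array values to show how passing the threshold positionally differs from passing it as an absolute tolerance through `atol`:

```python
import numpy as np

# Illustrative values only; they are not taken from the original post.
arr1 = np.array([[1.0, 100.0], [np.nan, 2.0]])
arr2 = np.array([[1.05, 105.0], [np.nan, 2.0]])
threshold = 0.1

# Third positional argument is rtol: the tolerance scales with |arr2|.
relative = np.allclose(arr1, arr2, threshold, equal_nan=True)

# Absolute threshold: switch rtol off and pass the threshold as atol.
absolute = np.allclose(arr1, arr2, rtol=0.0, atol=threshold, equal_nan=True)

print(relative, absolute)
```

With these values the two calls disagree, because a difference of 5 is within 10% of 105 but far above an absolute threshold of 0.1.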
 
The easiest (yet naive) way to find out which cells are not the same is to use simple for loops, like this:
def CheckWhichCellsAreNotEqualInArrays(Arr1,Arr2,Threshold):
   if not Arr1.shape == Arr2.shape:
       return ['Arrays size not the same']
   Dimensions = Arr1.shape
   Diff = []
   for i in range(Dimensions[0]):
       for j in range(Dimensions[1]):
           if not np.allclose(Arr1[i][j], Arr2[i][j], Threshold, equal_nan=True):
               Diff.append(',' + str(i) + ',' + str(j) + ',' + str(Arr1[i,j]) + ','
               + str(Arr2[i,j]) + ',' + str(Threshold) + ',Fail\n')
   return Diff
(and the same for 3D arrays, with one more for loop)
This approach is very slow when the arrays are big and full of non-equal cells.
 
Is there a fast, straightforward way to get a list of the differing cells when the arrays are not the same? Maybe some built-in function in NumPy itself?
Thanks!
Nissim
 
 

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion
Re: Compare NumPy arrays with threshold and return the differences

Paul Hobson-2
I would do something like:

# absolute difference, so cells where array2 is the larger value are caught too
diff_is_large = numpy.abs(array1 - array2) > threshold
index_at_large_diff = numpy.nonzero(diff_is_large)
array1[index_at_large_diff].tolist()
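
Whether the difference is signed matters for this approach: `array1 - array2` exceeds a positive threshold only where `array1` holds the larger value, so cells where `array2` is larger are found only if the absolute value is taken. A small sketch with made-up data that compares the two:

```python
import numpy as np

# Illustrative data: one cell where array2 is larger, one where array1 is.
array1 = np.array([[1.0, 2.0], [3.0, 4.0]])
array2 = np.array([[1.0, 2.5], [2.0, 4.0]])
threshold = 0.1

signed_hits = np.nonzero((array1 - array2) > threshold)
abs_hits = np.nonzero(np.abs(array1 - array2) > threshold)

# Turn the per-dimension index arrays into a list of (i, j) tuples.
signed_cells = list(zip(*(idx.tolist() for idx in signed_hits)))
abs_cells = list(zip(*(idx.tolist() for idx in abs_hits)))

print(signed_cells)
print(abs_cells)
```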


Re: Compare NumPy arrays with threshold and return the differences

Robert Kern-2
In reply to this post by Nissim Derdiger
On Wed, May 17, 2017 at 9:50 AM, Nissim Derdiger <[hidden email]> wrote:
Is there a fast, straightforward way to get a list of the differing cells when the arrays are not the same? Maybe some built-in function in NumPy itself?

Use `close_mask = np.isclose(Arr1, Arr2, Threshold, equal_nan=True)` to return a boolean mask the same shape as the arrays which is True where the elements are close and False where they are not. You can invert it to get a boolean mask which is True where they are "far" with respect to the threshold: `far_mask = ~close_mask`. Then you can use `i_idx, j_idx = np.nonzero(far_mask)` to get arrays of the `i` and `j` indices where the values are far. For example:

for i, j in zip(i_idx, j_idx):
    print("{0}, {1}, {2}, {3}, {4}, Fail".format(i, j, Arr1[i, j], Arr2[i, j], Threshold))
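
Putting those steps together, here is a minimal sketch that also covers 3D arrays by using `np.argwhere` rather than unpacking a fixed number of index arrays. The function name `find_far_cells` and the sample data are hypothetical. The threshold is passed positionally to `np.isclose`, so it acts as the relative tolerance `rtol`:

```python
import numpy as np

def find_far_cells(arr1, arr2, threshold):
    """Return (index, value1, value2) for each cell that is not close."""
    close_mask = np.isclose(arr1, arr2, threshold, equal_nan=True)
    far_mask = ~close_mask
    # argwhere yields one row of indices per far cell, for any number of dims.
    indices = np.argwhere(far_mask)
    return [(tuple(idx.tolist()), arr1[tuple(idx)], arr2[tuple(idx)])
            for idx in indices]

# Hypothetical 3D test data: one real difference and one matching NaN pair.
arr1 = np.zeros((2, 3, 4))
arr2 = arr1.copy()
arr2[1, 2, 3] = 5.0
arr1[0, 0, 0] = np.nan
arr2[0, 0, 0] = np.nan

far = find_far_cells(arr1, arr2, 0.1)
for idx, v1, v2 in far:
    print("{0}, {1}, {2}, {3}, Fail".format(idx, v1, v2, 0.1))
```

For an absolute threshold, the call would presumably become `np.isclose(arr1, arr2, rtol=0.0, atol=threshold, equal_nan=True)`.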

--
Robert Kern

Re: Compare NumPy arrays with threshold and return the differences

Daniele Nicolodi
In reply to this post by Nissim Derdiger
On 5/17/17 10:50 AM, Nissim Derdiger wrote:

> Hi,
> In my script, I need to compare big NumPy arrays (2D or 3D), and return
> a list of all cells with difference bigger than a defined threshold.
> The comparison itself can easily be done with the "allclose" function,
> like this:
> Threshold = 0.1
> if np.allclose(Arr1, Arr2, Threshold, equal_nan=True):
>     print('Same')
> But this comparison does not return *which* cells are not the same.
>
> The easiest (yet naive) way to find out which cells are not the same is to
> use simple for loops, like this:
> def CheckWhichCellsAreNotEqualInArrays(Arr1,Arr2,Threshold):
>    if not Arr1.shape == Arr2.shape:
>        return ['Arrays size not the same']

I think you have been exposed to too much Matlab :-) Why the [] around
the string? The pythonic way to react to unexpected conditions is to
raise an exception:

         raise ValueError('arrays size not the same')
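
A short sketch of that pattern, using a hypothetical helper named `check_same_shape`. The caller decides how to handle the error instead of having to inspect a returned list:

```python
import numpy as np

def check_same_shape(arr1, arr2):
    """Raise ValueError when the arrays cannot be compared cell by cell."""
    if arr1.shape != arr2.shape:
        raise ValueError(
            "arrays size not the same: {} vs {}".format(arr1.shape, arr2.shape))

try:
    check_same_shape(np.zeros((2, 3)), np.zeros((3, 2)))
    message = None
except ValueError as exc:
    # The mismatch is reported through the exception, not a return value.
    message = str(exc)

print(message)
```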

>    Dimensions = Arr1.shape
>    Diff = []
>    for i in range(Dimensions [0]):
>        for j in range(Dimensions [1]):
>            if not np.allclose(Arr1[i][j], Arr2[i][j], Threshold, equal_nan=True):
>                Diff.append(',' + str(i) + ',' + str(j) + ',' + str(Arr1[i,j]) + ','
>                + str(Arr2[i,j]) + ',' + str(Threshold) + ',Fail\n')

Here you are also doing something very unusual. Why do you concatenate
all those strings? It would be more efficient to return the indexes of
the array elements matching the conditions and print them out in a
second step.
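
As a sketch of that two-step split, the comparison below returns only an index array, and the strings are built afterwards in a separate function. The names `find_differences` and `format_report` are illustrative, and an absolute threshold is assumed:

```python
import numpy as np

def find_differences(a, b, threshold):
    """Step 1: indices of cells whose absolute difference exceeds threshold.

    Comparisons involving NaN evaluate to False, so cells containing NaN
    are never reported by this sketch.
    """
    return np.argwhere(np.abs(a - b) > threshold)

def format_report(a, b, indices, threshold):
    """Step 2: build a CSV-style report from the indices found in step 1."""
    lines = []
    for idx in indices:
        cell = tuple(idx)
        coords = ",".join(str(i) for i in idx.tolist())
        lines.append("{},{},{},{},Fail".format(coords, a[cell], b[cell], threshold))
    return "\n".join(lines)

# Illustrative data with a single differing cell at (0, 1).
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0, 2.5], [3.0, 4.0]])

indices = find_differences(a, b, 0.1)
report = format_report(a, b, indices, 0.1)
print(report)
```

Keeping the index array around also means the result can be counted, filtered or saved without parsing strings back into numbers.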

>    return Diff
> (and the same for 3D arrays, with one more for loop)
> This approach is very slow when the arrays are big and full of non-equal cells.
>
> Is there a fast, straightforward way to get a list of the differing cells
> when the arrays are not the same? Maybe some built-in function in NumPy
> itself?

import numpy as np

threshold = 0.1
a = np.random.randn(100, 100)
b = np.random.randn(100, 100)

ids = np.nonzero(np.abs(a - b) > threshold)

gives you a tuple of index arrays, one per dimension, for the element pairs
satisfying your condition. If you want to print them:

matcha = a[ids]
matchb = b[ids]

# one row of indexes per matching element
idt = np.vstack(ids).T

for i, ai, bi in zip(idt, matcha, matchb):
    c = ','.join(str(x) for x in i)
    print('{},{},{},{},Fail'.format(c, ai, bi, threshold))

works for 2D and 3D (or nD) arrays.

However, if you have many elements matching your condition this is going
to be slow and not very useful to look at. Maybe you can think about a
different way to visualize this result.
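
One possible alternative to printing every cell, offered only as a sketch: summarize the mismatches with a count, the location of the largest absolute difference, and the rows that contain at least one mismatch. The variable names, the threshold and the injected differences are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 100))
b = a.copy()
b[10, 20] += 1.0   # inject two known differences for the demonstration
b[30, 40] -= 2.0
threshold = 0.5

abs_diff = np.abs(a - b)
far_mask = abs_diff > threshold

# Number of cells over the threshold.
n_far = int(np.count_nonzero(far_mask))
# Index of the single largest absolute difference.
worst = np.unravel_index(np.argmax(abs_diff), abs_diff.shape)
# Rows that contain at least one cell over the threshold.
rows_affected = np.flatnonzero(far_mask.any(axis=1))

print("cells over threshold:", n_far)
print("largest difference at", worst)
print("rows affected:", rows_affected.tolist())
```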

Cheers,
Dan
