Re: Compare NumPy arrays with threshold

Re: Compare NumPy arrays with threshold

Nissim Derdiger
Hi again,
Thanks for the responses to my question!
Robert's answer worked very well for me, except for one small issue:
 
This line:
close_mask = np.isclose(MatA, MatB, Threshold, equal_nan=True)
returns each difference twice: once for j compared to i, and once for i compared to j.
 
for example:
 
for this input:
MatA = [[10,20,30],[40,50,60]]
MatB = [[10,30,30],[40,50,160]]
 
My old code will return:
0,1,20,30
1,3,60,160
Your code returns:
0,1,20,30
1,3,60,160
0,1,30,20
1,3,160,60
 
 
I can simply cut "close_mask" in half so that I iterate only once, but that does not seem efficient.
Any ideas?
 
 
 
Also, what should I change to support 3D arrays as well?
 
 
Thanks again,
Nissim.
 
 
 
 
-----Original Message-----
From: NumPy-Discussion [[hidden email]] On Behalf Of [hidden email]
Sent: Wednesday, May 17, 2017 8:17 PM
To: [hidden email]
Subject: NumPy-Discussion Digest, Vol 128, Issue 18
 
Today's Topics:
 
   1. Compare NumPy arrays with threshold and return the
      differences (Nissim Derdiger)
   2. Re: Compare NumPy arrays with threshold and return the
      differences (Paul Hobson)
   3. Re: Compare NumPy arrays with threshold and return the
      differences (Robert Kern)
 
 
----------------------------------------------------------------------
 
Message: 1
Date: Wed, 17 May 2017 16:50:40 +0000
From: Nissim Derdiger <[hidden email]>
Subject: [Numpy-discussion] Compare NumPy arrays with threshold and
        return the differences
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="us-ascii"
 
Hi,
 
In my script, I need to compare big NumPy arrays (2D or 3D) and return a list of all cells whose difference is bigger than a defined threshold.
The comparison itself can easily be done with the "allclose" function, like this:
Threshold = 0.1
if np.allclose(Arr1, Arr2, Threshold, equal_nan=True):
    print('Same')
 
But this comparison does not return which cells are not the same.
 
The easiest (yet naive) way to find out which cells are not the same is to use simple for loops, like this:
def CheckWhichCellsAreNotEqualInArrays(Arr1, Arr2, Threshold):
    if Arr1.shape != Arr2.shape:
        return ['Arrays size not the same']
    Dimensions = Arr1.shape
    Diff = []
    for i in range(Dimensions[0]):
        for j in range(Dimensions[1]):
            if not np.allclose(Arr1[i, j], Arr2[i, j], Threshold, equal_nan=True):
                Diff.append(',' + str(i) + ',' + str(j) + ',' + str(Arr1[i, j]) + ','
                            + str(Arr2[i, j]) + ',' + str(Threshold) + ',Fail\n')
    # Return only after every row has been checked.
    return Diff
 
(And the same for 3D arrays, with one more for loop.) This way is very slow when the arrays are big and full of non-equal cells.
 
Is there a fast, straightforward way to get a list of the unequal cells when the arrays are not the same? Maybe some built-in function in NumPy itself?
Thanks!
Nissim
 
 
------------------------------
 
Message: 2
Date: Wed, 17 May 2017 10:13:46 -0700
From: Paul Hobson <[hidden email]>
To: Discussion of Numerical Python <[hidden email]>
Subject: Re: [Numpy-discussion] Compare NumPy arrays with threshold
        and return the differences
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"
 
I would do something like:
 
diff_is_large = (array1 - array2) > threshold
index_at_large_diff = numpy.nonzero(diff_is_large)
array1[index_at_large_diff].tolist()
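
One caveat on the snippet above: `(array1 - array2) > threshold` is one-sided, so it only flags cells where `array1` is the larger value. A minimal sketch of a two-sided variant using the absolute difference follows; the sample values are made up for illustration and are not from the thread.

```python
import numpy as np

# Sample values are assumptions for illustration only.
array1 = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])
array2 = np.array([[10.0, 30.0, 30.0], [40.0, 50.0, 55.0]])
threshold = 1.0

# Signed difference: misses cells where array2 is the larger value.
one_sided = (array1 - array2) > threshold

# Absolute difference: flags a large gap in either direction.
diff_is_large = np.abs(array1 - array2) > threshold
index_at_large_diff = np.nonzero(diff_is_large)

# Values from both arrays at the flagged positions.
values1 = array1[index_at_large_diff].tolist()
values2 = array2[index_at_large_diff].tolist()
```

Comparisons involving NaN evaluate to False, so this variant never flags a cell that contains NaN; `np.isclose` with `equal_nan=True` is one way to handle that case differently.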
 
 
------------------------------
 
Message: 3
Date: Wed, 17 May 2017 10:16:09 -0700
From: Robert Kern <[hidden email]>
To: Discussion of Numerical Python <[hidden email]>
Subject: Re: [Numpy-discussion] Compare NumPy arrays with threshold
        and return the differences
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"
 
Use `close_mask = np.isclose(Arr1, Arr2, Threshold, equal_nan=True)` to return a boolean mask the same shape as the arrays which is True where the elements are close and False where they are not. You can invert it to get a boolean mask which is True where they are "far" with respect to the threshold: `far_mask = ~close_mask`. Then you can use `i_idx, j_idx = np.nonzero(far_mask)` to get arrays of the `i` and `j` indices where the values are far. For example:
 
for i, j in zip(i_idx, j_idx):
    print("{0}, {1}, {2}, {3}, {4}, Fail".format(i, j, Arr1[i, j], Arr2[i, j], Threshold))
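
The loop above unpacks exactly two index arrays, so it is tied to 2D input. For the 3D case asked about in the question, one possible sketch uses `np.argwhere`, which returns one row of indices per True cell regardless of the number of dimensions. The helper name `far_cells` is hypothetical, and treating the threshold as an absolute tolerance (`atol`, with `rtol=0.0`) is an assumption about the intent.

```python
import numpy as np

def far_cells(arr1, arr2, threshold):
    """Return (index_tuple, value1, value2) for cells differing by more than threshold.

    Hypothetical helper: the threshold is assumed to be an absolute tolerance.
    """
    if arr1.shape != arr2.shape:
        raise ValueError("array shapes differ")
    far_mask = ~np.isclose(arr1, arr2, rtol=0.0, atol=threshold, equal_nan=True)
    cells = []
    # np.argwhere yields one index row per True cell, for any number of dimensions.
    for idx in np.argwhere(far_mask):
        key = tuple(int(k) for k in idx)
        cells.append((key, arr1[key].item(), arr2[key].item()))
    return cells

# 3D usage with made-up data: a single cell differs.
a = np.zeros((2, 2, 3))
b = a.copy()
b[1, 0, 2] = 5.0
for key, v1, v2 in far_cells(a, b, 0.1):
    print(key, v1, v2, "Fail")
```

The same function handles 2D arrays unchanged, since the index tuples simply have two entries instead of three.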
 
--
Robert Kern

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/numpy-discussion

Re: Compare NumPy arrays with threshold

Robert Kern-2
On Thu, May 18, 2017 at 5:07 AM, Nissim Derdiger <[hidden email]> wrote:
>
> Hi again,
> Thanks for the responses to my question!
> Robert's answer worked very well for me, except for one small issue:
>  
> This line:
> close_mask = np.isclose(MatA, MatB, Threshold, equal_nan=True)
> returns each difference twice: once for j compared to i, and once for i compared to j.

No, it returns a boolean array the same size as MatA and MatB. It literally can't contain "each difference twice". Maybe there is something else in your code that is producing the doubling that you see, possibly in the printing of the results.

I'm not seeing the behavior that you speak of. Please post your complete code that produced the doubled output that you see.

import numpy as np

MatA = np.array([[10,20,30],[40,50,60]])
MatB = np.array([[10,30,30],[40,50,160]])
Threshold = 1.0

# Note the `atol=` here. I missed it before.
close_mask = np.isclose(MatA, MatB, atol=Threshold, equal_nan=True)
far_mask = ~close_mask
i_idx, j_idx = np.nonzero(far_mask)
for i, j in zip(i_idx, j_idx):
    print("{0}, {1}, {2}, {3}, {4}, Fail".format(i, j, MatA[i, j], MatB[i, j], Threshold))


I get the following output:

$ python isclose.py
0, 1, 20, 30, 1.0, Fail
1, 2, 60, 160, 1.0, Fail
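
The `atol=` note in the code above matters because the third positional parameter of `np.isclose` is `rtol`, not `atol`. NumPy documents the test as `absolute(a - b) <= atol + rtol * absolute(b)`, with defaults `rtol=1e-05` and `atol=1e-08`, so a threshold passed positionally scales with the magnitude of the second array. A small sketch of the difference, reusing the arrays from this message (the names `far_positional` and `far_atol` are introduced here for illustration):

```python
import numpy as np

MatA = np.array([[10, 20, 30], [40, 50, 60]])
MatB = np.array([[10, 30, 30], [40, 50, 160]])
Threshold = 1.0

# Third positional argument is rtol: allowed gap is 1e-08 + 1.0 * |MatB|.
far_positional = ~np.isclose(MatA, MatB, Threshold, equal_nan=True)

# Keyword atol: allowed gap is 1.0 + 1e-05 * |MatB|.
far_atol = ~np.isclose(MatA, MatB, atol=Threshold, equal_nan=True)

print(int(far_positional.sum()))  # cells flagged when Threshold acts as rtol
print(int(far_atol.sum()))        # cells flagged when Threshold acts as atol
```

With these inputs the positional form flags no cells at all, because a relative tolerance of 1.0 allows a gap as large as the value in `MatB` itself, while the `atol=` form flags the two cells shown in the output above. Passing `rtol=0.0` alongside `atol=` makes the comparison purely absolute.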

--
Robert Kern
