in1d, but preserve shape of ar1

Brenton R S Recht
I started an enhancement request in the GitHub issue tracker (https://github.com/numpy/numpy/issues/8331), but Jaime Frio recommended I bring it to the mailing list.

`in1d` takes two arrays, `ar1` and `ar2`, and returns a 1d array with the same number of elements as `ar1`. The logical extension would be a function that does the same thing but returns a (possibly multi-dimensional) array of the same shape as `ar1`. The code already has a comment suggesting this could be done (see https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py#L444 ). 

I agree that changing the behavior of the existing function isn't an option, since it would break backwards compatibility. I'm not sure adding a `keep_shape` option is good either, since the name of the function ("1d") wouldn't match what it does (it would return an array that might not be 1d). I think a new function is the way to go. This would be it, more or less:

def items_in(ar1, ar2, **kwargs):
    # Convert first so array-like inputs such as lists also have a .shape.
    ar1 = np.asarray(ar1)
    return np.in1d(ar1, ar2, **kwargs).reshape(ar1.shape)

Questions I have are:
* Function name? I was thinking something like `items_in` or `item_in`: the function returns whether each item in `ar1` is in `ar2`. Is "item" or "element" the right term here?
* Are there any other changes that need to happen in arraysetops.py? Or other files? I ask this because although the file says "Set operations for 1D numeric arrays" right at the top, it's growing increasingly not 1D: `unique` recently changed to operate on multidimensional arrays, and I'm proposing a multidimensional version of `in1d`. `ediff1d` could probably be tweaked into a version that operates along an axis the same way unique does now, fwiw. Mostly I want to know if I should put my code changes in this file or somewhere else.
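
To make the `ediff1d` remark concrete, here is a small illustration of the difference between flattening and working along an axis. `np.ediff1d` flattens its input before differencing, while `np.diff` already takes an `axis` argument; the sample array is made up for the example, and this is only a sketch of what an axis-aware version might return, not a proposed implementation.

```python
import numpy as np

# Made-up 2-D input, purely for illustration.
a = np.array([[1, 2, 4],
              [7, 11, 16]])

# ediff1d flattens first, so the difference crosses the row boundary (7 - 4).
flat_diff = np.ediff1d(a)

# diff along axis=1 stays within each row and keeps the 2-D layout.
row_diff = np.diff(a, axis=1)

print(flat_diff)  # [1 2 3 4 5]
print(row_diff)   # [[1 2]
                  #  [4 5]]
```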

Thanks,

-brsr

_______________________________________________
NumPy-Discussion mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: in1d, but preserve shape of ar1

Stephan Hoyer
I think this is a great idea!

I agree that we need a new function. Because the new API is almost strictly superior, we should try to pick a more general name that we can encourage users to switch to from in1d.

Pandas calls this method "isin", which I think is a perfectly good name for the multi-dimensional NumPy version, too:

It's a subjective call, but I would probably keep the new function in arraysetops.py. (This is the sort of question well suited to GitHub rather than the mailing list, though.) 


On Mon, Dec 19, 2016 at 3:25 PM, Brenton R S Recht <[hidden email]> wrote:
> [...]

Re: in1d, but preserve shape of ar1

Joseph Fox-Rabinovitz
Perhaps you could move the code from in1d to your new function and redefine in1d in terms of it? That may help encourage migration and also make deprecation easier down the line.
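
A minimal sketch of that arrangement, for illustration only. The names `isin_sketch` and `in1d_sketch` are hypothetical, and the sort-and-search membership test is a stand-in chosen to keep the example self-contained; it is not the code currently in arraysetops.py.

```python
import numpy as np


def isin_sketch(element, test_elements, invert=False):
    """Shape-preserving membership test (hypothetical name)."""
    element = np.asarray(element)
    # np.unique returns the sorted, de-duplicated test values.
    test = np.unique(np.asarray(test_elements).ravel())
    flat = element.ravel()
    if test.size == 0:
        # Nothing can be a member of an empty set.
        mask = np.zeros(flat.shape, dtype=bool)
    else:
        # Position where each value would be inserted in the sorted array.
        idx = np.searchsorted(test, flat)
        # Values larger than every test element land one past the end;
        # point them at index 0 so the lookup below stays in bounds.
        idx[idx == test.size] = 0
        mask = test[idx] == flat
    if invert:
        mask = ~mask
    return mask.reshape(element.shape)


def in1d_sketch(ar1, ar2, invert=False):
    """Flat variant, defined purely in terms of the new function."""
    return isin_sketch(ar1, ar2, invert=invert).ravel()


a = np.array([[0, 1, 2],
              [3, 6, 5]])
print(isin_sketch(a, [1, 3, 5]).shape)  # (2, 3)
print(in1d_sketch(a, [1, 3, 5]).shape)  # (6,)
```

With this layout the flat function is a thin wrapper, so deprecating it later would not touch the membership logic.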

    -Joe

On Mon, Dec 19, 2016 at 8:43 PM, Stephan Hoyer <[hidden email]> wrote:
> [...]