Fix to #789 maybe not right.

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Fix to #789 maybe not right.

Charles R Harris
David,

I'm not sure that fix is completely correct. The out keyword is funny and I'm not what the specs are supposed to be, but generally the output is cast rather than an error raised. We need an official spec here because the documentation of this feature is essentially random. Note that the shapes don't really have to match, either.

In [1]: x = ones(5)

In [2]: out = ones(5, dtype=int8)

In [3]: cumsum(x, out=out)
Out[3]: array([1, 2, 3, 4, 5], dtype=int8)

In [4]: out = empty((5,1))

In [5]: cumsum(x, out=out)
Out[5]:
array([[ 1.],
       [ 2.],
       [ 3.],
       [ 4.],
       [ 5.]])

OTOH, out = empty((1,5)) doesn't work but doesn't raise an error. Confused? Me too.

Chuck

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

cdavid
Charles R Harris wrote:
> David,
>
> I'm not sure that fix is completely correct. The out keyword is funny
> and I'm not what the specs are supposed to be, but generally the
> output is cast rather than an error raised.

I think the out argument is one of this thing which is rather a mess
right now in numpy. The functions which accept it do not always mention
it, and as you say, there are various different behaviour.

What are the uses of the out argument ? The obvious one is saving
memory, but are there others ? Automatic casting would break the memory
saving.

> We need an official spec here because the documentation of this
> feature is essentially random. Note that the shapes don't really have
> to match, either.
>
> In [1]: x = ones(5)
>
> In [2]: out = ones(5, dtype=int8)
>
> In [3]: cumsum(x, out=out)
> Out[3]: array([1, 2, 3, 4, 5], dtype=int8)

I don't feel like this should work (because of the dtype).

>
> In [4]: out = empty((5,1))
>
> In [5]: cumsum(x, out=out)
> Out[5]:
> array([[ 1.],
>        [ 2.],
>        [ 3.],
>        [ 4.],
>        [ 5.]])
>
> OTOH, out = empty((1,5)) doesn't work but doesn't raise an error.

Not working and no error is obviously wrong. Note that after playing a
bit with this, I got segfaults when exciting python (I am working on
reproducint it).

cheers,

David
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Charles R Harris
In reply to this post by Charles R Harris


On Wed, May 21, 2008 at 8:48 PM, Charles R Harris <[hidden email]> wrote:
David,

I'm not sure that fix is completely correct. The out keyword is funny and I'm not what the specs are supposed to be, but generally the output is cast rather than an error raised. We need an official spec here because the documentation of this feature is essentially random. Note that the shapes don't really have to match, either.

In [1]: x = ones(5)

In [2]: out = ones(5, dtype=int8)

In [3]: cumsum(x, out=out)
Out[3]: array([1, 2, 3, 4, 5], dtype=int8)

In [4]: out = empty((5,1))

In [5]: cumsum(x, out=out)
Out[5]:
array([[ 1.],
       [ 2.],
       [ 3.],
       [ 4.],
       [ 5.]])

OTOH, out = empty((1,5)) doesn't work but doesn't raise an error. Confused? Me too.

And I'm not sure self->desc needs its reference count decremented, PyArray_FromArray is one of those vicious, nasty functions with side effects and might decrement the count itself. Note that the reference count is not decremented elsewhere. It's a capital offense to write such functions, but there it is.

On a lesser note, there is trailing whitespace on every new line except one.

Chuck


_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

cdavid
Charles R Harris wrote:
>
>
> And I'm not sure self->desc needs its reference count decremented,
> PyArray_FromArray is one of those vicious, nasty functions with side
> effects and might decrement the count itself.

might ? What do you mean by might decrement ? If the call to
PyAarray_FromArray fails, no reference are stolen, right ?

> Note that the reference count is not decremented elsewhere. It's a
> capital offense to write such functions, but there it is.
>
> On a lesser note, there is trailing whitespace on every new line
> except one.

This is easier to fix :)

cheers,

David
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Travis Oliphant-5
David Cournapeau wrote:

> Charles R Harris wrote:
>  
>> And I'm not sure self->desc needs its reference count decremented,
>> PyArray_FromArray is one of those vicious, nasty functions with side
>> effects and might decrement the count itself.
>>    
>
> might ? What do you mean by might decrement ? If the call to
> PyAarray_FromArray fails, no reference are stolen, right ?
>  

No, that's not right.   The reference is stolen if it fails as well.  
This is true of all descriptor data-types.  Perhaps it is weird, but it
was a lot easier to retro-fit Numeric PyArray_Descr as a Python object
that way.

-Travis

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Charles R Harris
In reply to this post by cdavid


On Wed, May 21, 2008 at 9:03 PM, David Cournapeau <[hidden email]> wrote:
Charles R Harris wrote:
>
>
> And I'm not sure self->desc needs its reference count decremented,
> PyArray_FromArray is one of those vicious, nasty functions with side
> effects and might decrement the count itself.

might ? What do you mean by might decrement ? If the call to
PyAarray_FromArray fails, no reference are stolen, right ?

Maybe, maybe not. We need to look at the function and document it. I think these sort of functions are an invitation to reference counting bugs, of which valgrind says we have several. Really, all the increments and decrements should be inside PyArray_FromArray, but calls to this function are scattered all over.


> Note that the reference count is not decremented elsewhere. It's a
> capital offense to write such functions, but there it is.
>
> On a lesser note, there is trailing whitespace on every new line
> except one.

This is easier to fix :)

Not that I think we are in any worse shape after the fix than before, but I do think we should pretty much leave things alone until 1.1 out the door and leave future fixes to 1.1.1

Chuck



_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

cdavid
In reply to this post by Travis Oliphant-5
Travis E. Oliphant wrote:
> No, that's not right.   The reference is stolen if it fails as well.  
> This is true of all descriptor data-types.  Perhaps it is weird, but it
> was a lot easier to retro-fit Numeric PyArray_Descr as a Python object
> that way.
>  

Thanks for the clarification. I fixed the code accordingly,

cheers,

David
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Travis Oliphant-5
In reply to this post by Charles R Harris
Charles R Harris wrote:
> Really, all the increments and decrements should be inside
> PyArray_FromArray, but calls to this function are scattered all over.
I don't understand what you mean by this statement.    All functions
that return an object and take a PyArray_Descr object steal a reference
to the descriptor (even if it fails).   That's the rule.

-Travis

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

cdavid
In reply to this post by cdavid
David Cournapeau wrote:
>
> Thanks for the clarification. I fixed the code accordingly,
>  

Ok, you beat me :)

cheers,

David
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Charles R Harris
In reply to this post by Travis Oliphant-5


On Wed, May 21, 2008 at 9:32 PM, Travis E. Oliphant <[hidden email]> wrote:
Charles R Harris wrote:
> Really, all the increments and decrements should be inside
> PyArray_FromArray, but calls to this function are scattered all over.
I don't understand what you mean by this statement.    All functions
that return an object and take a PyArray_Descr object steal a reference
to the descriptor (even if it fails).   That's the rule.

Why should it not increment the reference itself? Note that calls to this function are normally preceded by incrementing the reference, probably because one wants to keep it around. I think it would be clearer to have the rule: you increment it, you decrement it. That way everything is in one obvious place and you don't have to concern yourself with what happens inside PyArray_FromArray. Functions with side effects are almost always a bad idea and lead to bugs in practice.

Chuck



_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Charles R Harris
In reply to this post by cdavid


On Wed, May 21, 2008 at 8:56 PM, David Cournapeau <[hidden email]> wrote:
Charles R Harris wrote:
> David,
>
> I'm not sure that fix is completely correct. The out keyword is funny
> and I'm not what the specs are supposed to be, but generally the
> output is cast rather than an error raised.

I think the out argument is one of this thing which is rather a mess
right now in numpy. The functions which accept it do not always mention
it, and as you say, there are various different behaviour.

What are the uses of the out argument ? The obvious one is saving
memory, but are there others ? Automatic casting would break the memory
saving.

I've been contemplating this and I think you are right. The out parameter is for those special bits that need efficiency and should be as simple as possible. That means: same type and shape as the normal output, no appeal. This is especially so as a reference to the output is what the function returns when out is present and the return type should not depend on the type of the out parameter.

Chuck



_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Travis Oliphant-5
In reply to this post by Charles R Harris
Charles R Harris wrote:

>
>
> On Wed, May 21, 2008 at 9:32 PM, Travis E. Oliphant
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Charles R Harris wrote:
>     > Really, all the increments and decrements should be inside
>     > PyArray_FromArray, but calls to this function are scattered all
>     over.
>     I don't understand what you mean by this statement.    All functions
>     that return an object and take a PyArray_Descr object steal a
>     reference
>     to the descriptor (even if it fails).   That's the rule.
>
>
> Why should it not increment the reference itself? Note that calls to
> this function are normally preceded by incrementing the reference,
> probably because one wants to keep it around.
I wouldn't say normally.   I would say sometimes.  

Normally you create a reference to the data-type and want
PyArray_FromAny and friends to steal it (i.e. PyArray_DescrFromType).

I wouldn't call stealing a reference count a side effect, but even if
you want to call it that, it can't really change without a *huge*
re-working effort for almost zero gain.   You would also have to re-work
all the macros that take type numbers and construct data-type objects
for passing to these functions.   I don't see the benefit at all.
> I think it would be clearer to have the rule: you increment it, you
> decrement it.
Maybe, but Python's own C-API doesn't always follow that rule, there are
functions that steal references.   Remember, PyArray_Descr was
retrofitted as a Python Object.  It didn't use to be one.   This steal
rule was the cleanest I could come up with --- i.e. it wasn't an idle
decision.

It actually makes some sense because the returned array is going to
"own" the reference count to the data-type object (just like setting to
a list).

-Travis

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Charles R Harris


On Wed, May 21, 2008 at 10:55 PM, Travis E. Oliphant <[hidden email]> wrote:
Charles R Harris wrote:
>
>
> On Wed, May 21, 2008 at 9:32 PM, Travis E. Oliphant
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Charles R Harris wrote:
>     > Really, all the increments and decrements should be inside
>     > PyArray_FromArray, but calls to this function are scattered all
>     over.
>     I don't understand what you mean by this statement.    All functions
>     that return an object and take a PyArray_Descr object steal a
>     reference
>     to the descriptor (even if it fails).   That's the rule.
>
>
> Why should it not increment the reference itself? Note that calls to
> this function are normally preceded by incrementing the reference,
> probably because one wants to keep it around.
I wouldn't say normally.   I would say sometimes.

Normally you create a reference to the data-type and want
PyArray_FromAny and friends to steal it (i.e. PyArray_DescrFromType).

I wouldn't call stealing a reference count a side effect, but even if
you want to call it that, it can't really change without a *huge*
re-working effort for almost zero gain.   You would also have to re-work
all the macros that take type numbers and construct data-type objects
for passing to these functions.   I don't see the benefit at all.

I  agree with all that, which is why I'm not advocating a change. But it does raise the bar for working with the C code and I think the current case is an example of that.


> I think it would be clearer to have the rule: you increment it, you
> decrement it.
Maybe, but Python's own C-API doesn't always follow that rule, there are
functions that steal references.   Remember, PyArray_Descr was
retrofitted as a Python Object.  It didn't use to be one.   This steal
rule was the cleanest I could come up with --- i.e. it wasn't an idle
decision.

I realize that Python does this too, I also note that getting reference counting right is one of the more difficult aspects of writing Python extension code. The more programmers have to know and keep in mind, the more likely they are to make mistakes.

Chuck


_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: Fix to #789 maybe not right.

Travis Oliphant-5
Charles R Harris wrote:
>
> I  agree with all that, which is why I'm not advocating a change. But
> it does raise the bar for working with the C code and I think the
> current case is an example of that.
>
Yes it does.  I also agree that reference counting is the hardest part
of coding with the Python C-API.

>
>
>     > I think it would be clearer to have the rule: you increment it, you
>     > decrement it.
>     Maybe, but Python's own C-API doesn't always follow that rule,
>     there are
>     functions that steal references.   Remember, PyArray_Descr was
>     retrofitted as a Python Object.  It didn't use to be one.   This steal
>     rule was the cleanest I could come up with --- i.e. it wasn't an idle
>     decision.
>
>
> I realize that Python does this too, I also note that getting
> reference counting right is one of the more difficult aspects of
> writing Python extension code. The more programmers have to know and
> keep in mind, the more likely they are to make mistakes.
>
Agreed.

-teo


_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion