let's use patch review

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

let's use patch review

Ondrej Certik
Hi,

I read the recent flamebate about unittests, formal procedures for a
commit etc. and it was amusing. :)
I think Stefan is right about the unit tests. I also think that Travis
is right that there is no formal procedure that can assure what we
want.

I think that a solution is a patch review. Every big/succesful project
does it. And the workflow as I see it is this:

1) Travis will fix a bug, and submit it to a patch review. If he is
busy, that's the only thing he will do
2) Someone else reviews it. Stefan will be the one who will always
point out missing tests.
3) There needs to be a common consensus that the patch is ok to go in.
4) when the patch is reviewed and ok to go in, anyone with a commit
access will commit it.

I think it's as simple as that.

Sometimes no one has enought time to write a proper test, yet someone
has a free minute to fix a bug. Then I think it's ok to put the code
in, as I think it's good to fix a bug now. However,
the issue is definitely not closed and the bug is not fixed (!) until
someone writes a proper test. I.e. putting code in that is not tested,
however it doesn't break things, is imho ok, as it will not hurt
anyone and it will temporarily fix a bug (but of course the code will
be broken at some point in the future, if there is no test for it).

Now, the problem is that all patch review tools sucks in some way.
Currently the most promissing is the one from Guido here:

http://code.google.com/p/rietveld/

it's opensource, you can run it on your server, or use it online here:

http://codereview.appspot.com/

I suggest you to read the docs how to use it, I am still learning it.
Also it works fine for svn, but not for Mercurial, so we are not using
it in SymPy yet.

So to also do some work besides just talk, I started with this issue:

http://projects.scipy.org/scipy/numpy/ticket/788

and submitted the code (not my code though:) in there for a review here:

http://codereview.appspot.com/953

and added some comments. So what do you think?

Ondrej

P.S. efiring, my comments are real questions to your patch. :)
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

Eric Firing
Ondrej Certik wrote:

> Hi,
>
> I read the recent flamebate about unittests, formal procedures for a
> commit etc. and it was amusing. :)
> I think Stefan is right about the unit tests. I also think that Travis
> is right that there is no formal procedure that can assure what we
> want.
>
> I think that a solution is a patch review. Every big/succesful project
> does it. And the workflow as I see it is this:

Are you sure numpy is big enough that a formal mechanism is needed--for
everyone?  It makes good sense for my (rare) patches to be reviewed, but
shouldn't some of the core developers be allowed to simply get on with
it?  As it is, my patches can easily be reviewed because I don't have
commit access.

>
> 1) Travis will fix a bug, and submit it to a patch review. If he is
> busy, that's the only thing he will do
> 2) Someone else reviews it. Stefan will be the one who will always
> point out missing tests.

That we can agree on!

> 3) There needs to be a common consensus that the patch is ok to go in.

What does that mean?  How does one know when there is a consensus?

> 4) when the patch is reviewed and ok to go in, anyone with a commit
> access will commit it.

But it has to be a specific person in each case, not "anyone".

>
> I think it's as simple as that.
>
> Sometimes no one has enought time to write a proper test, yet someone
> has a free minute to fix a bug. Then I think it's ok to put the code
> in, as I think it's good to fix a bug now. However,

How does that fit with the workflow above?  Does Travis commit the
bugfix, or not?

> the issue is definitely not closed and the bug is not fixed (!) until
> someone writes a proper test. I.e. putting code in that is not tested,
> however it doesn't break things, is imho ok, as it will not hurt
> anyone and it will temporarily fix a bug (but of course the code will
> be broken at some point in the future, if there is no test for it).
That is overstating the case; for 788, for example, no one in his right
mind would undo the one-line correction that Travis made.  Chances are,
there will be all sorts of breakage and foulups and the revelation of
new bugs in the future--but not another instance that would be caught by
the test for 788.
>
[...]
> http://codereview.appspot.com/953
>
> and added some comments. So what do you think?
Looks like it could be useful.  I replied to the comments. I haven't
read the docs, and I don't know what the next step is when a revision of
the patch is in order, as it is in this case.

Eric
>
> Ondrej
>
> P.S. efiring, my comments are real questions to your patch. :)
> _______________________________________________
> Numpy-discussion mailing list
> [hidden email]
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

Ondrej Certik
On Thu, May 15, 2008 at 1:58 AM, Eric Firing <[hidden email]> wrote:

> Ondrej Certik wrote:
>> Hi,
>>
>> I read the recent flamebate about unittests, formal procedures for a
>> commit etc. and it was amusing. :)
>> I think Stefan is right about the unit tests. I also think that Travis
>> is right that there is no formal procedure that can assure what we
>> want.
>>
>> I think that a solution is a patch review. Every big/succesful project
>> does it. And the workflow as I see it is this:
>
> Are you sure numpy is big enough that a formal mechanism is needed--for
> everyone?  It makes good sense for my (rare) patches to be reviewed, but
> shouldn't some of the core developers be allowed to simply get on with
> it?  As it is, my patches can easily be reviewed because I don't have
> commit access.

It's not me who should make this decision, as I have contributed in
total maybe 1 patch to numpy. I am just suggesting it as a
possibility.

The numpy developers are free to choose the workflow that suits them best.

But if you are asking for my own opinion -- yes, I think all code
should be reviewed, including core developers, that's for example what
Sage does (or what we do in SymPy), because that brings other people
to comment on it, to help find bugs, to get familiar with it and also
everyone involved learns something from it. Simply google why patch
review is useful to get arguments for doing it.

>
>>
>> 1) Travis will fix a bug, and submit it to a patch review. If he is
>> busy, that's the only thing he will do
>> 2) Someone else reviews it. Stefan will be the one who will always
>> point out missing tests.
>
> That we can agree on!
>
>> 3) There needs to be a common consensus that the patch is ok to go in.
>
> What does that mean?  How does one know when there is a consensus?

That everyone involved in the discussion agrees it should go in as it
is. I am sure you can recognize very easily if there is not a
concensus.

>
>> 4) when the patch is reviewed and ok to go in, anyone with a commit
>> access will commit it.
>
> But it has to be a specific person in each case, not "anyone".

Those, who have commit access are definitely not anyone. Only those,
who have showed they can be trusted
not to break things.

>
>>
>> I think it's as simple as that.
>>
>> Sometimes no one has enought time to write a proper test, yet someone
>> has a free minute to fix a bug. Then I think it's ok to put the code
>> in, as I think it's good to fix a bug now. However,
>
> How does that fit with the workflow above?  Does Travis commit the
> bugfix, or not?

Both is possible. Some projects require all patches to go through a
review and I personally think it's a good idea.

>
>> the issue is definitely not closed and the bug is not fixed (!) until
>> someone writes a proper test. I.e. putting code in that is not tested,
>> however it doesn't break things, is imho ok, as it will not hurt
>> anyone and it will temporarily fix a bug (but of course the code will
>> be broken at some point in the future, if there is no test for it).
> That is overstating the case; for 788, for example, no one in his right
> mind would undo the one-line correction that Travis made.  Chances are,
> there will be all sorts of breakage and foulups and the revelation of
> new bugs in the future--but not another instance that would be caught by
> the test for 788.

That's what the patch review is for -- people will comment on this and
if a consencus
is made that a test is not necessary, ok.

>>
> [...]
>> http://codereview.appspot.com/953
>>
>> and added some comments. So what do you think?
> Looks like it could be useful.  I replied to the comments. I haven't
> read the docs, and I don't know what the next step is when a revision of
> the patch is in order, as it is in this case.

It seems only the owner of the issue (in this case me, because I
uploaded your code) can add new patches to that issue. So simply start
a new issue
and upload it there. If there were more revisions from you, it would
look like this:

http://codereview.appspot.com/970

Ondrej
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

David Cournapeau-4
In reply to this post by Eric Firing
On Wed, 2008-05-14 at 13:58 -1000, Eric Firing wrote:
>
> What does that mean?  How does one know when there is a consensus?

There can be a system to make this automatic. For example, the code is
never commited directly to svn, but to a gatekeeper, and people vote by
an email command to say if they want the patch in; when the total number
of votes is above some threshold, the gatekeeper commit the patch.

David

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

David Huard
2008/5/14 David Cournapeau <[hidden email]>:
On Wed, 2008-05-14 at 13:58 -1000, Eric Firing wrote:
>
> What does that mean?  How does one know when there is a consensus?

There can be a system to make this automatic. For example, the code is
never commited directly to svn, but to a gatekeeper, and people vote by
an email command to say if they want the patch in; when the total number
of votes is above some threshold, the gatekeeper commit the patch.

There is about 5 commits/day, I don't think it's a good idea to wait for a vote on each one of them.
 

David

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion


_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

Ondrej Certik
On Thu, May 15, 2008 at 3:08 PM, David Huard <[hidden email]> wrote:

> 2008/5/14 David Cournapeau <[hidden email]>:
>>
>> On Wed, 2008-05-14 at 13:58 -1000, Eric Firing wrote:
>> >
>> > What does that mean?  How does one know when there is a consensus?
>>
>> There can be a system to make this automatic. For example, the code is
>> never commited directly to svn, but to a gatekeeper, and people vote by
>> an email command to say if they want the patch in; when the total number
>> of votes is above some threshold, the gatekeeper commit the patch.
>
> There is about 5 commits/day, I don't think it's a good idea to wait for a
> vote on each one of them.

Me neither. I think just one reviewer is enough.

Ondrej
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

Pearu Peterson-3
In reply to this post by Ondrej Certik
On Thu, May 15, 2008 12:06 am, Ondrej Certik wrote:
> Hi,
>
> I read the recent flamebate about unittests, formal procedures for a
> commit etc. and it was amusing. :)
> I think Stefan is right about the unit tests. I also think that Travis
> is right that there is no formal procedure that can assure what we
> want.
>
> I think that a solution is a patch review.

I am -0.8 on it because the number of numpy core developers is just
too small for the patch review to be effective - there is not enough
reviewers who are qualified to review low-level code.

The number of core developers can be defined as a number of
developers who have ever been owners of numpy tickets.
It seems that the number is less than 10.
Note also that may be only few of them can work full time on numpy.

For adding new features, the patch review system can be reasonable though.

My 2 cents,
Pearu

_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

Francesc Alted
In reply to this post by Ondrej Certik
A Wednesday 14 May 2008, Ondrej Certik escrigué:
> Hi,
>
> I read the recent flamebate about unittests, formal procedures for a
> commit etc. and it was amusing. :)
> I think Stefan is right about the unit tests. I also think that
> Travis is right that there is no formal procedure that can assure
> what we want.
[snip]

For what is worth, Ivan and me were using patch peer review for more
than a year in the PyTables project, and, although I was quite
reluctant to adopt it at the begining, the reality is that it resulted
in *much* better code quality to be added to the repository.

Here it is the strategy ww used:

0. A ticket is opened explaining the feature to add or the thing to fix.

1. The ticket taker (i.e. the responsible of adding the new code)
creates a new branch (properly labeled) for the desired
modification/patch.  The ticket is updated with a link to the new
branch and the ownership of the ticket is transferred to the taker.

2. Once the modification is in the new branch, the ticket is updated
explaining the actions done, and the ownership of the ticket
transferred to the reviewer.

3. The peer reviews the code, and write suggestions in the same ticket
about the new code.  If the reviewer doesn't feel the need to suggest
anything, he will write this in the ticket also.  The ticket is
transferred to the original author.

4. The original author should revise the suggestions of his peer, and if
more actions are needed, he should address them.  After this, he will
transfer the ticket to the peer for a new review.

5. Phase 3 and 4 are repeated until an agreement is held by both parts,
and the discussion remains in the ticket (one can temporarily tranfer
part of the ticket discussion to the mailing list, if necessary).

6. After the agreement, the original author commits the patch in the
temporary code branch to the affected branches (normally trunk and the
stable branch of the project) and removes the temporary branch.  The
author has to tell explicitely in the ticket to which branches he has
applied the new patch.

7. The ticket is closed.

Of course, this works great with a pair programming paradigm (as was the
case for PyTables), but for a project as NumPy there are more
developers than just a pair, so you should decide how to choose the
reviewer.  One possibility is to form pairs by affinity, so that they
can act normally together.  Another possibility would be to force all
the developers to subscribe to the ticket mailing list, and, for each
ticket that requires a peer review, a call should be sent in order to
gather a reviewer (who can offer as a volunteer by adding a note to the
ticket, for example).

I don't need to say that this procedure was not used for small or
trivial changes (that were fixed directly), but only when the issue was
important enough to deserve the attention of the mate.

My two cents,

--
Francesc Alted
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

cdavid
In reply to this post by David Huard
David Huard wrote:
>
> There is about 5 commits/day, I don't think it's a good idea to wait
> for a vote on each one of them.

There is definitely a balance to find, and I am not convinced it would
work well with subversion (it really makes sense to have those review
with merge request, not per commit). For example, in scons, they have a
fairly heavy review process which IMO prevents it from getting more
contribution. In bzr, it works pretty well (they use a gateway system),
but most main developers are paid for it.

Having a somewhat official review process would also help solving one of
my problem with trac: when someone sends a patch on trac, we don't know
it, we have to look for it, and some of them are lost/duplicated.
Requesting unit tests for new contributors is too much, I think, I hate
it for other projects where I am less involved than numpy/scipy. But say
I have 20 minutes to spend on reviewing patches: with a system which a
list of available patches, it would be easy to do so. Maybe it is
possible with trac and I just missed it.

cheers,

David
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

Anne Archibald
In reply to this post by Francesc Alted
2008/5/15 Francesc Alted <[hidden email]>:

> I don't need to say that this procedure was not used for small or
> trivial changes (that were fixed directly), but only when the issue was
> important enough to deserve the attention of the mate.

I think here's the rub: when I hear "patch review system" it sounds to
me like an obstacle course for getting code into the software. Maybe
it's justified, but I think at the moment there are many many things
that are just awaiting a little bit of attention from someone who
knows the code. A patch review system overapplied would multiply that
number by rather a lot.

How about a purely-optional patch review system? I've submitted
patches I wanted reviewed before they went in the trunk. As it was, I
didn't have SVN access, so I just posted them to trac or emailed them
to somebody, who then pondered and committed them. But a patch review
system - provided people were promptly reviewing patches - would have
fit the bill nicely.

How frequently does numpy receive patches that warrant review? The
zillion little doc fixes don't, even moderate-sized patches from
experienced developers probably don't warrant review. So in the last
while, what are changes that needed review, and what happened to them?

* financial code - email to the list, discussion, eventual inclusion
* matrix change - bug filed, substantial discussion, confusion about
consensus committed, further dicussion, consensus committed, users
complain, patch backed out and issue placed on hold
* MA change - developed independently, discussed on mailing list, committed
* histogram change - filed as bug, discussed on mailing list, committed
* median change - discussed on mailing list, committed
* .npy file format - discussed and implemented at a sprint, committed

Did I miss any major ones? Of course, svn log will give you a list of
minor fixes in the last few months.

It seems to me like the review process at the moment is just "discuss
it on the mailing list". Tools to facilitate that would be valuable;
it would be handy to be able to point to a particular version of the
code somewhere on the Web (rather than just in patches attached to
email), for example.

Anne
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

cdavid
Anne Archibald wrote:
> I think here's the rub: when I hear "patch review system" it sounds to
> me like an obstacle course for getting code into the software. Maybe
> it's justified, but I think at the moment there are many many things
> that are just awaiting a little bit of attention from someone who
> knows the code. A patch review system overapplied would multiply that
> number by rather a lot.
>  

I think in this discussion, it is easy to see the drawbacks, and not
seeing the advantages (better code, etc...).

> How about a purely-optional patch review system? I've submitted
> patches I wanted reviewed before they went in the trunk. As it was, I
> didn't have SVN access, so I just posted them to trac or emailed them
> to somebody, who then pondered and committed them. But a patch review
> system - provided people were promptly reviewing patches - would have
> fit the bill nicely.
>  

It is not much, but I've just created a trac report to see all (open)
tickets with an attachment. What would be good would be to create a new
ticket type, like patch, such as instead of seeing all attachments
(including build log as now), we see real attachments.

I know next to nothing about databases and how complicated it would be
to do it in trac, but I know other projects using trac have it, so it is
doable.

cheers,

David
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Reply | Threaded
Open this post in threaded view
|

Re: let's use patch review

Stéfan van der Walt
In reply to this post by Anne Archibald
2008/5/16 Anne Archibald <[hidden email]>:
> How frequently does numpy receive patches that warrant review? The
> zillion little doc fixes don't, even moderate-sized patches from
> experienced developers probably don't warrant review.

Those moderately-sized patches are the ones that need review,
especially.  Review provides useful information on a couple of levels:

a) Motivation -- why do we want/need this patch
b) Functionality -- does it do what the developer intended it to
c) Implementation -- is it written according to current best practices

Level (a) is normally discussed on the mailing list, if needed.  Level
(b) is covered by unit tests, *if* those were written.  Then, level
(c) is where the main advantage lies: we can learn from one another
how to develop better code.

I am somewhat split in two on this one.  I love the idea of patch
review; it undoubtedly raises the quality of the codebase.  That said,
it comes at a cost in developer time, and I'm not sure we have that
luxury (we don't have a Michael Abshoff, unfortunately).  Making it
optional might be a good compromise, although the person who wrote a
patch isn't the best one to judge whether it should be reviewed (of
course, we all think our code is good!).

Regards
Stéfan
_______________________________________________
Numpy-discussion mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/numpy-discussion