"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: check file status before applying patches (second try)
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Sun, 13 Mar 2022 12:32:07 +0100

Download raw body.

Thread
On Sun, Mar 13, 2022 at 11:33:49AM +0100, Omar Polo wrote:
> > If a patch file contains a series of patches which will be applied
> > in sequence, then patches which pass pre-conditions will affect
> > the work tree, and any later patch which does not pass pre-conditions
> > will make 'got patch' error out, leaving any modifications made by
> > previous patches behind.
> > I think this behaviour is correct. The application of patch N to
> > the work tree can affect pre-condition checks for patch N+1, so
> > there is no way to check pre-conditions for patch N+1 before
> > applying patch N to the work tree.
> > Still, this might be worth documenting because users might otherwise
> > expect that 'got patch' will not modify the work tree in any way
> > unless *all* patches in the input pass pre-condition checks.
> 
> Yep, i tried to point this out in the diff below.
> 
> In theory it should be possible to apply a patch in a way that if
> everything succeeds then the changes are persisted but otherwise the
> worktree is unmodified, but it seems a bit difficult to do correctly.

This would require a multi-layer transactional system, where we could
do things like 'revert' but go back to some intermediate modified state
instead of the repository's base state.

I would suggest to keep such things as simple as possible.
We'll produce too many bugs otherwise.
 
> If we strengthen the preconditions so that only unmodified files are
> allowed to be changed, than we could run an implicit 'got revert' but
> i don't really like the idea.  Producing .rej files (which atm we still
> don't do) or conflict markers on failures but still trying to continue
> seems the most useful behavior to me, at least by default.

Allowing only M is not a restriction we would want to keep in the
long-term. Having files in A and D status should be supported.
 
> > > -	err = got_worktree_resolve_path(&oldpath, worktree,
> > > -	    p->old != NULL ? p->old : p->new);
> > > +	TAILQ_INIT(&oldpaths);
> > > +	TAILQ_INIT(&newpaths);
> > > +
> > > +	err = build_pathlist(p->old != NULL ? p->old : p->new, &oldpath,
> > > +	    &oldpaths, worktree);
> > >  	if (err)
> > >  		goto done;
> > 
> > Maybe worktree.c should provide a build_pathlist style funcion?
> > This function returns a pathlist that corresponds to a single path
> > in the work tree, and could be useful elsewhere, e.g. in got.c?
> > In any case, we can refactor this later.
> 
> I'm not really sure which other parts of got works with only one file at
> time...
>
> Furthermore, I'm expecting to handle some other paths operation there in
> the future.  Atm we fail to work on `got checkout -p ...' worktrees and
> with patches that have "non-worktree-absolute" paths.  I still haven't
> solved these problems, but I'm guessing that I'll be able to solve them
> in `build_pathlist'.

Absolute paths can be made relative via the strip option.

For checkout -p, we could strip away the work tree's path prefix from
paths found in the patch, provided *all* such paths share this prefix.

Patches created from checkout -p are another problem. At present,
'got patch' expects paths in the patch to be relative to the work
tree root. Perhaps it should treat paths as relative to the current
working directory instead? (This directory would be expected to be
somewhere within a work tree, of course.) Then we could apply a patch
that was created from, say, a '-p sys' work tree, as follows:
  cd /usr/src/sys
  got patch < patchfile


> > Maybe refine this error to say "iota: No such file or directory"?
> > You could use got_error_set_errno(ENOENT, path) somewhere in this
> > specific case.
> 
> agreed, it's more intuitive this way.  I've used got_error_set_errno in
> can_rm and can_edit if the file is NONEXISTENT.
> 
> I've also added two other checks to test_patch_illegal_status.

Great, looks good to me!

Some man page nits, with one behavioural change suggestion below:

> -----------------------------------------------
> commit 82f60a9a23894c1fa5c3098d8d9d9559c2ad2329 (patchcfs)
> from: Omar Polo <op@omarpolo.com>
> date: Sun Mar 13 10:06:20 2022 UTC
>  
>  improve `got patch' section of the manpage
>  
>  Simplify some pharsings, explain what preconditions `got patch' has and

s/pharsings/phrasing/

>  what happens to the worktree after an error occurs.
>  
> diff 5b12f37b51cbef2c1d56cba3d6811d8172c60a5d 9eec00bfb3e7e3a68db41095f894e1f33a0d4727
> blob - 61596d1cfc534598b2516ffa737754cfb667abb1
> blob + d7bdf56abfff3111d815aab14e11d14fb3c31c58
> --- got/got.1
> +++ got/got.1
> @@ -889,7 +889,6 @@ arguments are provided, only show differences for the 
>  Cannot be used together with the
>  .Fl P
>  option.
> -.Pp
>  .It Fl C Ar number
>  Set the number of context lines shown in the diff.
>  By default, 3 lines of context are shown.
> @@ -1291,18 +1290,19 @@ option)
>  Apply changes from
>  .Ar patchfile
>  .Pq or standard input
> -and record the state of the affected files afterwards.
> -The content of
> -.Ar patchfile
> -must be an unified diff.
> +that must be an unified diff.

This should say "a unified diff".
Nobody would say "an unified" in spoken English.

>  If
>  .Ar patchfile
>  contains more than one patch,
>  .Nm
>  .Cm patch
> -will try to apply them all.
> +will try to apply each of them in sequence.
> +Files added or removed from the repository as consequence are automatically
> +scheduled as such.

Nothing is removed from the repository until changes are committed.
Perhaps say:

"Files added or removed by the patch are scheduled for addition or removal
in the work tree."

>  .Pp
> -Show the status of each affected file, using the following status codes:
> +While applying the
> +.Ar patchfile ,
> +show the status of each affected file, using the following status codes:
>  .Bl -column XYZ description
>  .It M Ta modified file
>  .It D Ta deleted file
> @@ -1314,6 +1314,23 @@ If a change does not match at its exact line number,
>  .Cm patch
>  applies it somewhere else in the file if it can find a good spot before
>  giving up.
> +.Pp
> +.Nm
> +.Cm patch
> +will refuse to apply a patch if certain preconditions are not met.
> +Files to be deleted must be present in the repository and not have been

s/repository/work tree/

> +scheduled for deletion already.
> +Files to be added must be unknown and not present in the worktree, but

s/worktree/work tree/

> +are also allowed to be missing.

Missing means status !, correct?  I think 'got patch' should block an
addition in this case. A missing file generally means that something
went wrong (e.g. someone ran 'mv foo bar' but forgot to use 'got rm foo'
and 'got add bar'). And the next 'got update' would usually try to restore
any missing files, which after patching would no longer be missing, they
would appear to be in modified status instead of added status after patching.

If you agree, please make this change as a follow-up in a later commit,
or roll it into the current diff if you prefer.

> +Files to be modified must be already present in the repository and in
> +the worktree, and not have conflicts or being obstructed.

s/worktree/work tree/

And I would drop this part: "in the repository and"
Because "present in the work tree" implies presence in the repository.

> +.Pp
> +If a patch fails to apply for any reason
> +.Nm
> +stops and the affected file is unchanged.
> +Modifications done to the worktree due to previous patches in the same
> +.Ar patchfile
> +are still visible.

I would rephrase the above as follows, to give users clear guidance
about what to do next:

If an error occurs, the
.Cm patch
operation will be aborted.
Any changes made to the work tree up to this point will be left behind.
Such changes can be viewed with
.Cm got diff
and can be reverted with
.Cm got revert
if needed.