"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: got patch: merge patches with diff3
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Fri, 17 Jun 2022 10:51:02 +0200

Download raw body.

Thread
On Thu, Jun 16, 2022 at 12:35:31AM +0200, Omar Polo wrote:
> Turns out that plain diffs are not without their own shortcomings.  For
> example, when multiple people are working on the same codebase often
> happens that changes done by one person conflicts with changes done in
> parallel by other people.
> 
> (I know this well since i'm making an habit of breaking stsp@ diffs
> before he can even send them to the mailing list ;-)
> 
> Plain patches also have other issues and can produce wrong results in
> some circumstances.  Long story short, a patch implementation can
> guarantee to produce correct results only when the patch is applied
> exactly to the same version of the file that the patch was generated
> from.  (The CAVEATS and BUGS section of patch(1) gives good hints.)
> 
> Fortunately, in the scope of a version control system we can do better :)
> 
> `got diff' output includes some metadata about the blob id from which
> the diff was generated.  The blob id indicates the exact version of the
> file, so by using that as input of the patch machinery instead of the
> current working revision we can guarantee a correct result.  Then we can
> reuse the same mechanism used by 'got cherrypick' and 'got rebase' to
> merge the files: diff3.  We have the file indicated by the blob id in
> the patch text as the ancestor, the patched file as one of the
> derivative and the current working revision as other derivative.
> 
> Pragmatically speaking, this reduces the times 'got patch' fails to
> apply a diff turning many patch-incompatibilities either into a success
> or a diff3 conflict.  QoL stuff.

apply_patch() seems to treat the occurrence of a merge conflict as a
fatal error. Why?

> Some possible improvements are:
> 
>  - more test coverage: the added tests are very basic;

Certainly always welcome and needed.

>  - requiring a perfect application (i.e. not even 'hunk applied with
>    offset') when patching the blob;

Not sure if there would be much use for such an option. Who would
be enabling this feature on purpose and for which reason?

>  - requiring that a previous revision of the file effectively had
>    associated that blob id (i.e. not trusting the blob id metadata)

This is not impossible to do but very expensive. At least it would
require a full history walk in case the ID is not present in the repo,
and after all this time you need to do regular patching anyway.

I would refrain from adding such checks to 'patch'; we already trust
other parts of the patch input data, even though could the patch can
always be mangled in ways we cannot predict. Unlike changes merged
from other commits, patches are allowed to be fuzzy, and sometimes
this can be an advantage rather than a problem.

>  - extracting the same metadata from the output of git diff;

Git diff only shows abbreviated commit IDs by default.
If the commit IDs are not ambiguous, then you could open a tree, and
lookup the path to hopefully find a blob ID. A bit awkward.

>  - inspecting the file history: at least in theory, we should be able to
>    apply a diff even after a file has been renamed; also, by doing a
>    recursive merge we may be able to reduce the conflicts further maybe
>    (i need to investigate on this)

Renames open up a huge can of worms and I don't want to be dragged
into that before we have a working server implementation.
Let's deal with renames later.

>  - providing a flag to use a specified commit tree as base when applying
>    the diff and then merging the result with the current version of the
>    files: this should allow to easily apply diffs generated by other
>    means without conflicts;

That is a nice idea. Here, the user would probably want to provide
just a commit ID as well, just like a default Git diff does.

I would suggest using '-c commit' as the option for this, and document
that this commit will be used as a merge base. Check upfront if all the
base files we will need are present in that commit, otherwise error out?
This way you would enforce at least a bit of sanity, but see below.

>  - we could even try to take the previous point to the extreme and
>    traversing the history until we find a point where the patch applies
>    cleanly.  probably not useful if not in extreme cases, but it's a fun
>    thought experiment at least.

Best you could do here is try to find a commit where all the files exist.
Which is not reliable. With a bad merge base there can be many useless
conflict markers in the patched result. This can also happen with the
-c option suggested above, though it would be the user's fault.

I don't think any of the above features are very urgent. For an initial
implementation, what you have now seems fine. More tests are the most
important todo item on your list, I believe. Tests might uncover a few
problems we don't know about yet.