"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: got patch: merge patches with diff3
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Fri, 17 Jun 2022 11:48:31 +0200

Download raw body.

Thread
Stefan Sperling <stsp@stsp.name> wrote:
> On Thu, Jun 16, 2022 at 12:35:31AM +0200, Omar Polo wrote:
> > Turns out that plain diffs are not without their own shortcomings.  For
> > example, when multiple people are working on the same codebase often
> > happens that changes done by one person conflicts with changes done in
> > parallel by other people.
> > 
> > (I know this well since i'm making an habit of breaking stsp@ diffs
> > before he can even send them to the mailing list ;-)
> > 
> > Plain patches also have other issues and can produce wrong results in
> > some circumstances.  Long story short, a patch implementation can
> > guarantee to produce correct results only when the patch is applied
> > exactly to the same version of the file that the patch was generated
> > from.  (The CAVEATS and BUGS section of patch(1) gives good hints.)
> > 
> > Fortunately, in the scope of a version control system we can do better :)
> > 
> > `got diff' output includes some metadata about the blob id from which
> > the diff was generated.  The blob id indicates the exact version of the
> > file, so by using that as input of the patch machinery instead of the
> > current working revision we can guarantee a correct result.  Then we can
> > reuse the same mechanism used by 'got cherrypick' and 'got rebase' to
> > merge the files: diff3.  We have the file indicated by the blob id in
> > the patch text as the ancestor, the patched file as one of the
> > derivative and the current working revision as other derivative.
> > 
> > Pragmatically speaking, this reduces the times 'got patch' fails to
> > apply a diff turning many patch-incompatibilities either into a success
> > or a diff3 conflict.  QoL stuff.
> 
> apply_patch() seems to treat the occurrence of a merge conflict as a
> fatal error. Why?

in fact, i don't want to treat them as fatal error.  The idea was to use
GOT_ERR_PATCH_CONFLICT to signal the caller (got_patch) that a conflict
was found, but not to stop the patch operation.  (like what happens for
a failed patch, so that in the end 'got patch' exits with non-zero)

I lost a change in got_patch when putting the diff into shape, thanks
for spotting this!

> > Some possible improvements are:
> > 
> >  - more test coverage: the added tests are very basic;
> 
> Certainly always welcome and needed.

eheh :)

> >  - requiring a perfect application (i.e. not even 'hunk applied with
> >    offset') when patching the blob;
> 
> Not sure if there would be much use for such an option. Who would
> be enabling this feature on purpose and for which reason?

it wasn't meant as an user-option, but the default behaviour.  i'm
probably overthinking thought.

> >  - requiring that a previous revision of the file effectively had
> >    associated that blob id (i.e. not trusting the blob id metadata)
> 
> This is not impossible to do but very expensive. At least it would
> require a full history walk in case the ID is not present in the repo,
> and after all this time you need to do regular patching anyway.
> 
> I would refrain from adding such checks to 'patch'; we already trust
> other parts of the patch input data, even though could the patch can
> always be mangled in ways we cannot predict. Unlike changes merged
> from other commits, patches are allowed to be fuzzy, and sometimes
> this can be an advantage rather than a problem.

yeah, i was probably overthinking this too.

honestly, i was trying to avoid to apply patches with fuzzy, given that
we're not patch(1) and have access to the history of the files.  on the
other hand, from a more pragmatic point of view, it's nice to have this
diff3 merge in place and use it when possible, but also supporting
applying at offset (like we do) and with fuzzy are genuinely useful
features.

> >  - extracting the same metadata from the output of git diff;
> 
> Git diff only shows abbreviated commit IDs by default.
> If the commit IDs are not ambiguous, then you could open a tree, and
> lookup the path to hopefully find a blob ID. A bit awkward.

yep, that's why i'm not trying to in the first attempt at diff3 merge to
support the git notation.  I want to, but that can come (just a bit)
later.

it's nice thought how the output of 'got diff' is so easy to parse
already, was it written with a 'got patch' in mind? :)

> >  - inspecting the file history: at least in theory, we should be able to
> >    apply a diff even after a file has been renamed; also, by doing a
> >    recursive merge we may be able to reduce the conflicts further maybe
> >    (i need to investigate on this)
> 
> Renames open up a huge can of worms and I don't want to be dragged
> into that before we have a working server implementation.
> Let's deal with renames later.

absolutely, i was just mentioning a few things that can be done (also to
hear your opinion.)

> >  - providing a flag to use a specified commit tree as base when applying
> >    the diff and then merging the result with the current version of the
> >    files: this should allow to easily apply diffs generated by other
> >    means without conflicts;
> 
> That is a nice idea. Here, the user would probably want to provide
> just a commit ID as well, just like a default Git diff does.
> 
> I would suggest using '-c commit' as the option for this, and document
> that this commit will be used as a merge base.

that's _exactly_ what i had in mind :)

> Check upfront if all the
> base files we will need are present in that commit, otherwise error out?
> This way you would enforce at least a bit of sanity, but see below.

that sounds good, but would require some work.  At the moment got patch
works on one patch at a time (not that it cannot be changed.)

> >  - we could even try to take the previous point to the extreme and
> >    traversing the history until we find a point where the patch applies
> >    cleanly.  probably not useful if not in extreme cases, but it's a fun
> >    thought experiment at least.
> 
> Best you could do here is try to find a commit where all the files exist.
> Which is not reliable. With a bad merge base there can be many useless
> conflict markers in the patched result. This can also happen with the
> -c option suggested above, though it would be the user's fault.
>
> I don't think any of the above features are very urgent. For an initial
> implementation, what you have now seems fine. More tests are the most
> important todo item on your list, I believe. Tests might uncover a few
> problems we don't know about yet.

i'll work on an updated diff addressing also the dummy label names but
then I think too that this could be committed.

Thanks!

Omar Polo