"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: merge chokes, creates bogus conflicts
To:
Mark Jamsek <mark@jamsek.com>
Cc:
Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date:
Sun, 19 Feb 2023 10:17:56 +0100

Download raw body.

Thread
On Sun, Feb 19, 2023 at 06:38:41PM +1100, Mark Jamsek wrote:
> On 23-02-18 04:45PM, Stefan Sperling wrote:
> > On Sun, Feb 19, 2023 at 01:57:23AM +1100, Mark Jamsek wrote:
> > > Updated diff incorporates all your suggestions. Thanks for the
> > > improvements :)
> > 
> > Thanks! ok by me. We can add the diff_result performance tweak
> > on top once you or myself find time to implement it.
> 
> The below diff implements this optimisation by using diff_result and
> bypassing the somewhat expensive diff format and output path.
> 
> It also fixes a bug in the previous revision where the "from" file (f1)
> was always empty because the blob was consumed by the caller and needed
> a got_object_blob_rewind() call. So we were effectively adding the whole
> file thus making the process that much slower!
> 
> Also, that update.sh change that I spent far too much time debating
> myself over has now reverted from "M  foo" back to "C  foo"--and I think
> this is actually right:
> 
> In the test, we delete "foo" in the previous commit, so it actually is
> a new conflict. The previous approach of parsing the text diff
> for added lines turned the report from "C  foo" to "M  foo" because it
> wasn't forcing ASCII, thus the diff didn't have any '+' lines, only the
> "Binary files ... differ" line (i.e., no conflict markers). If we set
> the 'force_text_diff' flag in the previous revision, it does indeed
> report as a conflict. Thus this diff is not only an optimisation but
> also fixes two bugs :)

Nice, ok stsp@

> -		if (blob) {
> -			err = got_object_blob_dump_to_file(&sz1, NULL, NULL,
> -			    f1, blob);
> +		diff_chunk_context_load_change(&cc, &nchunks_parsed, r, n, 0);
> +
> +		while (ln < cc.right.start) {
> +			err = skip_one_line(ondisk_file);

It is a bit disappointing that we need to keep reading lines from the
file to skip to the start of a hunk, instead of being able to fseek to a
line via a byte offset which corresponds to line number cc.right.start.
For very large files (e.g. in OpenBSD sys/pci/drm) this can be expensive.

Maybe the diff library does not yet expose enough information?
diff_atoms do have a 'pos' member but it is apparently only filled if the
file is not memory mapped. Should add an API that exposes byte offsets of
all chunks? Or does something like this already exist?