From: Mark Jamsek <mark@jamsek.com>
Subject: Re: merge chokes, creates bogus conflicts
To: Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date: Sun, 19 Feb 2023 21:31:18 +1100

On 23-02-19 10:17AM, Stefan Sperling wrote:
> On Sun, Feb 19, 2023 at 06:38:41PM +1100, Mark Jamsek wrote:
> > On 23-02-18 04:45PM, Stefan Sperling wrote:
> > > On Sun, Feb 19, 2023 at 01:57:23AM +1100, Mark Jamsek wrote:
> > -		if (blob) {
> > -			err = got_object_blob_dump_to_file(&sz1, NULL, NULL,
> > -			    f1, blob);
> > +		diff_chunk_context_load_change(&cc, &nchunks_parsed, r, n, 0);
> > +
> > +		while (ln < cc.right.start) {
> > +			err = skip_one_line(ondisk_file);
> 
> It is a bit disappointing that we need to keep reading lines from the
> file to skip to the start of a hunk, instead of being able to fseek to a
> line via a byte offset which corresponds to line number cc.right.start.
> For very large files (e.g. in OpenBSD sys/pci/drm) this can be expensive.
> 
> Maybe the diff library does not yet expose enough information?
> diff_atoms do have a 'pos' member but it is apparently only filled if the
> file is not memory mapped. Should add an API that exposes byte offsets of
> all chunks? Or does something like this already exist?

I did look to see if we could do that [seek to the first hunk with added
lines] and came to the same conclusion; that is, no, we can't do it
(unless GOT_DIFF_NO_MMAP is defined) so I think we should add hunk
offsets to the result irrespective of whether the files are mmap'd.

But I didn't want to mix such changes in with this diff, and then when
I discovered the couple bugs I just wanted to push this out quickly, so
I instead added it to my todo list with the other diff API change (i.e.,
reuse diff_result).

-- 
Mark Jamsek <fnc.bsdbox.org|got.bsdbox.org>
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68