"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: expose chunk offsets in the diff API
To:
Mark Jamsek <mark@jamsek.com>
Cc:
Game of Trees <gameoftrees@openbsd.org>
Date:
Sun, 19 Feb 2023 18:34:31 +0100

Download raw body.

Thread
On Mon, Feb 20, 2023 at 12:18:00AM +1100, Mark Jamsek wrote:
> On 23-02-20 12:05AM, Mark Jamsek wrote:
> > As discussed with stsp, if we expose each chunk's offset, we can more
> > efficiently seek to where we need to start parsing lines rather than
> > skipping each line till we get there. The below diff makes this change.
> 
> I completely forgot to explain that atom->pos is indeed set whether
> mmap'd or not--it is just that atom->at is _only_ set when mmap'd.
> This is done in the atomize routines in diff_atomize_text.c.
> Perhaps the comment in the diff_main.h header could be more precise?

Indeed. The code is good but the comment is misleading. We should
fix the comment in diff.git and eventually sync over the change.

> > This also requires tightening the check for chunks with only added lines
> > so we don't grab the offset to unchanged lines (i.e., right_count ==
> > left_count). On that note, should we also expose the diff_chunk_type()
> > routine?

I would say yes. Having this in the public diff APi will be useful as
we expose more of the internals to improve caller efficiency.

> > Regress is obviously unaffected by this change.

> > -		while (ln < cc.right.start) {
> > -			err = skip_one_line(ondisk_file);

The pre-declaration for skip_one_line() which was added in the original
diff could be deleted again.

ok for the diff.