From: Stefan Sperling Subject: Re: expose chunk offsets in the diff API To: Mark Jamsek Cc: Game of Trees Date: Sun, 19 Feb 2023 18:34:31 +0100 On Mon, Feb 20, 2023 at 12:18:00AM +1100, Mark Jamsek wrote: > On 23-02-20 12:05AM, Mark Jamsek wrote: > > As discussed with stsp, if we expose each chunk's offset, we can more > > efficiently seek to where we need to start parsing lines rather than > > skipping each line till we get there. The below diff makes this change. > > I completely forgot to explain that atom->pos is indeed set whether > mmap'd or not--it is just that atom->at is _only_ set when mmap'd. > This is done in the atomize routines in diff_atomize_text.c. > Perhaps the comment in the diff_main.h header could be more precise? Indeed. The code is good but the comment is misleading. We should fix the comment in diff.git and eventually sync over the change. > > This also requires tightening the check for chunks with only added lines > > so we don't grab the offset to unchanged lines (i.e., right_count == > > left_count). On that note, should we also expose the diff_chunk_type() > > routine? I would say yes. Having this in the public diff APi will be useful as we expose more of the internals to improve caller efficiency. > > Regress is obviously unaffected by this change. > > - while (ln < cc.right.start) { > > - err = skip_one_line(ondisk_file); The pre-declaration for skip_one_line() which was added in the original diff could be deleted again. ok for the diff.