"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Martin Pieuchot <mpi@openbsd.org>
Subject:
Re: faster history traversal for 'got blame'
To:
gameoftrees@openbsd.org
Date:
Tue, 7 Jan 2020 13:02:13 +0100

Download raw body.

Thread
On 07/01/20(Tue) 12:01, Stefan Sperling wrote:
> On Tue, Jan 07, 2020 at 11:28:28AM +0100, Martin Pieuchot wrote:
> > 'tog blame' is definitively faster with this.  But still unbearably slow
> > (understand unusable) compared to 'tig blame'.
> > 
> > So I confirm the improvement.  Thanks :o)
> > 
> > Before applying the diff:
> > 
> > 	$ time got blame kern/kern_synch.c  > /dev/null
> > 	    0m47.82s real     0m40.38s user     0m12.72s system
> > 
> > After:
> > 
> > 	$ time got blame kern/kern_synch.c  > /dev/null
> > 	    0m34.26s real     0m31.13s user     0m03.02s system
> > 
> > Comparing with tig-2.4.1:
> > 
> > 	$ time git blame kern/kern_synch.c > /dev/null
> > 	    0m09.42s real     0m08.59s user     0m00.60s system
> > 
> > 
> > Not sure if reducing malloc/free and caching could help you reduce user
> > time by a factor of 2 or 3, but that would be awesome :o)
> 
> got-read-pack is reasonably fast with this. I don't expect that more 
> micro-optimizations will help much, though they should still be worth doing.
> 
> 'got blame' is still slow on files with many revisions.
> And file size also matters. Note how 'blame tog/tog.c' in the got repo
> is slower than 'blame kern/kern_tc.c', even though the history of got
> is much shorter than that of the OpenBSD src repo.
> 
> So I think the next problem we need to solve is that our diff code is slow.
> Blame runs the entire file through diff for each commit that touched it.
> Browsing sys/dev/pcidevs revision diffs with tog should give you an idea
> about how slow diff can be. 'git diff' is super fast compared to our diff.
> 
> See the attached profile graph of the main process.
> 22% of 'blame kern/kern_synch.c' is in got_diffreg.
> Another 18% is spent opening pack index files. This repository contains
> 14 pack files. Perhaps it would run faster with a single large pack.

Don't forget to look at the numbers in the brackets!  (0.00%) means the
function is cheap but its children aren't.  Which bring the following
question: is the diff implementation slow because of the syscalls it
does?  Or because of the kernel implementation of such syscalls?  Did
someone try on another OS?

'tog blame' spends ~13sec in kernel!  See how _libc_munmap() accounts
for 7% of the overall time, only via got_packid_close().

Another 7% are spent in read(2) plus ~7% in fseek(3), that seems to
correspond to the existing diff implementation.

got_packidx_init_hdr() is also taking a lot of CPU time.  Is it because
of the SHA operations?  Is git using the sames?  Can they be improved?

What about the number of betoh32() calls?  Can they be reduced?  For
example:

	if (betoh32(*h->version) != GOT_PACKIDX_VERSION)
		...

Could we move the swap on the other side and have it computed only once
at build time?

Another expensive function is got_object_id_cmp() which compares SHA
hashes.  It is called often via RB-tree insert.  Is a RB-tree necessary?
In this use case you have ~20K insertions and ~40k search that both
result in ~3.4M comparisons.  Would another data structure result in
fewer comparisons?

And maybe it is possible to traverse commits via a different linking.
Isn't a parent/child linking available?  Could that be used in some cases?