From: Martin Pieuchot Subject: Re: faster history traversal for 'got blame' To: gameoftrees@openbsd.org Date: Tue, 7 Jan 2020 13:02:13 +0100 On 07/01/20(Tue) 12:01, Stefan Sperling wrote: > On Tue, Jan 07, 2020 at 11:28:28AM +0100, Martin Pieuchot wrote: > > 'tog blame' is definitively faster with this. But still unbearably slow > > (understand unusable) compared to 'tig blame'. > > > > So I confirm the improvement. Thanks :o) > > > > Before applying the diff: > > > > $ time got blame kern/kern_synch.c > /dev/null > > 0m47.82s real 0m40.38s user 0m12.72s system > > > > After: > > > > $ time got blame kern/kern_synch.c > /dev/null > > 0m34.26s real 0m31.13s user 0m03.02s system > > > > Comparing with tig-2.4.1: > > > > $ time git blame kern/kern_synch.c > /dev/null > > 0m09.42s real 0m08.59s user 0m00.60s system > > > > > > Not sure if reducing malloc/free and caching could help you reduce user > > time by a factor of 2 or 3, but that would be awesome :o) > > got-read-pack is reasonably fast with this. I don't expect that more > micro-optimizations will help much, though they should still be worth doing. > > 'got blame' is still slow on files with many revisions. > And file size also matters. Note how 'blame tog/tog.c' in the got repo > is slower than 'blame kern/kern_tc.c', even though the history of got > is much shorter than that of the OpenBSD src repo. > > So I think the next problem we need to solve is that our diff code is slow. > Blame runs the entire file through diff for each commit that touched it. > Browsing sys/dev/pcidevs revision diffs with tog should give you an idea > about how slow diff can be. 'git diff' is super fast compared to our diff. > > See the attached profile graph of the main process. > 22% of 'blame kern/kern_synch.c' is in got_diffreg. > Another 18% is spent opening pack index files. This repository contains > 14 pack files. Perhaps it would run faster with a single large pack. Don't forget to look at the numbers in the brackets! (0.00%) means the function is cheap but its children aren't. Which bring the following question: is the diff implementation slow because of the syscalls it does? Or because of the kernel implementation of such syscalls? Did someone try on another OS? 'tog blame' spends ~13sec in kernel! See how _libc_munmap() accounts for 7% of the overall time, only via got_packid_close(). Another 7% are spent in read(2) plus ~7% in fseek(3), that seems to correspond to the existing diff implementation. got_packidx_init_hdr() is also taking a lot of CPU time. Is it because of the SHA operations? Is git using the sames? Can they be improved? What about the number of betoh32() calls? Can they be reduced? For example: if (betoh32(*h->version) != GOT_PACKIDX_VERSION) ... Could we move the swap on the other side and have it computed only once at build time? Another expensive function is got_object_id_cmp() which compares SHA hashes. It is called often via RB-tree insert. Is a RB-tree necessary? In this use case you have ~20K insertions and ~40k search that both result in ~3.4M comparisons. Would another data structure result in fewer comparisons? And maybe it is possible to traverse commits via a different linking. Isn't a parent/child linking available? Could that be used in some cases?