
From: Stefan Sperling <stsp@stsp.name>
Subject: Re: Got is kinda slow
To: Christian Weisgerber <naddy@mips.inka.de>
Cc: gameoftrees@openbsd.org
Date: Tue, 18 Apr 2023 10:06:47 +0200

On Mon, Apr 17, 2023 at 11:34:33PM +0200, Christian Weisgerber wrote:
> Stefan Sperling:
> > I am afraid the only way I see to speed this up is to get rid of entry
> > sorting in the tree parser and then deal with the consequences of doing so.
> > Which isn't going to be pretty. Does anyone else have other ideas?
> 
> If I understand the profiling graph correctly, got spends 1/3 of
> its time in inflate().  I would naively think that git also needs
> to decompress the same data, but its total runtime is a fraction
> of the time got spends decompressing.  How can that be?  And git
> log is not multithreaded.

Good point. Thinking a bit more about this, I still believe this
boils down to caching.

Your profile graph shows that most calls to inflate() result from
got_dump_delta_chain_to_mem().

The FreeBSD ports devel/ directory tree objects throughout history
will likely be represented with deltas in the pack file. Whenever we
open a tree in got-read-pack we read the entire delta chain from the
pack file to reconstruct the tree object. Without caching this includes
decompression of deltas and delta bases, which we may already have
decompressed previously while reading other (versions of the same) trees.
So in the worst case, to open one tree we have to decompress every
object in its delta chain, i.e. N objects for a chain of length N.
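
To put a rough number on that worst case, here is a toy model in C. It is
only a sketch, not got's actual code: every name in it (toy_delta,
toy_inflate, toy_open_object, toy_walk_history) is made up for illustration,
and it assumes a strictly linear history in which version i of a tree is
stored as a delta against version i - 1. Real pack files need not be laid
out this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; these are not got's real data structures. */
struct toy_delta {
	int base;	/* index of the delta base, or -1 for a non-delta object */
};

static size_t toy_inflate_calls;

/* Stand-in for decompressing one object from the pack file. */
static void
toy_inflate(int idx)
{
	(void)idx;
	toy_inflate_calls++;
}

/*
 * Reconstruct object idx by walking its delta chain down to the base,
 * decompressing every object along the way. Nothing is cached.
 * Returns the number of objects in the chain.
 */
static size_t
toy_open_object(const struct toy_delta *objs, size_t nobjs, int idx)
{
	size_t chain_len = 0;

	while (idx != -1) {
		assert((size_t)idx < nobjs);
		toy_inflate(idx);
		chain_len++;
		idx = objs[idx].base;
	}
	return chain_len;
}

/*
 * Open every version of one tree in a linear history (assumption:
 * version i is a delta against version i - 1).
 * Returns the total number of inflate calls made.
 */
static size_t
toy_walk_history(size_t nversions)
{
	struct toy_delta objs[64];
	size_t i;

	assert(nversions <= 64);
	for (i = 0; i < nversions; i++)
		objs[i].base = (int)i - 1;

	toy_inflate_calls = 0;
	for (i = 0; i < nversions; i++)
		toy_open_object(objs, nversions, (int)i);
	return toy_inflate_calls;
}
```

In this model, walking 4 versions costs 1 + 2 + 3 + 4 = 10 inflate calls,
and in general N versions cost N * (N + 1) / 2, so the work grows
quadratically with the length of the chain when nothing is reused.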

Now, there is a delta cache in use by got-read-pack, but it is evidently
not being used effectively in the case we are looking at. The run-time
spent in got_delta_cache functions in your profile graph is tiny compared
to the time spent in code paths reading trees and deltas. This cache is
supposed to prevent us from spending too much time reading deltas but it
fails to meet that goal in this scenario.
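
One way to check whether a cache like this is pulling its weight is to
count hits and misses directly. The following is a minimal sketch under
stated assumptions, not the got_delta_cache API: the memo_* names, the
direct-mapped layout and the linear delta chain are all hypothetical, and
I am not claiming this is the reason the real cache underperforms. It only
shows how a cache that is too small for the chains being walked degrades
to the uncached behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define MEMO_MAX_SLOTS 64

/*
 * Hypothetical direct-mapped cache of decompressed deltas,
 * keyed by object index. Not got's real delta cache.
 */
struct memo_cache {
	int slots[MEMO_MAX_SLOTS];	/* cached object index, -1 if empty */
	size_t nslots;
	size_t hits;
	size_t misses;			/* each miss stands for one inflate */
};

static void
memo_init(struct memo_cache *c, size_t nslots)
{
	size_t i;

	assert(nslots > 0 && nslots <= MEMO_MAX_SLOTS);
	c->nslots = nslots;
	c->hits = c->misses = 0;
	for (i = 0; i < nslots; i++)
		c->slots[i] = -1;
}

/* Fetch the delta for object idx, "inflating" it only on a miss. */
static void
memo_get_delta(struct memo_cache *c, int idx)
{
	size_t slot = (size_t)idx % c->nslots;

	if (c->slots[slot] == idx) {
		c->hits++;
		return;
	}
	c->misses++;		/* stand-in for a call to inflate() */
	c->slots[slot] = idx;	/* evict whatever occupied the slot */
}

/*
 * Open every version of one tree in a linear history (assumption:
 * the base of version i is version i - 1, and version 0 has no base).
 */
static void
memo_walk_history(struct memo_cache *c, size_t nversions)
{
	size_t i;
	int idx;

	for (i = 0; i < nversions; i++)
		for (idx = (int)i; idx != -1; idx--)
			memo_get_delta(c, idx);
}
```

With 8 versions and 8 slots, every delta is inflated exactly once
(8 misses, 28 hits). With 8 versions and a single slot, each lookup evicts
the entry the next lookup would have needed, so all 36 lookups miss and
the cache saves nothing.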