Got is kinda slow

On Mon, Apr 17, 2023 at 11:34:33PM +0200, Christian Weisgerber wrote:
> Stefan Sperling:
>
> > I am afraid the only way I see to speed this up is to get rid of entry
> > sorting in the tree parser and then deal with the consequences of doing so.
> > Which isn't going to be pretty. Does anyone else have other ideas?
>
> If I understand the profiling graph correctly, got spends 1/3 of
> its time in inflate(). I would naively think that git also needs
> to decompress the same data, but its total runtime is a fraction
> of the time got spends decompressing. How can that be? And git
> log is not multithreaded.

Good point. Thinking a bit more about this I still believe this boils
down to caching.

Your profile graph shows that most calls to inflate() result from
got_dump_delta_chain_to_mem(). The FreeBSD ports devel/ directory tree
objects throughout history will likely be represented with deltas in
the pack file.

Whenever we open a tree in got-read-pack we read the entire delta chain
from the pack file to reconstruct the tree object. Without caching this
includes decompression of deltas and delta bases, which we may already
have decompressed previously while reading other (versions of the same)
trees. So in the worst case to open 1 tree we have to decompress N delta
objects.

Now, there is a delta cache in use by got-read-pack, but it is evidently
not being used effectively in the case we are looking at. The run-time
spent in got_delta_cache functions in your profile graph is tiny compared
to the time spent in code paths reading trees and deltas. This cache is
supposed to prevent us from spending too much time reading deltas but it
fails to meet that goal in this scenario.
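To put rough numbers on the worst case described above, here is a toy model in C. It is only a sketch and not got's code: the names (toy_pack, toy_open_tree, toy_count_inflates), the NOBJ constant, and the strictly linear delta chain are all assumptions made for illustration. Each version of a tree is stored as a delta against the previous version, and the model counts how many simulated inflate() runs are needed to open every version once, with and without a cache of decompressed deltas that always hits.

```c
#include <assert.h>
#include <stddef.h>

#define NOBJ	64	/* hypothetical number of tree versions in the toy pack */
#define NO_BASE	(-1)	/* marks a full (non-delta) object */

/* Toy pack: object i is a delta against base[i], or a full object. */
struct toy_pack {
	int base[NOBJ];		/* index of the delta base, or NO_BASE */
	int cached[NOBJ];	/* 1 if object i was already decompressed */
	long inflate_calls;	/* number of simulated inflate() runs */
};

static void
toy_pack_init(struct toy_pack *p)
{
	/* Linear history: every object is a delta against its predecessor. */
	for (int i = 0; i < NOBJ; i++) {
		p->base[i] = (i == 0) ? NO_BASE : i - 1;
		p->cached[i] = 0;
	}
	p->inflate_calls = 0;
}

/*
 * Walk the whole delta chain of object id, down to the full base object.
 * Every chain element that is not found in the cache costs one
 * simulated decompression.
 */
static void
toy_open_tree(struct toy_pack *p, int id, int use_cache)
{
	assert(id >= 0 && id < NOBJ);
	for (int i = id; i != NO_BASE; i = p->base[i]) {
		if (use_cache && p->cached[i])
			continue;	/* cache hit: no decompression needed */
		p->inflate_calls++;
		if (use_cache)
			p->cached[i] = 1;
	}
}

/* Open every object once and return the total decompression count. */
static long
toy_count_inflates(int use_cache)
{
	struct toy_pack p;

	toy_pack_init(&p);
	for (int id = 0; id < NOBJ; id++)
		toy_open_tree(&p, id, use_cache);
	return p.inflate_calls;
}
```

In this model toy_count_inflates(0) grows quadratically with the chain length (N * (N + 1) / 2 decompressions for N versions), while toy_count_inflates(1) stays at N. A real cache has bounded size and evicts entries, so actual behaviour would land somewhere between those two extremes depending on the hit rate.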

2023-04-17 15:13 Stefan Sperling:
Got is kinda slow
- 2023-04-17 21:34 Christian Weisgerber:
  Got is kinda slow
  - 2023-04-18 08:06 Stefan Sperling:
    Got is kinda slow
  - 2023-04-19 12:14 Stefan Sperling:
    Got is kinda slow
  - 2023-04-23 21:40 Stefan Sperling:
    Got is kinda slow