Got is kinda slow

On Tue, Apr 18, 2023 at 10:06:47AM +0200, Stefan Sperling wrote: > On Mon, Apr 17, 2023 at 11:34:33PM +0200, Christian Weisgerber wrote: > > Stefan Sperling: > > > I am afraid the only way I see to speed this up is to get rid of entry > > > sorting in the tree parser and then deal with the consequences of doing so. > > > Which isn't going to be pretty. Does anyone else have other ideas? > > > > If I understand the profiling graph correctly, got spends 1/3 of > > its time in inflate(). I would naively think that git also needs > > to decompress the same data, but its total runtime is a fraction > > of the time got spends decompressing. How can that be? And git > > log is not multithreaded. > > Good point. Thinking a bit more about this I still believe this > boils down to caching. > > Your profile graph shows that most calls to inflate() result from > got_dump_delta_chain_to_mem(). > > The FreeBSD ports devel/ directory tree objects throughout history > will likely be represented with deltas in the pack file. Whenever we > open a tree in got-read-pack we read the entire delta chain from the > pack file to reconstruct the tree object. Without caching this includes > decompression of deltas and delta bases, which we may already have > decompressed previously while reading other (versions of the same) trees. > So in the worst case to open 1 tree we have to decompress N delta objects. > > Now, there is a delta cache in use by got-read-pack, but it is evidently > not being used effectively in the case we are looking at. The run-time > spent in got_delta_cache functions in your profile graph is tiny compared > to the time spent in code paths reading trees and deltas. This cache is > supposed to prevent us from spending too much time reading deltas but it > fails to meet that goal in this scenario. The patch below implements fulltext caching in the delta cache. This is a proof of concept, not ready for commit mostly because we would need to carefully evaluate the various constants that control memory usage. For now, I've sized the cache such that all fulltexts in your test case will fit. This shaves off about 10% of time spent inflating data. But got-read-pack uses about 300MB of memory now, instead of about 150MB. And this is not enough to make 'got log path' reasonably fast. It is still much too slow in your libpciaccess test case. Before: https://stsp.name/images/got-read-pack-no-fulltext-cache.png After: https://stsp.name/images/got-read-pack-with-fulltext-cache.png I guess we would need to change the logging algorithm to get anywhere near Git's performance. I don't know how they do it (their code is way too opaque for me to make much sense of it) and I don't have good ideas myself. Could someone explain Git's approach to me? Can we somehow avoid opening so many trees? Right now we walk through commit history in sequence and compare the tree of commit N to its parent commit N-1, over and over. It would be nice to be able to skip some commits somehow. One problem is that a tree could be changed in commit N, and then in commit N+1 it could be switched back to the same tree that was present in commit N-1. So I don't see a realiable way to sample ranges of commits for tree IDs without a risk of missing out on changes where a tree was flipped back to a previous state. There is an easy way to cut the number of open_tree calls in half, by keeping one of the two trees open and re-using it in the next iteration. But with fulltext caching the second open_tree() becomes a mempcy(). So this is unlikely to get us the 10x to 100x performance gain we need. diff refs/heads/main refs/heads/delta-cache-fulltext commit - b77715b73bccc5e7b56252864e0b7902a3a17626 commit + 1fe997b49ad055cb104d25c713ca265781d11181 blob - 495bc5a1d44efc6d760792f0976a4cbf808e0b8a blob + 6039123ccb60450f64dd5c029c416d07221c8354 --- Makefile.inc +++ Makefile.inc @@ -3,7 +3,7 @@ CPPFLAGS += -DGOT_LIBEXECDIR=${LIBEXECDIR} -DGOT_VERSI #CFLAGS += -DGOT_NO_OBJ_CACHE #CFLAGS += -DGOT_NO_DELTA_CACHE #CFLAGS += -DGOT_OBJ_CACHE_DEBUG -#CFLAGS += -DGOT_DELTA_CACHE_DEBUG +CFLAGS += -DGOT_DELTA_CACHE_DEBUG #CFLAGS += -DGOT_DIFF_NO_MMAP .if "${GOT_RELEASE}" == "Yes" blob - a88ea5659f6ff7be69d846ef2825a6e32b5963a7 blob + a819c0e7ff72877f305908cd18df9174d9d0519c --- lib/delta_cache.c +++ lib/delta_cache.c @@ -40,15 +40,18 @@ #define nitems(_a) (sizeof(_a) / sizeof((_a)[0])) #endif -#define GOT_DELTA_CACHE_MIN_BUCKETS 64 -#define GOT_DELTA_CACHE_MAX_BUCKETS 2048 -#define GOT_DELTA_CACHE_MAX_CHAIN 2 -#define GOT_DELTA_CACHE_MAX_DELTA_SIZE 1024 +#define GOT_DELTA_CACHE_MIN_BUCKETS 64 +#define GOT_DELTA_CACHE_MAX_BUCKETS 2048 +#define GOT_DELTA_CACHE_MAX_CHAIN 2 +#define GOT_DELTA_CACHE_MAX_DELTA_SIZE 5145 +#define GOT_DELTA_CACHE_MAX_FULLTEXT_SIZE (325182*2) struct got_cached_delta { off_t offset; uint8_t *data; size_t len; + uint8_t *fulltext; + size_t fulltext_len; }; struct got_delta_cache_head { @@ -62,10 +65,13 @@ struct got_delta_cache { unsigned int totelem; int cache_search; int cache_hit; + int cache_hit_fulltext; int cache_miss; int cache_evict; int cache_toolarge; + int cache_toolarge_fulltext; int cache_maxtoolarge; + int cache_maxtoolarge_fulltext; unsigned int flags; #define GOT_DELTA_CACHE_F_NOMEM 0x01 SIPHASH_KEY key; @@ -105,10 +111,14 @@ got_delta_cache_free(struct got_delta_cache *cache) #ifdef GOT_DELTA_CACHE_DEBUG fprintf(stderr, "%s: delta cache: %u elements, %d searches, %d hits, " - "%d missed, %d evicted, %d too large (max %d)\n", getprogname(), - cache->totelem, cache->cache_search, cache->cache_hit, + "%d fulltext hits, %d missed, %d evicted, %d too large (max %d), " + "%d too large fulltext (max %d)\n", + getprogname(), cache->totelem, cache->cache_search, + cache->cache_hit, cache->cache_hit_fulltext, cache->cache_miss, cache->cache_evict, cache->cache_toolarge, - cache->cache_maxtoolarge); + cache->cache_maxtoolarge, + cache->cache_toolarge_fulltext, + cache->cache_maxtoolarge_fulltext); #endif for (i = 0; i < cache->nbuckets; i++) { struct got_delta_cache_head *head; @@ -222,6 +232,7 @@ got_delta_cache_add(struct got_delta_cache *cache, if (head->nchain >= nitems(head->entries)) { delta = &head->entries[head->nchain - 1]; free(delta->data); + free(delta->fulltext); memset(delta, 0, sizeof(*delta)); head->nchain--; cache->totelem--; @@ -232,6 +243,8 @@ got_delta_cache_add(struct got_delta_cache *cache, delta->offset = delta_data_offset; delta->data = delta_data; delta->len = delta_len; + delta->fulltext = NULL; + delta->fulltext_len = 0; head->nchain++; cache->totelem++; @@ -239,8 +252,56 @@ void #endif } +const struct got_error * +got_delta_cache_add_fulltext(struct got_delta_cache *cache, + off_t delta_data_offset, uint8_t *fulltext, size_t fulltext_len) +{ +#ifdef GOT_NO_DELTA_CACHE + return got_error(GOT_ERR_NO_SPACE); +#else + struct got_cached_delta *delta; + struct got_delta_cache_head *head; + uint64_t idx; + int i; + + if (fulltext_len > GOT_DELTA_CACHE_MAX_FULLTEXT_SIZE) { + cache->cache_toolarge_fulltext++; + if (fulltext_len > cache->cache_maxtoolarge) + cache->cache_maxtoolarge_fulltext = fulltext_len; + return got_error(GOT_ERR_NO_SPACE); + } + + idx = delta_cache_hash(cache, delta_data_offset) % cache->nbuckets; + head = &cache->buckets[idx]; + + for (i = 0; i < head->nchain; i++) { + delta = &head->entries[i]; + if (delta->offset != delta_data_offset) + continue; + if (i > 0) { + struct got_cached_delta tmp; + memcpy(&tmp, &head->entries[0], sizeof(tmp)); + memcpy(&head->entries[0], &head->entries[i], + sizeof(head->entries[0])); + memcpy(&head->entries[i], &tmp, + sizeof(head->entries[i])); + delta = &head->entries[0]; + } + delta->fulltext = malloc(fulltext_len); + if (delta->fulltext == NULL) + return got_error_from_errno("malloc"); + memcpy(delta->fulltext, fulltext, fulltext_len); + delta->fulltext_len = fulltext_len; + break; + } + + return NULL; +#endif +} + void got_delta_cache_get(uint8_t **delta_data, size_t *delta_len, + uint8_t **fulltext, size_t *fulltext_len, struct got_delta_cache *cache, off_t delta_data_offset) { uint64_t idx; @@ -254,6 +315,10 @@ got_delta_cache_get(uint8_t **delta_data, size_t *delt cache->cache_search++; *delta_data = NULL; *delta_len = 0; + if (fulltext) + *fulltext = NULL; + if (fulltext_len) + *fulltext_len = 0; for (i = 0; i < head->nchain; i++) { delta = &head->entries[i]; if (delta->offset != delta_data_offset) @@ -270,6 +335,13 @@ got_delta_cache_get(uint8_t **delta_data, size_t *delt } *delta_data = delta->data; *delta_len = delta->len; + if (fulltext && fulltext_len && + delta->fulltext && delta->fulltext_len) { + *fulltext = delta->fulltext; + *fulltext_len = delta->fulltext_len; + cache->cache_hit_fulltext++; + } + return; } blob - 7da86957201fd53ae69d5deb71b64d3bf6216599 blob + 03111a87e65fe8d7de030236276b4a0961ab8d05 --- lib/got_lib_delta_cache.h +++ lib/got_lib_delta_cache.h @@ -21,4 +21,7 @@ void got_delta_cache_get(uint8_t **, size_t *, struct const struct got_error *got_delta_cache_add(struct got_delta_cache *, off_t, uint8_t *, size_t); -void got_delta_cache_get(uint8_t **, size_t *, struct got_delta_cache *, off_t); +const struct got_error *got_delta_cache_add_fulltext(struct got_delta_cache *, + off_t , uint8_t *, size_t); +void got_delta_cache_get(uint8_t **, size_t *, uint8_t **, size_t *, + struct got_delta_cache *, off_t); blob - a9b71c809585eb6a270c9f49e602be196d76f3ad blob + 4bef4fa0fe94d62da5c1003246d35e196a4e04dc --- lib/pack.c +++ lib/pack.c @@ -1316,7 +1316,8 @@ got_pack_get_delta_chain_max_size(uint64_t *max_size, if (pack->delta_cache) { got_delta_cache_get(&delta_buf, &delta_len, - pack->delta_cache, delta->data_offset); + NULL, NULL, pack->delta_cache, + delta->data_offset); } if (delta_buf == NULL) { cached = 0; @@ -1370,7 +1371,7 @@ got_pack_dump_delta_chain_to_file(size_t *result_size, const struct got_error *err = NULL; struct got_delta *delta; uint8_t *base_buf = NULL, *accum_buf = NULL; - size_t base_bufsz = 0, accum_bufsz = 0, accum_size = 0, delta_len; + size_t base_bufsz = 0, accum_bufsz = 0, accum_size = 0; /* We process small enough files entirely in memory for speed. */ const size_t max_bufsize = GOT_DELTA_RESULT_SIZE_CACHED_MAX; uint64_t max_size = 0; @@ -1381,6 +1382,23 @@ got_pack_dump_delta_chain_to_file(size_t *result_size, if (STAILQ_EMPTY(&deltas->entries)) return got_error(GOT_ERR_BAD_DELTA_CHAIN); + if (pack->delta_cache) { + uint8_t *delta_buf = NULL, *fulltext = NULL; + size_t delta_len, fulltext_len; + + delta = STAILQ_LAST(&deltas->entries, got_delta, entry); + got_delta_cache_get(&delta_buf, &delta_len, + &fulltext, &fulltext_len, + pack->delta_cache, delta->data_offset); + if (fulltext) { + size_t w = fwrite(fulltext, 1, fulltext_len, outfile); + if (w != fulltext_len) + return got_ferror(outfile, GOT_ERR_IO); + *result_size = fulltext_len; + return NULL; + } + } + if (fseeko(base_file, 0L, SEEK_SET) == -1) return got_error_from_errno("fseeko"); if (fseeko(accum_file, 0L, SEEK_SET) == -1) @@ -1388,7 +1406,8 @@ got_pack_dump_delta_chain_to_file(size_t *result_size, /* Deltas are ordered in ascending order. */ STAILQ_FOREACH(delta, &deltas->entries, entry) { - uint8_t *delta_buf = NULL; + uint8_t *delta_buf = NULL, *fulltext = NULL; + size_t delta_len, fulltext_len; uint64_t base_size, result_size = 0; int cached = 1; if (n == 0) { @@ -1475,6 +1494,7 @@ got_pack_dump_delta_chain_to_file(size_t *result_size, if (pack->delta_cache) { got_delta_cache_get(&delta_buf, &delta_len, + &fulltext, &fulltext_len, pack->delta_cache, delta->data_offset); } if (delta_buf == NULL) { @@ -1506,6 +1526,8 @@ got_pack_dump_delta_chain_to_file(size_t *result_size, max_size = base_size; if (result_size > max_size) max_size = result_size; + if (fulltext_len > max_size) + max_size = fulltext_len; if (base_buf && max_size > max_bufsize) { /* Switch from buffers to temporary files. */ @@ -1548,21 +1570,41 @@ got_pack_dump_delta_chain_to_file(size_t *result_size, } if (base_buf) { - err = got_delta_apply_in_mem(base_buf, base_bufsz, - delta_buf, delta_len, accum_buf, - &accum_size, max_size); + if (fulltext) { + memcpy(accum_buf, fulltext, fulltext_len); + accum_size = fulltext_len; + err = NULL; + } else { + err = got_delta_apply_in_mem(base_buf, base_bufsz, + delta_buf, delta_len, accum_buf, + &accum_size, max_size); + } n++; + if (!cached) + free(delta_buf); + if (err) + goto done; + if (fulltext == NULL) { + err = got_delta_cache_add_fulltext( + pack->delta_cache, delta->data_offset, + accum_buf, accum_size); + if (err) { + if (err->code != GOT_ERR_NO_SPACE) + goto done; + err = NULL; + } + } } else { err = got_delta_apply(base_file, delta_buf, delta_len, /* Final delta application writes to output file. */ ++n < deltas->nentries ? accum_file : outfile, &accum_size); + if (!cached) + free(delta_buf); + if (err) + goto done; } - if (!cached) - free(delta_buf); - if (err) - goto done; if (n < deltas->nentries) { /* Accumulated delta becomes the new base. */ @@ -1604,7 +1646,7 @@ got_pack_dump_delta_chain_to_mem(uint8_t **outbuf, siz const struct got_error *err = NULL; struct got_delta *delta; uint8_t *base_buf = NULL, *accum_buf = NULL; - size_t base_bufsz = 0, accum_bufsz = 0, accum_size = 0, delta_len; + size_t base_bufsz = 0, accum_bufsz = 0, accum_size = 0; uint64_t max_size = 0; int n = 0; @@ -1614,9 +1656,28 @@ got_pack_dump_delta_chain_to_mem(uint8_t **outbuf, siz if (STAILQ_EMPTY(&deltas->entries)) return got_error(GOT_ERR_BAD_DELTA_CHAIN); + if (pack->delta_cache) { + uint8_t *delta_buf = NULL, *fulltext = NULL; + size_t delta_len, fulltext_len; + + delta = STAILQ_LAST(&deltas->entries, got_delta, entry); + got_delta_cache_get(&delta_buf, &delta_len, + &fulltext, &fulltext_len, + pack->delta_cache, delta->data_offset); + if (fulltext) { + *outbuf = malloc(fulltext_len); + if (*outbuf == NULL) + return got_error_from_errno("malloc"); + memcpy(*outbuf, fulltext, fulltext_len); + *outlen = fulltext_len; + return NULL; + } + } + /* Deltas are ordered in ascending order. */ STAILQ_FOREACH(delta, &deltas->entries, entry) { - uint8_t *delta_buf = NULL; + uint8_t *delta_buf = NULL, *fulltext = NULL; + size_t delta_len, fulltext_len = 0; uint64_t base_size, result_size = 0; int cached = 1; if (n == 0) { @@ -1637,10 +1698,26 @@ got_pack_dump_delta_chain_to_mem(uint8_t **outbuf, siz goto done; } + if (pack->delta_cache) { + got_delta_cache_get(&delta_buf, &delta_len, + &fulltext, &fulltext_len, + pack->delta_cache, delta_data_offset); + } + if (delta->size > max_size) max_size = delta->size; + if (delta->size > fulltext_len) + max_size = fulltext_len; - if (pack->map) { + if (fulltext) { + base_buf = malloc(fulltext_len); + if (base_buf == NULL) { + err = got_error_from_errno("malloc"); + goto done; + } + memcpy(base_buf, fulltext, fulltext_len); + base_bufsz = fulltext_len; + } else if (pack->map) { size_t mapoff; if (delta_data_offset > SIZE_MAX) { @@ -1667,11 +1744,30 @@ got_pack_dump_delta_chain_to_mem(uint8_t **outbuf, siz if (err) goto done; n++; + + if (pack->delta_cache && fulltext == NULL) { + err = got_delta_cache_add(pack->delta_cache, + delta_data_offset, NULL, 0); + if (err) { + if (err->code != GOT_ERR_NO_SPACE) + goto done; + err = NULL; + } else { + err = got_delta_cache_add_fulltext( + pack->delta_cache, + delta_data_offset, + fulltext, fulltext_len); + if (err && err->code != GOT_ERR_NO_SPACE) + goto done; + err = NULL; + } + } continue; } if (pack->delta_cache) { got_delta_cache_get(&delta_buf, &delta_len, + &fulltext, &fulltext_len, pack->delta_cache, delta->data_offset); } if (delta_buf == NULL) { @@ -1703,6 +1799,8 @@ got_pack_dump_delta_chain_to_mem(uint8_t **outbuf, siz max_size = base_size; if (result_size > max_size) max_size = result_size; + if (fulltext_len > max_size) + max_size = fulltext_len; if (max_size > base_bufsz) { uint8_t *p = realloc(base_buf, max_size); @@ -1728,15 +1826,31 @@ got_pack_dump_delta_chain_to_mem(uint8_t **outbuf, siz accum_bufsz = max_size; } - err = got_delta_apply_in_mem(base_buf, base_bufsz, - delta_buf, delta_len, accum_buf, - &accum_size, max_size); + if (fulltext) { + memcpy(accum_buf, fulltext, fulltext_len); + accum_size = fulltext_len; + err = NULL; + } else { + err = got_delta_apply_in_mem(base_buf, base_bufsz, + delta_buf, delta_len, accum_buf, + &accum_size, max_size); + } if (!cached) free(delta_buf); n++; if (err) goto done; + if (fulltext == NULL) { + err = got_delta_cache_add_fulltext(pack->delta_cache, + delta->data_offset, accum_buf, accum_size); + if (err) { + if (err->code != GOT_ERR_NO_SPACE) + goto done; + err = NULL; + } + } + if (n < deltas->nentries) { /* Accumulated delta becomes the new base. */ uint8_t *tmp = accum_buf;

2023-04-17 21:34 Christian Weisgerber:
Got is kinda slow
- 2023-04-18 08:06 Stefan Sperling:
  Got is kinda slow
- 2023-04-19 12:14 Stefan Sperling:
  Got is kinda slow
- 2023-04-23 21:40 Stefan Sperling:
  Got is kinda slow
- 2023-04-24 12:34 Mark Jamsek:
  Got is kinda slow