optimise reading the file index
On 2023/07/30 13:42:26 +1000, Mark Jamsek <mark@jamsek.com> wrote:
> Stefan Sperling <stsp@stsp.name> wrote:
> > Overall, I am very happy with the progress that's being made on this :)
>
> Likewise! While I don't really have any issues with got's performance on
> my new daily driver, I sometimes take my x1 when out and about and that
> is much slower! Besides which, I think many small optimisations will
> eventually add up in this area so if we can obtain them without
> complicating things it will be worthwhile. And from chatting with op
> off-list, he's got a good plan for further related optimisations that
> I'm really looking forward to; I think it'll be great!

As stated elsewhere too, the slowest part of reading the fileindex is by
far building the RB tree.  got_fileindex_tree_RB_INSERT takes 60% of the
runtime, which is all due to got_path_cmp().  This is because the
entries on disk are already sorted, so each insertion happens at the
leaves, requiring many got_path_cmp() calls that add up to more than
half of the runtime.

I've been playing with various approaches to optimize this, but I'm not
happy with anything yet.

(All discussion below leverages the fact that we don't call
got_fileindex_read() on an already populated fileindex, so we can assume
that the tree is empty when entering this function.)

I first tried to exploit the fact that the entries are sorted by reading
the fileindex into an array and using the RB tree only for runtime
additions.  This makes the reading as fast as possible and doesn't
noticeably degrade query performance, but it's clunky and quirky to
handle two different data structures for the fileindex (especially when
we have to traverse the entries in order.)

The diff below is the second approach.  It reaches into the RB tree
internals and does an optimized insert.  Since the fileindex is sorted,
we can exploit this to insert each new node to the right of the
previously inserted node.
I'm inlining half of RB_INSERT(); it's effectively an RB_INSERT_AFTER().
With this, got_fileindex_read() drops from 70% of the runtime to 30%,
and got_path_cmp() from 56.99% to 1.79%.

with diff:    https://tmp.omarpolo.com/profile.opt.png
without diff: https://tmp.omarpolo.com/profile.stock.png

An RB_INSERT_AFTER() operation would be difficult to use, as it would be
easy to corrupt the state of the tree if not used very, very carefully,
so I'm not proposing to add such an operation to tree.h nor to mess with
the internals of RB_INSERT(); the diff below is just for the sake of the
discussion.

Another approach would be to use a different data structure.  We still
need a tree, as we have to walk the fileindex in order and do lookups,
but looking at the usage, RB trees _might_ be overkill.  I haven't yet
read all the code using the fileindex, but usually we don't add, delete
and query the fileindex at the same time.  We always read the entries
from disk, and that operation needs to be fast, but otherwise we have
different usage patterns: we either only traverse it, or delete some
entries, or add entries.

We should be able to build a perfectly balanced binary tree when
reading, since the entries are in order, and we might get away with a
dumb BST without a particular loss of performance.  Or maybe something
like a scapegoat tree (I discovered them just a couple of days ago),
which can self-balance and seems easy to implement.  It's something
I'll try to look into.
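To illustrate the balanced-tree idea: since the entries arrive in sorted
order, a perfectly balanced BST can be built in O(n) without any
comparisons by recursively picking the middle element as the subtree
root.  This is only a sketch with a hypothetical node type, not the real
got_fileindex_entry:

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical node; the real fileindex entry carries much more state.
 */
struct node {
	const char	*path;
	struct node	*left, *right;
};

/*
 * Build a perfectly balanced BST from the sorted slice paths[lo..hi).
 * No comparisons are needed: sortedness guarantees the BST invariant.
 */
static struct node *
build_balanced(const char **paths, size_t lo, size_t hi)
{
	struct node	*n;
	size_t		 mid;

	if (lo >= hi)
		return NULL;

	mid = lo + (hi - lo) / 2;
	if ((n = calloc(1, sizeof(*n))) == NULL)
		abort();
	n->path = paths[mid];
	n->left = build_balanced(paths, lo, mid);
	n->right = build_balanced(paths, mid + 1, hi);
	return n;
}

/* Height of the tree; for n entries it is ceil(log2(n + 1)). */
static size_t
height(struct node *n)
{
	size_t	 l, r;

	if (n == NULL)
		return 0;
	l = height(n->left);
	r = height(n->right);
	return 1 + (l > r ? l : r);
}
```

An in-order walk of the result visits the paths in their original sorted
order, so traversal code wouldn't need to change; the open question is
how well such a tree degrades under later insertions and deletions.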
diff /home/op/w/got
commit - 0f2e686eec562e28977521d25101acfa4396b47a
path + /home/op/w/got
blob - 62f7df1b95d5d56bd0b23c83df32e7d74f3d4b00
file + lib/fileindex.c
--- lib/fileindex.c
+++ lib/fileindex.c
@@ -696,7 +696,7 @@ got_fileindex_read(struct got_fileindex *fileindex, FI
 	const struct got_error *err = NULL;
 	struct got_fileindex_hdr hdr;
 	struct got_hash ctx;
-	struct got_fileindex_entry *ie;
+	struct got_fileindex_entry *ie, *prev = NULL;
 	uint8_t sha1_expected[SHA1_DIGEST_LENGTH];
 	uint8_t sha1[SHA1_DIGEST_LENGTH];
 	size_t n;
@@ -740,11 +740,31 @@ got_fileindex_read(struct got_fileindex *fileindex, FI
 		err = read_fileindex_entry(&ie, &ctx, infile, hdr.version);
 		if (err)
 			return err;
+#if 0
 		err = add_entry(fileindex, ie);
 		if (err) {
 			got_fileindex_entry_free(ie);
 			return err;
 		}
+		(void)prev;
+#else
+		if (fileindex->nentries >= GOT_FILEIDX_MAX_ENTRIES ||
+		    (prev != NULL && got_fileindex_cmp(prev, ie) > 0)) {
+			got_fileindex_entry_free(ie);
+			return got_error(GOT_ERR_FILEIDX_BAD);
+		}
+
+		fileindex->nentries++;
+
+		RB_SET(ie, prev, entry);
+		if (prev != NULL) {
+			RB_RIGHT(prev, entry) = ie;
+			RB_AUGMENT(prev);
+		} else
+			RB_ROOT(&fileindex->entries) = ie;
+		got_fileindex_tree_RB_INSERT_COLOR(&fileindex->entries, ie);
+		prev = ie;
+#endif
 	}

 	n = fread(sha1_expected, 1, sizeof(sha1_expected), infile);
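For comparison, the first approach mentioned above (reading the sorted
entries into a plain array and answering lookups with binary search) can
be sketched as follows.  The entry struct and helper names here are
hypothetical stand-ins, not the actual got code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for got_fileindex_entry. */
struct entry {
	char	path[64];
};

static int
entry_cmp(const void *a, const void *b)
{
	const struct entry	*ea = a, *eb = b;

	return strcmp(ea->path, eb->path);
}

/*
 * Since the on-disk entries are already sorted, the array can be used
 * as-is with bsearch(3): O(log n) lookups, no tree to build at all.
 */
static struct entry *
lookup(struct entry *entries, size_t n, const char *path)
{
	struct entry	 key;

	strncpy(key.path, path, sizeof(key.path) - 1);
	key.path[sizeof(key.path) - 1] = '\0';
	return bsearch(&key, entries, n, sizeof(*entries), entry_cmp);
}
```

Reading becomes a straight sequential fill of the array; the clunky part
is that runtime additions then need a second structure (the RB tree),
and in-order traversal has to merge the two.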