"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: optimise reading the file index
To:
Mark Jamsek <mark@jamsek.com>
Cc:
Stefan Sperling <stsp@stsp.name>, "Todd C. Miller" <millert@openbsd.org>, Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date:
Sun, 30 Jul 2023 11:45:45 +0200

Download raw body.

Thread
On 2023/07/30 13:42:26 +1000, Mark Jamsek <mark@jamsek.com> wrote:
> Stefan Sperling <stsp@stsp.name> wrote:
> > Overall, I am very happy with the progress that's being made on this :)
> 
> Likewise! While I don’t really have any issues with got's performance on
> my new daily driver, I sometimes take my x1 when out and about and that
> is much slower! Besides which, I think many small optimisations will
> eventually add up in this area so if we can obtain them without
> complicating things it will be worthwhile. And from chatting with op
> off-list, he’s got a good plan for further related optimisations that
> I'm really looking forward to; I think it'll be great!

As stated elsewhere too, the slowest part of reading the fileindex is
by far building the RB tree.  got_fileindex_tree_RB_INSERT takes 60%
of the runtime, which is all due to got_path_cmp().  This because the
element on disk are already sorted so each insertion happens on the
leaves, requiring many got_path_cmp() calls that adds up to most of
half of the runtime.

I've been playing with various approaches to optimize this, but i'm
not happy with anything yet.

(all discussion below leverage the fact that we don't call
got_fileindex_read() on an already populated fileindex(), so we assume
that the tree is empty when entering this function.)

I first tried to exploit the fact that the entries are sorted by
reading the fileindex in an array and use the RB tree only for runtime
additions.  This makes the reading as fast as possible and doesn't
degrade the query performance sensibly, but it's clunky and quirky to
handle two different data structures for the fileindex (especially for
when we have to traverse the entries in order.)

Diff below is the second approach.  It reaches into the RB tree
internals and does an optimized insert.  Since the fileindex is
sorted, we can exploit this to insert the new node on the right of the
previous inserted node.  I'm inlining half of RB_INSERT(), it's
effectively an RB_INSERT_AFTER().  With this, got_fileindex_read()
drops from 70% of runtime to 30% and got_path_cmp() from 56.99% to
1.79%.

with diff:	https://tmp.omarpolo.com/profile.opt.png
without diff:	https://tmp.omarpolo.com/profile.stock.png

A RB_INSERT_AFTER() operation would be difficult to use as it'd be
easy to corrupt the state of the tree if not used very, very
carefully, so I'm not proposing to add such operation in tree.h nor
mess with the internals of RB_INSERT(); diff below is just for the
sake of the discussion.

Another approach would be to use a different data structure.

We still need a tree, as we have to walk the fileindex in order and
have lookups, but looking at the usage RB trees _might_ be overkill.
I don't have yet read all the code using the fileindex, but usually we
don't add, delete and query the fileindex at the same time.  We always
read the entries from the disk, and such operation needs to be fast,
but otherwise we have different usage patterns: we either only
traverse it, or delete some entries, or add entries.

We should be able to build a perfectly balanced binary tree when
reading since entries are in order, and we might attempt to use a dumb
BST without a particular loss of performance.  or maybe something like
a scapegoat tree (i discovered them just a couple of days ago) that
can self-balance and seems easy to implement.  It's something I'll try
to look into.


diff /home/op/w/got
commit - 0f2e686eec562e28977521d25101acfa4396b47a
path + /home/op/w/got
blob - 62f7df1b95d5d56bd0b23c83df32e7d74f3d4b00
file + lib/fileindex.c
--- lib/fileindex.c
+++ lib/fileindex.c
@@ -696,7 +696,7 @@ got_fileindex_read(struct got_fileindex *fileindex, FI
 	const struct got_error *err = NULL;
 	struct got_fileindex_hdr hdr;
 	struct got_hash ctx;
-	struct got_fileindex_entry *ie;
+	struct got_fileindex_entry *ie, *prev = NULL;
 	uint8_t sha1_expected[SHA1_DIGEST_LENGTH];
 	uint8_t sha1[SHA1_DIGEST_LENGTH];
 	size_t n;
@@ -740,11 +740,31 @@ got_fileindex_read(struct got_fileindex *fileindex, FI
 		err = read_fileindex_entry(&ie, &ctx, infile, hdr.version);
 		if (err)
 			return err;
+#if 0
 		err = add_entry(fileindex, ie);
 		if (err) {
 			got_fileindex_entry_free(ie);
 			return err;
 		}
+		(void)prev;
+#else
+		if (fileindex->nentries >= GOT_FILEIDX_MAX_ENTRIES ||
+		    (prev != NULL && got_fileindex_cmp(prev, ie) > 0)) {
+			got_fileindex_entry_free(ie);
+			return got_error(GOT_ERR_FILEIDX_BAD);
+		}
+
+		fileindex->nentries++;
+
+		RB_SET(ie, prev, entry);
+		if (prev != NULL) {
+			RB_RIGHT(prev, entry) = ie;
+			RB_AUGMENT(prev);
+		} else
+			RB_ROOT(&fileindex->entries) = ie;
+		got_fileindex_tree_RB_INSERT_COLOR(&fileindex->entries, ie);
+		prev = ie;
+#endif
 	}
 
 	n = fread(sha1_expected, 1, sizeof(sha1_expected), infile);