From: Stefan Sperling <stsp@stsp.name>
Subject: Re: Got is kinda slow
To: Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date: Mon, 17 Apr 2023 13:07:33 +0200

On Sun, Apr 16, 2023 at 05:02:52PM +0200, Stefan Sperling wrote:
> On Sun, Apr 16, 2023 at 04:45:40PM +0200, Christian Weisgerber wrote:
> > > The OpenBSD README explains how to do
> > > a profile build and get a graph of the result. Not sure about -portable.
> > 
> > I'll just copy the repository over to OpenBSD and try a profile
> > build there.
> 
> Thanks! Not having to run the profiling myself is a big time saver for me.
> 
> It can be a bit tricky. Make sure to only build one of the binaries with
> profiling enabled at a time. Otherwise the gmon.out file will be useless.

The profile graph you sent me off-list shows that got-read-pack is spending
quite some time in got_object_parse_tree_entry(). This happens because you
are specifying a path to 'got log' which then has to filter commits based on
changes made to the specified path. So opening and diffing trees becomes
necessary. Running 'got log' on the root directory and filtering irrevelant
commits externally somehow might be faster.

The profiler breaks got_object_parse_tree_entry() run-time down as follows:

 80% commit_traversal_request
 70% open_tree
   37% got_packfile_extract_object_to_mem
     # boils down to inflate(), nothing can be done here
   44% got_object_parse_tree
     # optimization possible, maybe
     20% mergesort -> 16% pte_cmp -> 20% got_path_cmp
     16% parse_tree_entry() -> 4% memchr + 4% strnlen
 
Optimizing parse_tree_entry() a little bit more might be possible.
The patch below attempts to parse tree entires with just one pass over
the buffer instead of calling strnlen and memchr in addition to parsing.
But does this really make a significant difference?

The mergesort case is too tricky to fix in one patch.
The fundamental problem is that the tree entry order Got uses internally
is subtly different to the order in which Git expects tree entries to
appear in tree objects.

For some reason, when a tree entry represents a directory, Git appends an
invisible slash ('/') to the tree entry's name before sorting with strncmp
order. This results in a situation where e.g. the directory 'got/' and the
file got-version.mk will appear in a different order, depending on whether
the entries are sorted following Git and Got (pathlist) semantics.

At present, Got code which relies on Got's tree parser depends on entries
being returned in pure strncmp() sort order, without imaginary trailing
slashes. This order affects how trees are traversed while comparing them
to on-disk directories in the work tree, for example. If tree entry order
doesn't match entries of on-disk directories then this will easily lead to
misbehaviour during 'got update' and any other commands which compare the
work tree to an in-repository tree. This is why the got_pathlist data
structure exists. It took quite some time to reach stable behaviour in
this aspect of the code. I'd rather avoid having to touch this again.

But I'm afraid the only reasonable way to avoid sorting in the tree parser
would be to adapt Git's trailing-slash-on-directories everywhere and enforce
this order on all trees and on on-disk directories as well (which are kind of
arbitrary I suspect, though usually appear in strncmp order). This could be
done. But it is more complicated than what we do now because a got_pathlist
would need to know whether or not an entry represents a directory to insert
it in the correct position. So this would merely move the problem out of the
tree parser and compliate things elsewhere. The basic issue that Git trees do
not agree with readdir and basic strncmp() will always remain. And Git is
likely stuck with its weird ordering because the order of entries factors
into tree object IDs...

There are also callers which use binary search to find tree entries, which
also rely on strncmp() order and would need to be modified if this assumption
about ordering is broken. It would all be doable, just quite a lot of work.

Alternatively, as quick fix you could try whether qsort is faster than
mergesort on average. But I don't have much hope that this helps much.

-----------------------------------------------
 rewrite parse_tree_entry() to avoid use of memchr and strnlen
 
diff c736b84ab8efb53399d58afe57a2e40c4c7dd1b5 06d4189c5b19b096a27e312083f2650cd81b611f
commit - c736b84ab8efb53399d58afe57a2e40c4c7dd1b5
commit + 06d4189c5b19b096a27e312083f2650cd81b611f
blob - 28536b3cc7243ce070909cda1331a2b7750713e0
blob + 6def3e6d9efbb9faf9e0dcc0c112a9b104839833
--- lib/object_parse.c
+++ lib/object_parse.c
@@ -765,36 +765,44 @@ parse_tree_entry(struct got_parsed_tree_entry *pte, si
 parse_tree_entry(struct got_parsed_tree_entry *pte, size_t *elen, char *buf,
     size_t maxlen)
 {
-	char *p, *space;
+	char *p;
 
 	*elen = 0;
 
-	*elen = strnlen(buf, maxlen) + 1;
-	if (*elen > maxlen)
+	if (maxlen < SHA1_DIGEST_LENGTH + 3)
 		return got_error(GOT_ERR_BAD_OBJ_DATA);
 
-	space = memchr(buf, ' ', *elen);
-	if (space == NULL || space <= buf)
-		return got_error(GOT_ERR_BAD_OBJ_DATA);
-
 	pte->mode = 0;
 	p = buf;
-	while (p < space) {
+	while (p - buf < maxlen && *p != '\0' && *p != ' ') {
 		if (*p < '0' || *p > '7')
 			return got_error(GOT_ERR_BAD_OBJ_DATA);
 		pte->mode <<= 3;
 		pte->mode |= *p - '0';
 		p++;
+		(*elen)++;
 	}
+	if (*elen >= maxlen || *p != ' ')
+		return got_error(GOT_ERR_BAD_OBJ_DATA);
+	p++; /* skip space */
+	(*elen)++;
 
-	if (*elen > maxlen || maxlen - *elen < SHA1_DIGEST_LENGTH)
+	pte->name = p;
+	while (*p != '\0' && p - buf < maxlen)
+		p++;
+	if (p - buf >= maxlen || *p != '\0')
 		return got_error(GOT_ERR_BAD_OBJ_DATA);
 
-	pte->name = space + 1;
-	pte->namelen = strlen(pte->name);
-	buf += *elen;
-	pte->id = buf;
-	*elen += SHA1_DIGEST_LENGTH;
+	pte->namelen = p - pte->name;
+	*elen += pte->namelen;
+
+	p++; /* skip \0 */
+	(*elen)++;
+
+	if (*elen + SHA1_DIGEST_LENGTH > maxlen)
+		return got_error(GOT_ERR_BAD_OBJ_DATA);
+	pte->id = p;
+	(*elen) += SHA1_DIGEST_LENGTH;
 	return NULL;
 }