From: Stefan Sperling Subject: Re: check RB_INSERT() return value To: gameoftrees@openbsd.org Date: Thu, 14 Apr 2022 17:52:38 +0200 On Wed, Apr 13, 2022 at 10:22:46PM +0200, Stefan Sperling wrote: > In the future, we could use this to optimize code which performs > a presence check before doing an insertion. > For example, assuming we propagate RB_INSERT() errors up via the > function got_object_idset_add(), instead of having code which does: > > if (!got_object_idset_contains(idset, id)) { > err = got_object_idset_add(idset, id, NULL); > if (err) > goto done; > } > > ... we could have code which does the same thing with just > a single walk of the RB tree instead of two walks: > > err = got_object_idset_add(idset, id, NULL); > if (err && err->code != GOT_ERR_OBJ_EXISTS) > goto done; > > I have a WIP patch which makes some changes of the above nature > in pack_create.c, but it does not help with performance (yet?). Further experiments suggest that the above idea isn't worth it. gotadmin pack -a in the OpenBSD ports repo takes 9 minutes on my machine to color commits and enumerate all objects, and this does not change significantly if the above change is made. We should gain a reasonable performance boost by implementing the object ID set on top of a data structure that supports O(1) lookups. Observing the progress reported while packing the ports repo makes it obvious how our current code gets slower and slower with time.