From: Stefan Sperling <stsp@stsp.name>
Subject: Re: optimise reading the file index
To: Mark Jamsek <mark@jamsek.com>
Cc: gameoftrees@openbsd.org
Date: Thu, 27 Jul 2023 12:47:20 +0200

On Thu, Jul 27, 2023 at 06:04:21PM +1000, Mark Jamsek wrote:
> This diff is born of a discussion with stsp on irc, and attempts to
> make reading the file index faster, which should improve many operations.
> 
> The main change is that we memory map the file index to avoid lots of
> small reads that can be costly in large repos, for example, with many
> files. And we also reduce multiple small allocations down to one
> allocation for the entire padded path length when building each index
> entry. If memory mapping the file fails, however, we fall back to the
> file io status quo.
> 
> Regress is still happy, but I haven't properly profiled this yet. Some
> trivial measurements show a marked improvement, and it appears to be
> perceptibly faster; however, my machine is already quite fast even with
> 'got status' in src.git so I'm looking for wider testing to see if this
> is indeed worth it. In theory it should be, but I'd rather have some
> good measurements.

I am not opposed to using mmap to read the file index if that is indeed
the fastest approach. However, keep in mind that OpenBSD has no unified
buffer cache, which means we cannot mix mmap with regular read/write
operations on the same file. Probably not an issue in this case since
the file index is read completely before being modified and then written
out again. But this also suggests that mmap might not be the best approach.
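
To illustrate the ordering constraint, here is a rough sketch which maps
the file read-only and drops the mapping again before anything writes to
the file. This is not code from the diff: map_index_ro() and unmap_index()
are made-up names, and a -1 return is only meant as a hint to the caller
to fall back to plain file I/O.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Got): map the whole file at 'path'
 * read-only. Returns 0 on success, or -1 so that the caller can fall
 * back to read(2).
 */
static int
map_index_ro(const char *path, const uint8_t **mapp, size_t *lenp)
{
	struct stat sb;
	void *p;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;

	/* mmap(2) cannot map zero bytes, so treat an empty file as failure. */
	if (fstat(fd, &sb) == -1 || sb.st_size <= 0 ||
	    (uintmax_t)sb.st_size > SIZE_MAX) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close(2) */
	if (p == MAP_FAILED)
		return -1;

	*mapp = p;
	*lenp = (size_t)sb.st_size;
	return 0;
}

/*
 * Drop the mapping once parsing is done, i.e. before the file is
 * modified and written out again with write(2).
 */
static int
unmap_index(const uint8_t *map, size_t len)
{
	return munmap((void *)map, len);
}
```

The point is only the sequence: map, parse everything, unmap, and only
then write. Mapping and write(2) never touch the file at the same time.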

mmap is most useful for us when running a series of random access reads,
e.g. while reading pack index and pack files. If we compare the performance
benefits of your diff to an operation which reads pack files a lot, such
as gotadmin pack -a or gotadmin indexpack, between a -DGOT_PACK_NO_MMAP build
and a regular build of Got, I would expect the relative difference that mmap
makes in your proposed diff to be smaller (in theory; I haven't actually tried
this for lack of time).

The file index is not very large. Currently about 12MB for src, 8MB for ports.
Even on memory-constrained machines we could probably read the whole file into
memory in one or a few read() calls and parse the data in a temporary malloc
buffer. I suspect this would yield higher speed than small reads via mmap.
mmap is probably faster than the current code only because with mmap we are
asking the kernel to read page-sized blocks from the file, instead of tiny
sizeof(uint32_t)- or sizeof(uint64_t)-sized chunks. (Not sure what the
minimum size of a read in the kernel is, probably a 512-byte sector?
Could be less than a page.)
In any case, the main overhead of the current code should be the number of
read syscalls it triggers, with the buffer cache compensating somewhat for
our tiny read requests.
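
A minimal sketch of the read-everything-then-parse idea, untested against
Got itself. slurp_fd() and parse_be32() are made-up names, and the
big-endian 32-bit field is an assumption for illustration, not a
description of the actual file index format.

```c
#include <sys/types.h>
#include <sys/stat.h>

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Got): read the whole file behind fd
 * into one malloc'd buffer using as few read(2) calls as possible.
 * Returns 0 on success, -1 on error with errno set.
 */
static int
slurp_fd(int fd, uint8_t **bufp, size_t *lenp)
{
	struct stat sb;
	uint8_t *buf;
	size_t len, off = 0;
	ssize_t r;

	if (fstat(fd, &sb) == -1)
		return -1;
	if (sb.st_size < 0 || (uintmax_t)sb.st_size > SIZE_MAX) {
		errno = EFBIG;
		return -1;
	}
	len = (size_t)sb.st_size;

	buf = malloc(len ? len : 1);
	if (buf == NULL)
		return -1;

	/* Usually one iteration; loop in case of short reads. */
	while (off < len) {
		r = read(fd, buf + off, len - off);
		if (r == -1) {
			if (errno == EINTR)
				continue;
			free(buf);
			return -1;
		}
		if (r == 0)
			break;	/* file is shorter than fstat reported */
		off += (size_t)r;
	}

	*bufp = buf;
	*lenp = off;
	return 0;
}

/*
 * Parse one big-endian 32-bit value at *offp and advance the offset.
 * No syscall involved. Returns -1 if the buffer is truncated.
 */
static int
parse_be32(const uint8_t *buf, size_t len, size_t *offp, uint32_t *val)
{
	const uint8_t *p;

	if (*offp > len || len - *offp < 4)
		return -1;
	p = buf + *offp;
	*val = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	*offp += 4;
	return 0;
}
```

With something like this, each field costs a bounds check and a few shifts
rather than a trip into the kernel, and the buffer is freed once all
entries have been built.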