From: Stefan Sperling
Subject: Re: optimise reading the file index
To: Mark Jamsek
Cc: gameoftrees@openbsd.org
Date: Thu, 27 Jul 2023 12:47:20 +0200

On Thu, Jul 27, 2023 at 06:04:21PM +1000, Mark Jamsek wrote:
> This diff is borne from a discussion with stsp on irc, and attempts to
> make reading the file index faster, which should improve many operations.
>
> The main change is that we memory map the file index to avoid lots of
> small reads that can be costly in large repos, for example, with many
> files. And we also reduce multiple small allocations down to one
> allocation for the entire padded path length when building each index
> entry. If memory mapping the file fails, however, we fall back to the
> file io status quo.
>
> Regress is still happy, but I haven't properly profiled this yet. Some
> trivial measurements show a marked improvement, and it appears to be
> perceptibly faster; however, my machine is already quite fast even with
> 'got status' in src.git so I'm looking for wider testing to see if this
> is indeed worth it. In theory it should be, but I'd rather have some
> good measurements.

I am not opposed to using mmap to read the file index if that is indeed
the fastest approach.

However, keep in mind that OpenBSD has no unified buffer cache, which
means we cannot mix mmap with regular read/write operations on the same
file. Probably not an issue in this case since the file index is read
completely before being modified and then written out again.

But this also suggests that mmap might not be the best approach.
mmap is most useful for us when running a series of random access reads,
e.g. while reading pack index and pack files.
If we compare the performance benefits of your diff to an operation
which reads pack files a lot, such as gotadmin pack -a or gotadmin
indexpack, between a -DGOT_PACK_NO_MMAP build and a regular build of
Got, I would expect the relative difference that mmap makes in your
proposed diff to be smaller (in theory; I haven't actually tried this
for lack of time).

The file index is not very large. Currently about 12MB for src, 8MB for
ports. Even on memory-constrained machines we could probably read the
whole file into memory in one or a few read() calls and parse the data
in a temporary malloc buffer. I suspect this would yield higher speed
than small reads via mmap.

mmap is probably faster than the current code only because with mmap we
are asking the kernel to read page-sized blocks from the file, instead
of tiny chunks of sizeof(uint32_t) or sizeof(uint64_t) bytes.
(Not sure what the minimum size of a read in the kernel is, probably a
512-byte sector? Could be less than a page.)
In any case, the main overhead of the current code should be the number
of read syscalls it triggers, with the buffer cache compensating
somewhat for our tiny read requests.
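
To make the "read the whole file, then parse from a malloc buffer" idea
concrete, here is a minimal sketch. It is not taken from Got: the names
slurp_file() and parse_be32() are hypothetical, and the big-endian field
encoding is assumed purely for illustration. The point is only that the
read(2) loop runs once or a few times per file, while every field is
then decoded from memory without further syscalls.

```c
#include <sys/types.h>
#include <sys/stat.h>

#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Got: read an entire file into a
 * malloc'd buffer using as few read(2) calls as the kernel allows.
 * Returns NULL on error; on success *lenp holds the bytes read.
 */
static uint8_t *
slurp_file(const char *path, size_t *lenp)
{
	struct stat sb;
	uint8_t *buf = NULL;
	size_t len, off = 0;
	ssize_t r;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return NULL;
	if (fstat(fd, &sb) == -1 || sb.st_size < 0)
		goto fail;
	len = (size_t)sb.st_size;
	buf = malloc(len > 0 ? len : 1);
	if (buf == NULL)
		goto fail;
	while (off < len) {
		/* Ask for everything that is left; usually one call. */
		r = read(fd, buf + off, len - off);
		if (r == -1) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		if (r == 0)
			break;	/* file is shorter than fstat reported */
		off += (size_t)r;
	}
	close(fd);
	*lenp = off;
	return buf;
fail:
	free(buf);
	close(fd);
	return NULL;
}

/*
 * Hypothetical parser step: decode a 32-bit value (big-endian is an
 * assumption here) from the buffer at *offp, with a bounds check.
 * Advances *offp on success. Returns 0 on success, -1 if too short.
 */
static int
parse_be32(const uint8_t *buf, size_t len, size_t *offp, uint32_t *valp)
{
	if (*offp > len || len - *offp < sizeof(uint32_t))
		return -1;
	*valp = (uint32_t)buf[*offp] << 24 | (uint32_t)buf[*offp + 1] << 16 |
	    (uint32_t)buf[*offp + 2] << 8 | (uint32_t)buf[*offp + 3];
	*offp += sizeof(uint32_t);
	return 0;
}
```

A caller would slurp the file once, walk the buffer with an offset
cursor, and free the buffer when parsing is done. Whether this actually
beats mmap for the file index would still need to be measured.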