From: Mark Jamsek
Subject: Re: optimise reading the file index
To: Stefan Sperling
Cc: "Todd C. Miller", Christian Weisgerber, gameoftrees@openbsd.org
Date: Fri, 28 Jul 2023 15:14:43 +1000

Stefan Sperling wrote:
> On Thu, Jul 27, 2023 at 09:07:13AM -0600, Todd C. Miller wrote:
> > On Thu, 27 Jul 2023 15:19:47 +0200, Christian Weisgerber wrote:
> >
> > > The current code uses fread(3), i.e., buffered stdio, which translates
> > > to a long series of 16 kB read(2) system calls.
> >
> > stdio uses the "optimal" buffer size reported for the device,
> > st_blksize from struct stat. The buffer size can be modified via
> > a call to setvbuf(). The defaults were created when memory was
> > more dear so you may be able to improve things by just bumping the
> > buffer size.
> >
> > > For a got status of /usr/src, 786 read(2) calls look rather negligible
> > > compared to 92,300+ stat(2) calls.
> >
> > Yikes, that is a lot.
>
> It seems to be about one stat call per file which can't get much better?
>
> $ find /usr/src | wc -l
> 99998
>
> In any case, this fileindex optimization isn't about making 'got status'
> run faster. We noticed a visible delay when starting up tog, where the
> newly added base-commit marker takes a short time to appear on some systems.
> I determined that the time was spent reading the file index, which prompted
> Mark to write this patch.

Yes, that's likely my fault: the 'got status' remark in my OP became an
unintended red herring. I used it to emphasise that operations which
depend on got_fileindex_read() calls, such as the new base commit marker
in tog and the got branch -l change, are already too quick on this
machine to show a difference, because even 'got status' is fast on it.
That is why I wanted wider testing beyond what I could measure here.
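For reference, Todd's setvbuf() suggestion quoted above could look
roughly like the sketch below. It is not the patch under discussion and
not code from got; read_with_bufsize() is a made-up name used only for
illustration. It assumes a larger stdio buffer leads to fewer, larger
read(2) calls, which would need confirming with ktrace on the system in
question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of got: read a whole file through
 * buffered stdio using a caller-chosen buffer size. Returns the number
 * of bytes read, or -1 on error.
 */
long
read_with_bufsize(const char *path, size_t bufsize)
{
	FILE *f;
	char *iobuf, chunk[4096];
	size_t n;
	long total = 0;

	assert(path != NULL && bufsize > 0);

	f = fopen(path, "rb");
	if (f == NULL)
		return -1;

	iobuf = malloc(bufsize);
	if (iobuf == NULL) {
		fclose(f);
		return -1;
	}

	/* setvbuf() must come before any other operation on the stream. */
	if (setvbuf(f, iobuf, _IOFBF, bufsize) != 0) {
		fclose(f);
		free(iobuf);
		return -1;
	}

	/* fread() now refills from iobuf instead of the default buffer. */
	while ((n = fread(chunk, 1, sizeof(chunk), f)) > 0)
		total += (long)n;

	if (ferror(f))
		total = -1;

	fclose(f);	/* close the stream before freeing its buffer */
	free(iobuf);
	return total;
}
```

Calling it with, say, a 1 MB buffer and comparing the read(2) count
against the default would show whether bumping the buffer alone is
enough on a given system.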

As op taught me on irc, 'got info ' is good for this case. While we're
seeing some improvement there, the patch doesn't seem to improve the
time it takes to draw the base commit marker in tog. However, it does
make the new 'got branch -l' about 30% faster, and that takes the base
commit of each file in the index into account the same way tog's new
base commit marker does, so there could be some other reason in tog for
the delay.

-- 
Mark Jamsek
GPG: F2FF 13DE 6A06 C471 CA80 E6E2 2930 DC66 86EE CF68