From: Mark Jamsek Subject: Re: optimise reading the file index To: Stefan Sperling Cc: gameoftrees@openbsd.org Date: Fri, 28 Jul 2023 00:09:53 +1000 Stefan Sperling wrote: > On Thu, Jul 27, 2023 at 06:04:21PM +1000, Mark Jamsek wrote: > > This diff is borne from a discussion with stsp on irc, and attempts to > > make reading the file index faster, which should improve many operations. > > > > The main change is that we memory map the file index to avoid lots of > > small reads that can be costly in large repos, for example, with many > > files. And we also reduce multiple small allocations down to one > > allocation for the entire padded path length when building each index > > entry. If memory mapping the file fails, however, we fall back to the > > file io status quo. > > > > Regress is still happy, but I haven't properly profiled this yet. Some > > trivial measurements show a marked improvement, and it appears to be > > perceptibly faster; however, my machine is already quite fast even with > > 'got status' in src.git so I'm looking for wider testing to see if this > > is indeed worth it. In theory it should be, but I'd rather have some > > good measurements. > > I am not opposed to using mmap to read the file index if that is indeed > the fastest approach. However, keep in mind that OpenBSD has no unified > buffer cache, which means we cannot mix mmap with regular read/write > operations on the same file. Probably not an issue in this case since > the file index is read completely before being modified and then written > out again. But this also suggests that mmap might not be the best approach. > > mmap is most useful for us when running a series of random access reads, > e.g. while reading pack index and pack files. If we compare the performance > benefits of your diff to an operation which reads pack files a lot, such > as gotadmin pack -a or gotadmin indexpack, between a -DGOT_PACK_NO_MMAP build > and a regular build of Got, I would expect the relative difference that mmap > makes in your proposed diff to be smaller (in theory; I haven't actually tried > this for lack of time). > > The file index is not very large. Currently about 12MB for src, 8MB for ports. > Even on memory-constrained machines we could probably read the whole file into > memory in one or a few read() calls and parse the data in a temporary malloc > buffer. I suspect this would yield higher speed than small reads via mmap. > mmap is probably faster than the current code only because with mmap we are > asking the kernel to read page-sized blocks from the file, instead of tiny > amounts like sizeof(uint32_t), sizeof(uint64_t) sized-chunks. (Not sure what > the minimun size of a read in the kernel is, probably a 512 bytes sector? > Could be less than a page.) > In any case, the main overhead of the current code should be the amount of > read syscalls it triggers, with the buffer cache compensating somewhat for > our tiny read requests. After looking at the size of the index in src.git, that was my first instinct too, but then I thought there may well be got users with much larger indexes. Assuming there is, though, I think it's safe to assume that they still wouldn't be "that" big? So, sure, it should be fine to allocate the entire file. It's a trivial change in any case. Interestingly, in my profiling it is about the same as mmap, though still faster than the status quo. Mark Jamsek: > >> The main change is that we memory map the file index to avoid lots of >> small reads that can be costly in large repos, for example, with many >> files. > >I'd think that "got status" in /usr/src is dominated by the 92,300 >fstatat() calls... > >I tried this (cd /usr/src; got st) on an APU2, which qualifies as >slow for amd64 purposes: > >0.92-current, three runs: > 0m19.97s real 0m08.82s user 0m11.16s system > 0m19.95s real 0m08.56s user 0m10.93s system > 0m19.99s real 0m08.74s user 0m10.96s system > >+ patch, three runs: > 0m19.28s real 0m08.21s user 0m10.96s system > 0m19.14s real 0m08.02s user 0m10.76s system > 0m19.33s real 0m07.72s user 0m11.42s system > >I think we need more measurements to be certain that there is any >effect at all. :-> heh! :D I'm showing an average of 30-40% faster with an -02 release build when compared to main; op is seeing about the same as me both on ssd and spinning rust. It _should_ be a little faster but how much, I guess it depends? It would be interesting if the below diff is the same on your APU2 as the memory mapped file. > >I still have a code comment: > >> +static const struct got_error * >> +mread_fileindex_path(char **path, struct got_hash *ctx, const uint8_t *map, >> + size_t mapsz, size_t *offset) >> +{ >> + const uint8_t *p, *nul; >> + size_t len, pad, pathlen; >> + >> + if (mapsz < *offset) >> + return got_error(GOT_ERR_FILEIDX_BAD); >> + >> + p = map + *offset; >> + >> + nul = memchr(p, '\0', mapsz); >> + len = nul - p; >> + pad = 8 - len % 8; >> + >> + pathlen = len + pad; >> + >> + if (mapsz < *offset + pathlen) >> + return got_error(GOT_ERR_FILEIDX_BAD); > >If the file index is corrupt, the memchr() can overrun the mapped >area and trigger a segfault. The max length should be something >like map + mapz - p. Nice catch! You are right; I've fixed it with mapsz - *offset (which is the same N). Thanks, naddy :) diffstat c2ba7aa6808a2583895c024f5c85fff03948494e 1f5993e02a319fbcf02f9dbcd2926397fc7b3462 M lib/fileindex.c | 226+ 23- M lib/got_lib_fileindex.h | 2+ 1- M lib/worktree.c | 36+ 5- M lib/worktree_cvg.c | 37+ 5- 4 files changed, 301 insertions(+), 34 deletions(-) diff c2ba7aa6808a2583895c024f5c85fff03948494e 1f5993e02a319fbcf02f9dbcd2926397fc7b3462 commit - c2ba7aa6808a2583895c024f5c85fff03948494e commit + 1f5993e02a319fbcf02f9dbcd2926397fc7b3462 blob - e989c6b7cd01f55aa6173b8bbf04441802b11fe6 blob + 1cf015d945052cca6c3b80265eed6285ce7bfa73 --- lib/fileindex.c +++ lib/fileindex.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -537,6 +538,187 @@ read_fileindex_val64(uint64_t *val, struct got_hash *c } static const struct got_error * +mread_fileindex_val64(uint64_t *val, struct got_hash *ctx, const uint8_t *p, + size_t psz, size_t *offset) +{ + if (psz < *offset + sizeof(*val)) + return got_error(GOT_ERR_FILEIDX_BAD); + + memcpy(val, p + *offset, sizeof(*val)); + got_hash_update(ctx, val, sizeof(*val)); + *val = be64toh(*val); + *offset += sizeof(*val); + + return NULL; +} + +static const struct got_error * +mread_fileindex_val32(uint32_t *val, struct got_hash *ctx, const uint8_t *p, + size_t psz, size_t *offset) +{ + if (psz < *offset + sizeof(*val)) + return got_error(GOT_ERR_FILEIDX_BAD); + + memcpy(val, p + *offset, sizeof(*val)); + got_hash_update(ctx, val, sizeof(*val)); + *val = be32toh(*val); + *offset += sizeof(*val); + + return NULL; +} + +static const struct got_error * +mread_fileindex_val16(uint16_t *val, struct got_hash *ctx, const uint8_t *p, + size_t psz, size_t *offset) +{ + if (psz < *offset + sizeof(*val)) + return got_error(GOT_ERR_FILEIDX_BAD); + + memcpy(val, p + *offset, sizeof(*val)); + got_hash_update(ctx, val, sizeof(*val)); + *val = be16toh(*val); + *offset += sizeof(*val); + + return NULL; +} + + +static const struct got_error * +mread_fileindex_sha1bin(uint8_t *val, struct got_hash *ctx, const uint8_t *p, + size_t psz, size_t *offset) +{ + if (psz < *offset + SHA1_DIGEST_LENGTH) + return got_error(GOT_ERR_FILEIDX_BAD); + + memcpy(val, p + *offset, SHA1_DIGEST_LENGTH); + got_hash_update(ctx, val, SHA1_DIGEST_LENGTH); + *offset += SHA1_DIGEST_LENGTH; + + return NULL; +} + +static const struct got_error * +mread_fileindex_path(char **path, struct got_hash *ctx, const uint8_t *p, + size_t psz, size_t *offset) +{ + const uint8_t *ptr, *nul; + size_t len, pad, pathlen; + + if (psz < *offset) + return got_error(GOT_ERR_FILEIDX_BAD); + + ptr = p + *offset; + + nul = memchr(ptr, '\0', psz - *offset); + len = nul - ptr; + pad = 8 - len % 8; + + pathlen = len + pad; + + if (psz < *offset + pathlen) + return got_error(GOT_ERR_FILEIDX_BAD); + + *path = malloc(pathlen); + if (*path == NULL) + return got_error_from_errno("malloc"); + + memcpy(*path, ptr, pathlen); + got_hash_update(ctx, *path, pathlen); + + *offset += pathlen; + + return NULL; +} + +static const struct got_error * +mread_fileindex_entry(struct got_fileindex_entry **iep, struct got_hash *ctx, + const uint8_t *ptr, size_t ptrsz, uint32_t version, size_t *offset) +{ + const struct got_error *err; + struct got_fileindex_entry *ie; + const uint8_t *p; + size_t n = 0; + + *iep = NULL; + + if (ptrsz < *offset) + return got_error(GOT_ERR_FILEIDX_BAD); + + p = ptr + *offset; + + ie = calloc(1, sizeof(*ie)); + if (ie == NULL) + return got_error_from_errno("calloc"); + + err = mread_fileindex_val64(&ie->ctime_sec, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + err = mread_fileindex_val64(&ie->ctime_nsec, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + err = mread_fileindex_val64(&ie->mtime_sec, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + err = mread_fileindex_val64(&ie->mtime_nsec, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + + err = mread_fileindex_val32(&ie->uid, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + err = mread_fileindex_val32(&ie->gid, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + err = mread_fileindex_val32(&ie->size, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + + err = mread_fileindex_val16(&ie->mode, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + + err = mread_fileindex_sha1bin(ie->blob_sha1, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + err = mread_fileindex_sha1bin(ie->commit_sha1, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + + err = mread_fileindex_val32(&ie->flags, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + + err = mread_fileindex_path(&ie->path, ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + + if (version >= 2) { + uint32_t stage; + + stage = got_fileindex_entry_stage_get(ie); + if (stage == GOT_FILEIDX_STAGE_MODIFY || + stage == GOT_FILEIDX_STAGE_ADD) { + err = mread_fileindex_sha1bin(ie->staged_blob_sha1, + ctx, p, ptrsz, &n); + if (err != NULL) + goto done; + } + } else { + /* GOT_FILE_INDEX_VERSION 1 does not support staging. */ + ie->flags &= ~GOT_FILEIDX_F_STAGE; + } + + *offset += n; + +done: + if (err != NULL) + got_fileindex_entry_free(ie); + else + *iep = ie; + return err; +} + +static const struct got_error * read_fileindex_val64(uint64_t *val, struct got_hash *ctx, FILE *infile) { size_t n; @@ -702,7 +884,8 @@ got_fileindex_read(struct got_fileindex *fileindex, FI } const struct got_error * -got_fileindex_read(struct got_fileindex *fileindex, FILE *infile) +got_fileindex_read(struct got_fileindex *fileindex, FILE *infile, + const uint8_t *p, size_t psz) { const struct got_error *err = NULL; struct got_fileindex_hdr hdr; @@ -710,29 +893,38 @@ got_fileindex_read(struct got_fileindex *fileindex, FI struct got_fileindex_entry *ie; uint8_t sha1_expected[SHA1_DIGEST_LENGTH]; uint8_t sha1[SHA1_DIGEST_LENGTH]; - size_t n; + size_t n, offset; int i; got_hash_init(&ctx, GOT_HASH_SHA1); - n = fread(&hdr.signature, 1, sizeof(hdr.signature), infile); - if (n != sizeof(hdr.signature)) { - if (n == 0) /* EOF */ - return NULL; - return got_ferror(infile, GOT_ERR_FILEIDX_BAD); + if (p != NULL) { + offset = offsetof(struct got_fileindex_hdr, sha1); + if (psz < offset) + return got_error(GOT_ERR_FILEIDX_BAD); + + /* copy hdr up to and including nentries */ + memcpy(&hdr, p, offset); + } else { + n = fread(&hdr.signature, 1, sizeof(hdr.signature), infile); + if (n != sizeof(hdr.signature)) { + if (n == 0) /* EOF */ + return NULL; + return got_ferror(infile, GOT_ERR_FILEIDX_BAD); + } + n = fread(&hdr.version, 1, sizeof(hdr.version), infile); + if (n != sizeof(hdr.version)) { + if (n == 0) /* EOF */ + return NULL; + return got_ferror(infile, GOT_ERR_FILEIDX_BAD); + } + n = fread(&hdr.nentries, 1, sizeof(hdr.nentries), infile); + if (n != sizeof(hdr.nentries)) { + if (n == 0) /* EOF */ + return NULL; + return got_ferror(infile, GOT_ERR_FILEIDX_BAD); + } } - n = fread(&hdr.version, 1, sizeof(hdr.version), infile); - if (n != sizeof(hdr.version)) { - if (n == 0) /* EOF */ - return NULL; - return got_ferror(infile, GOT_ERR_FILEIDX_BAD); - } - n = fread(&hdr.nentries, 1, sizeof(hdr.nentries), infile); - if (n != sizeof(hdr.nentries)) { - if (n == 0) /* EOF */ - return NULL; - return got_ferror(infile, GOT_ERR_FILEIDX_BAD); - } got_hash_update(&ctx, &hdr.signature, sizeof(hdr.signature)); got_hash_update(&ctx, &hdr.version, sizeof(hdr.version)); @@ -748,7 +940,12 @@ got_fileindex_read(struct got_fileindex *fileindex, FI return got_error(GOT_ERR_FILEIDX_VER); for (i = 0; i < hdr.nentries; i++) { - err = read_fileindex_entry(&ie, &ctx, infile, hdr.version); + if (p != NULL) + err = mread_fileindex_entry(&ie, &ctx, p, psz, + hdr.version, &offset); + else + err = read_fileindex_entry(&ie, &ctx, infile, + hdr.version); if (err) return err; err = add_entry(fileindex, ie); @@ -758,9 +955,15 @@ got_fileindex_read(struct got_fileindex *fileindex, FI } } - n = fread(sha1_expected, 1, sizeof(sha1_expected), infile); - if (n != sizeof(sha1_expected)) - return got_ferror(infile, GOT_ERR_FILEIDX_BAD); + if (p != NULL) { + if (psz < offset + sizeof(sha1_expected)) + return got_error(GOT_ERR_FILEIDX_BAD); + memcpy(sha1_expected, p + offset, sizeof(sha1_expected)); + } else { + n = fread(sha1_expected, 1, sizeof(sha1_expected), infile); + if (n != sizeof(sha1_expected)) + return got_ferror(infile, GOT_ERR_FILEIDX_BAD); + } got_hash_final(&ctx, sha1); if (memcmp(sha1, sha1_expected, SHA1_DIGEST_LENGTH) != 0) return got_error(GOT_ERR_FILEIDX_CSUM); blob - 7045733009e3c274bef6f25b589a001bff3de260 blob + c2e4498c52366b3ce1d284297d23ff4006158aed --- lib/got_lib_fileindex.h +++ lib/got_lib_fileindex.h @@ -123,7 +123,8 @@ const struct got_error *got_fileindex_read(struct got_ struct got_fileindex_entry *); struct got_fileindex_entry *got_fileindex_entry_get(struct got_fileindex *, const char *, size_t); -const struct got_error *got_fileindex_read(struct got_fileindex *, FILE *); +const struct got_error *got_fileindex_read(struct got_fileindex *, FILE *, + const uint8_t *, size_t); typedef const struct got_error *(*got_fileindex_cb)(void *, struct got_fileindex_entry *); const struct got_error *got_fileindex_for_each_entry_safe( blob - 2cdafab5a89da7e31a55bc887af42b14226637d8 blob + f5ae973cdf877ffbf50ff9e4d4db224a5344ae36 --- lib/worktree.c +++ lib/worktree.c @@ -20,8 +20,8 @@ #include #include -#include #include +#include #include #include #include @@ -2508,6 +2508,8 @@ open_fileindex(struct got_fileindex **fileindex, char { const struct got_error *err = NULL; FILE *index = NULL; + struct stat st; + uint8_t *p = NULL; *fileindex_path = NULL; *fileindex = got_fileindex_alloc(); @@ -2522,12 +2524,41 @@ open_fileindex(struct got_fileindex **fileindex, char if (index == NULL) { if (errno != ENOENT) err = got_error_from_errno2("fopen", *fileindex_path); - } else { - err = got_fileindex_read(*fileindex, index); - if (fclose(index) == EOF && err == NULL) - err = got_error_from_errno("fclose"); + goto done; } + + if (fstat(fileno(index), &st) == -1) { + err = got_error_from_errno2("fstat", *fileindex_path); + goto done; + } + + /* XXX if the file is > 1 GB, fallback to file io */ + if (st.st_size > 0 && st.st_size <= 0x3fffffffL) { + ssize_t r; + + p = malloc(st.st_size); + if (p == NULL) { + err = got_error_from_errno("malloc"); + goto done; + } + + r = read(fileno(index), p, st.st_size); + if (r != st.st_size) { + if (r == -1) + err = got_error_from_errno("read"); + else + err = got_error(GOT_ERR_IO); + goto done; + } + } + + err = got_fileindex_read(*fileindex, index, p, st.st_size); + done: + if (index != NULL && fclose(index) == EOF && err == NULL) + err = got_error_from_errno("fclose"); + if (p != NULL) + free(p); if (err) { free(*fileindex_path); *fileindex_path = NULL; blob - f488cd9dcd98b68c647973cb84a54e18daefced4 blob + 0c0f118bbe3ebca17cf76bf44f243b7a83ff31b4 --- lib/worktree_cvg.c +++ lib/worktree_cvg.c @@ -20,8 +20,8 @@ #include #include -#include #include +#include #include #include #include @@ -567,6 +567,8 @@ open_fileindex(struct got_fileindex **fileindex, char { const struct got_error *err = NULL; FILE *index = NULL; + struct stat st; + uint8_t *p = NULL; *fileindex_path = NULL; *fileindex = got_fileindex_alloc(); @@ -581,12 +583,41 @@ open_fileindex(struct got_fileindex **fileindex, char if (index == NULL) { if (errno != ENOENT) err = got_error_from_errno2("fopen", *fileindex_path); - } else { - err = got_fileindex_read(*fileindex, index); - if (fclose(index) == EOF && err == NULL) - err = got_error_from_errno("fclose"); + goto done; } + + if (fstat(fileno(index), &st) == -1) { + err = got_error_from_errno2("fstat", *fileindex_path); + goto done; + } + + /* XXX if the file is > 1 GB, fallback to file io */ + if (st.st_size > 0 && st.st_size <= 0x3fffffffL) { + ssize_t r; + + p = malloc(st.st_size); + if (p == NULL) { + err = got_error_from_errno("malloc"); + goto done; + } + + r = read(fileno(index), p, st.st_size); + if (r != st.st_size) { + if (r == -1) + err = got_error_from_errno("read"); + else + err = got_error(GOT_ERR_IO); + goto done; + } + } + + err = got_fileindex_read(*fileindex, index, p, st.st_size); + done: + if (index != NULL && fclose(index) == EOF && err == NULL) + err = got_error_from_errno("fclose"); + if (p != NULL) + free(p); if (err) { free(*fileindex_path); *fileindex_path = NULL; @@ -596,6 +627,7 @@ static const struct got_error * return err; } + static const struct got_error * sync_fileindex(struct got_fileindex *fileindex, const char *fileindex_path) { -- Mark Jamsek GPG: F2FF 13DE 6A06 C471 CA80 E6E2 2930 DC66 86EE CF68