From: Omar Polo Subject: Re: gotwebd: cache repos in struct server To: Stefan Sperling Cc: gameoftrees@openbsd.org Date: Thu, 01 Sep 2022 11:15:55 +0200 On 2022/09/01 09:31:23 +0200, Stefan Sperling wrote: > On Thu, Sep 01, 2022 at 09:04:19AM +0200, Stefan Sperling wrote: > > On Wed, Aug 31, 2022 at 06:25:01PM +0200, Omar Polo wrote: > > > We `goto err' also when fcgi_printf returns -1, i.e. upon premature > > > client disconnection. (OK, also if asprintf fails, but that error is > > > not propagate up to here and it's not relevant for the cache anyway.) > > > I think it should look if error is non-NULL. > > > > > > There is another issue that I think is worth being raised: clients can > > > force errors by playing with the query string. By passing invalid > > > query parameters they can force all kinds of erros being returned here > > > (from no git directory, to no object, to no file entry...) and so > > > making us invalidate the cache. > > > > You're right, always invalidating on any error is not ideal, > > especially when it can be triggered via a bad query string. > > > > I will think some more about this, and commit the previous > > version of the diff, for now. It might be possible to fix this > > issue in the lib/ code where relevant errors are being raised. > > I suspect doing something like this would prevent the error I saw. > This will also fix similar issues in tog and perhaps even the got CLI > running in parallel to some other program which changes the Git repo. > > ok? I haven't tested it against the issue in gotwebd, but it reads fine. *tinfoil hat on* I have some concerns though. > It is a small performance hit but we have to do this for correctness. > > ----------------------------------------------- > commit 7102e47efd67a59a58b1b8bb05152ad592d354af (pack-mtime) > from: Stefan Sperling > date: Thu Sep 1 07:29:14 2022 UTC > > refresh our pack-index path list if the mtime of objects/pack has changed > > diff b5c757f5f816a8061f4879da9e68a39141148e40 7102e47efd67a59a58b1b8bb05152ad592d354af > commit - b5c757f5f816a8061f4879da9e68a39141148e40 > commit + 7102e47efd67a59a58b1b8bb05152ad592d354af > blob - 02b998107e294bc2eca6ffb21a128c5973771755 > blob + fadbfad45e1c86893452d04c192e3bf35883bebf > --- lib/got_lib_repository.h > +++ lib/got_lib_repository.h > @@ -52,6 +52,7 @@ struct got_repository { > int gitdir_fd; > > struct got_pathlist_head packidx_paths; > + time_t pack_path_mtime; > > /* The pack index cache speeds up search for packed objects. */ > struct got_packidx *packidx_cache[GOT_PACK_CACHE_SIZE]; > blob - 237c796775a73f7eac4b6b6b463b2caefc13b720 > blob + 87c7abb573744c7b7f9b38f35d7550229267b418 > --- lib/repository.c > +++ lib/repository.c > @@ -1282,6 +1282,7 @@ got_repo_list_packidx(struct got_pathlist_head *packid > struct dirent *dent; > char *path_packidx = NULL; > int packdir_fd; > + struct stat sb; > > packdir_fd = openat(got_repo_get_fd(repo), > GOT_OBJECTS_PACK_DIR, O_DIRECTORY | O_CLOEXEC); > @@ -1297,6 +1298,12 @@ got_repo_list_packidx(struct got_pathlist_head *packid > goto done; > } > > + if (fstat(packdir_fd, &sb) == -1) { > + err = got_error_from_errno("fstat"); > + goto done; > + } > + repo->pack_path_mtime = sb.st_mtime; > + > while ((dent = readdir(packdir)) != NULL) { > if (!got_repo_is_packidx_filename(dent->d_name, dent->d_namlen)) > continue; > @@ -1608,6 +1615,19 @@ got_repo_init(const char *repo_path) > return NULL; > } > > +static void > +purge_packidx_paths(struct got_pathlist_head *packidx_paths) > +{ > + struct got_pathlist_entry *pe; > + > + while (!TAILQ_EMPTY(packidx_paths)) { > + pe = TAILQ_FIRST(packidx_paths); > + TAILQ_REMOVE(packidx_paths, pe, entry); > + free((char *)pe->path); > + free(pe); > + } > +} > + > static const struct got_error * > match_packed_object(struct got_object_id **unique_id, > struct got_repository *repo, const char *id_str_prefix, int obj_type) > @@ -1615,9 +1635,21 @@ match_packed_object(struct got_object_id **unique_id, > const struct got_error *err = NULL; > struct got_object_id_queue matched_ids; > struct got_pathlist_entry *pe; > + struct stat sb; > > STAILQ_INIT(&matched_ids); I see two potential problems here: > + if (stat(got_repo_get_path_objects_pack(repo), &sb) == -1) { 1. possible TOCTOU? what if the file is changed between this check and its actual use? > + if (errno != ENOENT) > + return got_error_from_errno2("stat", > + got_repo_get_path_objects_pack(repo)); > + } else if (sb.st_mtime != repo->pack_path_mtime) { 2. is this enough? what if someone changed/replaced the file and kept the same mtime? should we care? > + purge_packidx_paths(&repo->packidx_paths); > + err = got_repo_list_packidx(&repo->packidx_paths, repo); > + if (err) > + return err; > + } > + > TAILQ_FOREACH(pe, &repo->packidx_paths, entry) { > const char *path_packidx = pe->path; > struct got_packidx *packidx; if you think that detecting a change in the data is otherwise too difficult and this is just an attempt at being helpful, I can agree and the diff would be OK for me.