From: Stefan Sperling Subject: Re: teach gotadmin cleanup to remove redundant packfiles To: Omar Polo Cc: gameoftrees@openbsd.org Date: Sun, 18 Jun 2023 15:33:44 +0200 Generally this looks great and I am fine with you committing this. I have some suggestions below which you could follow up on if you have time. Otherwise I will get to making some tweaks eventually. On Sun, Jun 18, 2023 at 12:17:47PM +0200, Omar Polo wrote: > @@ -261,6 +268,11 @@ In order to determine the set of objects which are ref > .Cm got ref -d > may also leave unreferenced objects behind. > .Pp > +Pack files will usually become redundant as a result of creating As already noted on IRC, "creating" in the above line should be dropped. > +repacking the repository with > +.Nm > +.Cm pack Fl a . > +.Pp > @@ -1228,6 +1229,200 @@ remove_packidx(int dir_fd, const char *relpath) > } > > static const struct got_error * > +account_unlink_pack(struct got_repository *repo, const char *packidx_path, > + int dry_run, int *remove, off_t *size_before, off_t *size_after) > +{ > + static const char *ext[] = {".idx", ".pack", ".rev", ".bitmap", > + ".promisor", ".mtimes"}; > + struct stat sb; > + char *dot, path[PATH_MAX]; > + size_t i; > + > + if (strlcpy(path, packidx_path, sizeof(path)) >= > + sizeof(path)) > + return got_error(GOT_ERR_NO_SPACE); > + > + /* > + * For compatibility with Git, if a matching .keep file exist > + * don't delete the packfile. > + */ > + dot = strrchr(path, '.'); > + *dot = '\0'; > + if (strlcat(path, ".keep", sizeof(path)) >= sizeof(path)) > + return got_error(GOT_ERR_NO_SPACE); Would be nice to use got_error_fmt here to print the overlong path. Because it comes from the filesystem the value is beyond our control and it would suck if someone hit this and couldn't easily figure out which NO_SPACE error case is being hit. > + if (faccessat(got_repo_get_fd(repo), path, F_OK, 0) == 0) > + *remove = 0; > + > + for (i = 0; i < nitems(ext); ++i) { > + *dot = '\0'; > + > + if (strlcat(path, ext[i], sizeof(path)) >= > + sizeof(path)) > + return got_error(GOT_ERR_NO_SPACE); > + > + if (fstatat(got_repo_get_fd(repo), path, &sb, 0) == > + -1) { > + if (errno == ENOENT && > + strcmp(ext[i], ".pack") != 0 && > + strcmp(ext[i], ".idx") != 0) > + continue; > + return got_error_from_errno2("fstatat", path); > + } > + > + *size_before += sb.st_size; > + if (!*remove) { > + *size_after += sb.st_size; > + continue; > + } > + > + if (dry_run) > + continue; > + > + if (unlinkat(got_repo_get_fd(repo), path, 0) == -1) { Should we lock the file before deleting it, like remove_packidx() which is used by gotadmin cleanup -p? In fact, should we call remove_packidx() here of unlinking .idx directly? I see that git-repack doesn't seem to bother with such locking but it is confusing to have an "unlink_pack" function which doesn't take a lock and a "remove_packidx" function which takes a lock. > + if (errno == ENOENT && > + strcmp(ext[i], ".pack") != 0 && > + strcmp(ext[i], ".idx") != 0) What is the reason for ignoring ENOENT for .bitmap etc. but not for .idx and .pack extensions? Can't we always treat ENOENT as "this file has already been removed somehow, which is fine by me"? > + continue; > + return got_error_from_errno2("unlinkat", > + path); > + } > + } > + err = idset_add_from_packidx(&remove, repo, sorted[i].path, > + idset); It's just a matter of taste but I would call the above function something that reads more like an actual english phrase. Maybe "scan_packidx" or "pack_is_redundant"? > + if (err) > + goto done; > + err = account_unlink_pack(repo, sorted[i].path, dry_run, > + &remove, size_before, size_after); The name "account_unlink_pack" also seems a bit unclear to me. I guess it is supposed to convey that disk space usage accounting is happening in here? I would assume such meta-data accounting can be implied even if it is not mentioned in the name of the function, especially since we are passing in parameters that provide a hint at size accounting. At this point we have already determined that the pack is redundant. We would like to remove it unless in the special .keep case which can be handled internally. Could we call this function "purge_redundant_pack"?