From: Omar Polo Subject: Re: teach gotadmin cleanup to remove redundant packfiles To: Stefan Sperling Cc: gameoftrees@openbsd.org Date: Sun, 18 Jun 2023 16:00:22 +0200 On 2023/06/18 15:33:44 +0200, Stefan Sperling wrote: > > + /* > > + * For compatibility with Git, if a matching .keep file exist > > + * don't delete the packfile. > > + */ > > + dot = strrchr(path, '.'); > > + *dot = '\0'; > > + if (strlcat(path, ".keep", sizeof(path)) >= sizeof(path)) > > + return got_error(GOT_ERR_NO_SPACE); > > Would be nice to use got_error_fmt here to print the overlong path. > Because it comes from the filesystem the value is beyond our > control and it would suck if someone hit this and couldn't easily > figure out which NO_SPACE error case is being hit. I'm not sure the return line could ever be hitted. The path is of the form object/packs/pack-$hash.idx, then we change the .idx with .keep and the other .something. Hash is guaranteed to be a sha1 digest so we can't pratically overflow the buffer. not that I'm against a more explicit error message, just sharing the reason for my choice. > > + if (faccessat(got_repo_get_fd(repo), path, F_OK, 0) == 0) > > + *remove = 0; > > + > > + for (i = 0; i < nitems(ext); ++i) { > > + *dot = '\0'; > > + > > + if (strlcat(path, ext[i], sizeof(path)) >= > > + sizeof(path)) > > + return got_error(GOT_ERR_NO_SPACE); > > + > > + if (fstatat(got_repo_get_fd(repo), path, &sb, 0) == > > + -1) { > > + if (errno == ENOENT && > > + strcmp(ext[i], ".pack") != 0 && > > + strcmp(ext[i], ".idx") != 0) > > + continue; > > + return got_error_from_errno2("fstatat", path); > > + } > > + > > + *size_before += sb.st_size; > > + if (!*remove) { > > + *size_after += sb.st_size; > > + continue; > > + } > > + > > + if (dry_run) > > + continue; > > + > > + if (unlinkat(got_repo_get_fd(repo), path, 0) == -1) { > > Should we lock the file before deleting it, like remove_packidx() which > is used by gotadmin cleanup -p? > In fact, should we call remove_packidx() here of unlinking .idx directly? > > I see that git-repack doesn't seem to bother with such locking but it is > confusing to have an "unlink_pack" function which doesn't take a lock > and a "remove_packidx" function which takes a lock. another good point. I was only thinking of the case of cleanup in parallel, and so assumed that the (soon to be added) locking around the operation would have been fine. I'll send a follow-up to change this. > > + if (errno == ENOENT && > > + strcmp(ext[i], ".pack") != 0 && > > + strcmp(ext[i], ".idx") != 0) > > What is the reason for ignoring ENOENT for .bitmap etc. but not for > .idx and .pack extensions? Can't we always treat ENOENT as "this > file has already been removed somehow, which is fine by me"? I just wanted to be paranoic. i'll drop the explicit .pack and .idx check (twice.) > > + continue; > > + return got_error_from_errno2("unlinkat", > > + path); > > + } > > + } > > > + err = idset_add_from_packidx(&remove, repo, sorted[i].path, > > + idset); > > It's just a matter of taste but I would call the above function something > that reads more like an actual english phrase. > Maybe "scan_packidx" or "pack_is_redundant"? > > > + if (err) > > + goto done; > > + err = account_unlink_pack(repo, sorted[i].path, dry_run, > > + &remove, size_before, size_after); > > The name "account_unlink_pack" also seems a bit unclear to me. > > I guess it is supposed to convey that disk space usage accounting is > happening in here? I would assume such meta-data accounting can be > implied even if it is not mentioned in the name of the function, > especially since we are passing in parameters that provide a hint > at size accounting. > > At this point we have already determined that the pack is redundant. > We would like to remove it unless in the special .keep case which can > be handled internally. Could we call this function "purge_redundant_pack"? i didn't like the names either but haven't come up with anything better, i'll go with your suggestions.