Download raw body.
teach gotadmin cleanup to remove redundant packfiles
On 2023/06/18 15:33:44 +0200, Stefan Sperling <stsp@stsp.name> wrote:
> > + /*
> > + * For compatibility with Git, if a matching .keep file exist
> > + * don't delete the packfile.
> > + */
> > + dot = strrchr(path, '.');
> > + *dot = '\0';
> > + if (strlcat(path, ".keep", sizeof(path)) >= sizeof(path))
> > + return got_error(GOT_ERR_NO_SPACE);
>
> Would be nice to use got_error_fmt here to print the overlong path.
> Because it comes from the filesystem the value is beyond our
> control and it would suck if someone hit this and couldn't easily
> figure out which NO_SPACE error case is being hit.
I'm not sure the return line could ever be hitted.
The path is of the form object/packs/pack-$hash.idx, then we change
the .idx with .keep and the other .something. Hash is guaranteed to
be a sha1 digest so we can't pratically overflow the buffer.
not that I'm against a more explicit error message, just sharing the
reason for my choice.
> > + if (faccessat(got_repo_get_fd(repo), path, F_OK, 0) == 0)
> > + *remove = 0;
> > +
> > + for (i = 0; i < nitems(ext); ++i) {
> > + *dot = '\0';
> > +
> > + if (strlcat(path, ext[i], sizeof(path)) >=
> > + sizeof(path))
> > + return got_error(GOT_ERR_NO_SPACE);
> > +
> > + if (fstatat(got_repo_get_fd(repo), path, &sb, 0) ==
> > + -1) {
> > + if (errno == ENOENT &&
> > + strcmp(ext[i], ".pack") != 0 &&
> > + strcmp(ext[i], ".idx") != 0)
> > + continue;
> > + return got_error_from_errno2("fstatat", path);
> > + }
> > +
> > + *size_before += sb.st_size;
> > + if (!*remove) {
> > + *size_after += sb.st_size;
> > + continue;
> > + }
> > +
> > + if (dry_run)
> > + continue;
> > +
> > + if (unlinkat(got_repo_get_fd(repo), path, 0) == -1) {
>
> Should we lock the file before deleting it, like remove_packidx() which
> is used by gotadmin cleanup -p?
> In fact, should we call remove_packidx() here of unlinking .idx directly?
>
> I see that git-repack doesn't seem to bother with such locking but it is
> confusing to have an "unlink_pack" function which doesn't take a lock
> and a "remove_packidx" function which takes a lock.
another good point. I was only thinking of the case of cleanup in
parallel, and so assumed that the (soon to be added) locking around
the operation would have been fine.
I'll send a follow-up to change this.
> > + if (errno == ENOENT &&
> > + strcmp(ext[i], ".pack") != 0 &&
> > + strcmp(ext[i], ".idx") != 0)
>
> What is the reason for ignoring ENOENT for .bitmap etc. but not for
> .idx and .pack extensions? Can't we always treat ENOENT as "this
> file has already been removed somehow, which is fine by me"?
I just wanted to be paranoic. i'll drop the explicit .pack and .idx
check (twice.)
> > + continue;
> > + return got_error_from_errno2("unlinkat",
> > + path);
> > + }
> > + }
>
> > + err = idset_add_from_packidx(&remove, repo, sorted[i].path,
> > + idset);
>
> It's just a matter of taste but I would call the above function something
> that reads more like an actual english phrase.
> Maybe "scan_packidx" or "pack_is_redundant"?
>
> > + if (err)
> > + goto done;
> > + err = account_unlink_pack(repo, sorted[i].path, dry_run,
> > + &remove, size_before, size_after);
>
> The name "account_unlink_pack" also seems a bit unclear to me.
>
> I guess it is supposed to convey that disk space usage accounting is
> happening in here? I would assume such meta-data accounting can be
> implied even if it is not mentioned in the name of the function,
> especially since we are passing in parameters that provide a hint
> at size accounting.
>
> At this point we have already determined that the pack is redundant.
> We would like to remove it unless in the special .keep case which can
> be handled internally. Could we call this function "purge_redundant_pack"?
i didn't like the names either but haven't come up with anything
better, i'll go with your suggestions.
teach gotadmin cleanup to remove redundant packfiles