From: Omar Polo Subject: Re: better packfiles cleanup To: Stefan Sperling Cc: gameoftrees@openbsd.org Date: Tue, 11 Jul 2023 16:27:30 +0200 On 2023/07/11 16:18:15 +0200, Stefan Sperling wrote: > On Tue, Jul 11, 2023 at 03:35:06PM +0200, Omar Polo wrote: > > The gotadmin cleanup algorithm for pack files is inadequate because it > > only works in terms of unique objects contained in pack files, not > > about reachability. > > > > Let's say you have a pack file which holds objects for a few temporary > > branches that are long gone. Those objects are unreachable, but since > > the objects in that pack file are not present in any other pack (are > > 'unique') then gotadmin cleanup won't delete that pack. > > > > Here's an attempt at reworking this by taking into consideration if > > the objects are also reachable or not. I've refactored a bit how we > > cleanup since now I need to walk the objects unconditionally and > > there's some churn due to swapping the order of the arguments in the > > reporting function. > > > > cleanup.sh contains also some changes to address previous mistakes. > > cmp(1) returns nonzero when the files differ and since the output was > > changed we can't grep it like that anymore, so use -q and check the > > number of pack files later with gotadmin info. > > > > one more thing that should be done but will be addressed as follow-up > > is to avoid removing pack files that are 'too new', much like we do > > with loose object, and for the same reason. > > > > ok? > > Ok. And I agree that we should also be looking at timestamps of > pack files before deleting them. > > This diff is indeed a bit hard to review because of the refactoring > changes you mentioned. It would have been easier to review if the > swapping of arguments and such had been left out for later. sorry. I have a few intermediary commits that i thought i could use to split and simplify the review but I wasn't very careful... ^^" > If I understood correctly then the core change which changes pack > deletion heuristic is this part? exactly, that's the key to handle pack files with unreachable objects. > > @@ -1352,6 +1276,9 @@ pack_is_redundant(int *redundant, struct got_repositor > > if (got_object_idset_contains(idset, &id)) > > continue; > > > > + if (!got_object_idset_contains(traversed_ids, &id)) > > + continue; > > + > > *redundant = 0; > > err = got_object_idset_add(idset, &id, NULL); > > if (err)