"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: better packfiles cleanup
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Tue, 11 Jul 2023 16:27:30 +0200

Download raw body.

Thread
On 2023/07/11 16:18:15 +0200, Stefan Sperling <stsp@stsp.name> wrote:
> On Tue, Jul 11, 2023 at 03:35:06PM +0200, Omar Polo wrote:
> > The gotadmin cleanup algorithm for pack files is inadequate because it
> > only works in terms of unique objects contained in pack files, not
> > about reachability.
> > 
> > Let's say you have a pack file which holds objects for a few temporary
> > branches that are long gone.  Those objects are unreachable, but since
> > the objects in that pack file are not present in any other pack (are
> > 'unique') then gotadmin cleanup won't delete that pack.
> > 
> > Here's an attempt at reworking this by taking into consideration if
> > the objects are also reachable or not.  I've refactored a bit how we
> > cleanup since now I need to walk the objects unconditionally and
> > there's some churn due to swapping the order of the arguments in the
> > reporting function.
> > 
> > cleanup.sh contains also some changes to address previous mistakes.
> > cmp(1) returns nonzero when the files differ and since the output was
> > changed we can't grep it like that anymore, so use -q and check the
> > number of pack files later with gotadmin info.
> > 
> > one more thing that should be done but will be addressed as follow-up
> > is to avoid removing pack files that are 'too new', much like we do
> > with loose object, and for the same reason.
> > 
> > ok?
> 
> Ok. And I agree that we should also be looking at timestamps of
> pack files before deleting them.
> 
> This diff is indeed a bit hard to review because of the refactoring
> changes you mentioned. It would have been easier to review if the
> swapping of arguments and such had been left out for later.

sorry.  I have a few intermediary commits that i thought i could use
to split and simplify the review but I wasn't very careful... ^^"

> If I understood correctly then the core change which changes pack
> deletion heuristic is this part?

exactly, that's the key to handle pack files with unreachable objects.

> > @@ -1352,6 +1276,9 @@ pack_is_redundant(int *redundant, struct got_repositor
> >  		if (got_object_idset_contains(idset, &id))
> >  			continue;
> >  
> > +		if (!got_object_idset_contains(traversed_ids, &id))
> > +			continue;
> > +
> >  		*redundant = 0;
> >  		err = got_object_idset_add(idset, &id, NULL);
> >  		if (err)