"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: cleanup: always remove redundant packs
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Sun, 22 Oct 2023 21:14:27 +0200

Download raw body.

Thread
On Sun, Oct 22, 2023 at 08:40:13PM +0200, Omar Polo wrote:
> stsp reported that `gotadmin pack -a && gotadmin cleanup' wasn't working
> in one case since rsync was constantly messing with mtime.
> 
> looking closely, I don't think we have to treat "recentish" pack files
> in a special way.  If they're redundant, they can go, regardless of when
> they were created.
> 
> I've taken a quick look at git' builtin/repack.c and it doesn't seem to
> care about the {c,m}time of pack files.  From another quick look, gotd
> should be fine too.
> 
> I probably got over-cautious and copied the loose object time check,
> with for packs is not needed.  If a pack is redundant, all of its
> contents are already available in the repository so there's nothing to
> loose by removing it.  Loose object instead may be installed and only
> later being pointed by a reference, so we still need the creation check.
> 
> ok?

traversed_ids is shared between repo_purge_unreferenced_loose_objects()
and repo_purge_redundant_packfiles(). And loose objects are traversed first.
Which probably means that a pack file composed of objects that all exist as
loose objects will be considered redundant and deleted.

I suspect without the timestamp check creating a pack file and running
cleanup concurrently can now result in a race where the newly created
pack file is deleted. I would suggest to use separate travered_ids sets
for loose objects and for pack files to avoid this issue.