
From: Stefan Sperling <stsp@stsp.name>
Subject: Re: gotadmin pack/cleanup leave loose objects
To: Christian Weisgerber <naddy@mips.inka.de>
Cc: gameoftrees@openbsd.org
Date: Fri, 11 Feb 2022 02:13:26 +0100

On Thu, Feb 10, 2022 at 11:34:56PM +0100, Christian Weisgerber wrote:
> Stefan Sperling:
> 
> > There is a safety margin based on timestamps which leaves
> > recently created loose objects alone.
> > 
> > The -a option for 'gotadmin cleanup' disables this behaviour:
> 
> Let's try this again with a more manageable repository.  I have
> created an OpenBSD src.git composed entirely of loose objects:
> 
> $ gotadmin info
> repository: /usr/obj/test.git
> pack files: 0
> loose objects: 2077090
> loose total size: 6.9G
> $ git fsck --unreachable
> Checking object directories: 100% (256/256), done.
> Checking connectivity: 2077090, done.
> Verifying commits in commit graph: 100% (128582/128582), done.
> $ got ref -l
> HEAD: refs/heads/master
> refs/heads/master: be07b65e94b18cfa88e86233b6ca62200b90655b
> 
> I would expect pack -a to pack all loose objects and a subsequent
> cleanup to remove all loose objects.

This expectation is not necessarily valid; see below.

> Instead this happens:
> 
> $ gotadmin pack -a  
> packing 2 references; 2077090 objects; deltify: 100%; writing pack:    2.5G 100%
> Wrote 9c403c39ab9a3cc6756ce1b4e9e8882764117ff4.pack
>    2.5G packed; indexing 100%; resolving deltas 100%
> Indexed 9c403c39ab9a3cc6756ce1b4e9e8882764117ff4.pack
> $ gotadmin cl
> 2077090 loose objects; 218189 commits scanned; 894806 objects purged
> loose total size before: 6.9G
> loose total size after: 3.6G
> disk space freed: 3.3G
> loose objects also found in pack files: 2077090
> $ gotadmin info     
> repository: /usr/obj/test.git
> pack files: 1
> packed objects: 2077090
> packed total size: 2.5G
> loose objects: 1182284
> loose total size: 3.6G
> 
> What am I missing here?

Only gotadmin cl -a is expected to definitely remove them.

By default, gotadmin cl keeps loose objects whose modification time
lies within the 10 minutes before the youngest reference in the
repository was created. This is the "implementation-defined modification
timestamp" mentioned in the man page.
Depending on how you prepared your test repository, a lot of objects
could have a timestamp within this window.
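
To illustrate the window described above, here is a rough sketch in
Python. It is not the actual got code: the function and constant names
are made up for illustration, and treating objects newer than the
youngest reference as protected is an assumption on top of the rule
stated above.

```python
# Hypothetical model of the timestamp safety margin; not taken from got.
SAFETY_MARGIN_SECONDS = 10 * 60  # the 10-minute window


def keeps_loose_object(object_mtime, youngest_ref_time, ignore_mtime=False):
    """Return True if the timestamp check would protect a loose object.

    object_mtime      -- modification time of the loose object file (seconds)
    youngest_ref_time -- creation time of the youngest reference (seconds)
    ignore_mtime      -- models 'gotadmin cleanup -a', which disables the check
    """
    if ignore_mtime:
        return False
    # Assumption: anything at or after the start of the window is kept,
    # including objects newer than the youngest reference.
    return object_mtime >= youngest_ref_time - SAFETY_MARGIN_SECONDS
```

If a repository is created in one go, as in a conversion or import, many
object files can end up with an mtime inside this window, so a plain
cleanup leaves them in place.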

Only use cleanup -a while no other command is running on the repository.
If -a were the default behaviour, commits made to the repository could end
up with incomplete data: if gotadmin cleanup ran at the same time, it
could remove loose objects that belong to commits that are being created.