From: Stefan Sperling
Subject: Re: gotadmin pack/cleanup leave loose objects
To: Christian Weisgerber
Cc: gameoftrees@openbsd.org
Date: Fri, 11 Feb 2022 02:13:26 +0100

On Thu, Feb 10, 2022 at 11:34:56PM +0100, Christian Weisgerber wrote:
> Stefan Sperling:
>
> > There is a safety margin based on timestamps which leaves
> > recently created loose objects alone.
> >
> > The -a option for 'gotadmin cleanup' disables this behaviour:
>
> Let's try this again with a more manageable repository. I have
> created an OpenBSD src.git composed entirely of loose objects:
>
> $ gotadmin info
> repository: /usr/obj/test.git
> pack files: 0
> loose objects: 2077090
> loose total size: 6.9G
> $ git fsck --unreachable
> Checking object directories: 100% (256/256), done.
> Checking connectivity: 2077090, done.
> Verifying commits in commit graph: 100% (128582/128582), done.
> $ got ref -l
> HEAD: refs/heads/master
> refs/heads/master: be07b65e94b18cfa88e86233b6ca62200b90655b
>
> I would expect pack -a to pack all loose objects and a subsequent
> cleanup to remove all loose objects.

This expectation is not necessarily valid, see below.

> Instead this happens:
>
> $ gotadmin pack -a
> packing 2 references; 2077090 objects; deltify: 100%; writing pack: 2.5G 100%
> Wrote 9c403c39ab9a3cc6756ce1b4e9e8882764117ff4.pack
> 2.5G packed; indexing 100%; resolving deltas 100%
> Indexed 9c403c39ab9a3cc6756ce1b4e9e8882764117ff4.pack
> $ gotadmin cl
> 2077090 loose objects; 218189 commits scanned; 894806 objects purged
> loose total size before: 6.9G
> loose total size after: 3.6G
> disk space freed: 3.3G
> loose objects also found in pack files: 2077090
> $ gotadmin info
> repository: /usr/obj/test.git
> pack files: 1
> packed objects: 2077090
> packed total size: 2.5G
> loose objects: 1182284
> loose total size: 3.6G
>
> What am I missing here?

Only gotadmin cl -a is expected to definitely remove them.
By default, gotadmin cl keeps loose objects if their modification time
lies within the 10 minutes before the youngest reference in the
repository was created. This is the "implementation-defined modification
timestamp" mentioned in the man page.

Depending on how you prepared your test repository, a lot of objects
could have a timestamp within this window.

Only use cleanup -a while no other command is running on the repository.
If -a were the default behaviour, commits made to the repository could
end up with incomplete data: if gotadmin cleanup were run at the same
time, it could remove loose objects that belong to commits that are
being created.
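
To make the default timestamp rule described above concrete, here is a
minimal sketch in Python. It is not got's actual code. The function and
constant names are made up for illustration, the handling of an object
whose timestamp falls exactly on the cutoff is an assumption, and it only
models the timestamp decision for an object that is otherwise eligible
for removal.

```python
# Illustrative sketch only; names are hypothetical, not taken from got.
SAFETY_MARGIN_SECONDS = 10 * 60  # the 10-minute window described above


def keeps_loose_object(obj_mtime, youngest_ref_time, ignore_mtime=False):
    """Return True if the timestamp margin would keep a loose object.

    obj_mtime and youngest_ref_time are seconds since the epoch.
    ignore_mtime=True models 'gotadmin cleanup -a', which disables
    the margin.
    """
    if ignore_mtime:
        # With -a the timestamp is not consulted at all.
        return False
    cutoff = youngest_ref_time - SAFETY_MARGIN_SECONDS
    # Assumption: an object exactly on the cutoff counts as recent.
    return obj_mtime >= cutoff
```

With a reference created at time T, an object last modified at T minus
5 minutes is kept and one modified at T minus 1 hour is not; with
ignore_mtime=True neither is kept. If most objects in a test repository
were written shortly before the reference, most of them fall inside the
window, which would match the result shown in the quoted output.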