
From: Stefan Sperling <stsp@stsp.name>
Subject: Re: Reuse of packed/loose objects?
To: Christian Weisgerber <naddy@mips.inka.de>
Cc: gameoftrees@openbsd.org
Date: Mon, 17 Mar 2025 15:48:43 +0100

On Mon, Mar 17, 2025 at 03:08:26PM +0100, Christian Weisgerber wrote:
> There's something I don't understand.
> 
> I have an OpenBSD src repository with an llvm19 branch.  Say I run
> gotadmin cleanup, after which everything is in a single pack file.
> Then I fetch new upstream commits and rebase the llvm19 branch on
> master.  Now I have some 15,000 loose objects.  Okay.  Later, I
> fetch some more upstream commits, rebase llvm19 again, and have...
> a few more, but still some 15,000 loose objects.
> 
> So it seems loose objects are reused--otherwise every rebase would
> add another 15,000--but packed ones aren't.  Is this expected?

What is happening is that rebase uses the 'got commit' implementation to
create the rebased commits. This implementation writes loose objects based
on files and directories found in the work tree into the repository and it
doesn't first check whether an object is already present in the repository.

When you rebase again and still have those loose objects on disk, they will
simply be overwritten if their hash didn't change.
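
As an illustration only (this is not Got's actual code), the behaviour
described above can be sketched in Python. The function name and the
direct, non-atomic file write are assumptions of the sketch; the on-disk
layout is the standard Git loose object format (SHA1 repositories).

```python
import hashlib
import os
import zlib


def write_loose_object(repo_dir, obj_type, content):
    """Write a Git-format loose object unconditionally and return its ID.

    Hypothetical helper for illustration; not part of Got.
    """
    # Loose object payload: "<type> <size>\0" header followed by the content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    raw = header + content
    obj_id = hashlib.sha1(raw).hexdigest()

    # Objects live at objects/<first two hex digits>/<remaining 38 digits>.
    obj_dir = os.path.join(repo_dir, "objects", obj_id[:2])
    os.makedirs(obj_dir, exist_ok=True)
    path = os.path.join(obj_dir, obj_id[2:])

    # No lookup for an existing loose or packed copy happens here.
    # Identical content maps to the same path, so a second write
    # overwrites the file and bumps its mtime.
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return obj_id
```

Writing the same content twice yields the same ID and a single file on
disk, which is why repeated rebases do not keep adding another 15,000
objects.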

These loose objects will eventually be removed by cleaning up the repository,
provided they are 10 minutes older than the youngest reference created in
the repository, based on that ref's filesystem mtime. We keep rewriting the ref
and the loose objects every time you rebase, bumping their mtimes. So they
won't expire until you stop rebasing this branch for a while and commit to
other branches and then decide to run cleanup.
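
A small worked example of the timing rule, as one reading of the
description above. The function name and the timestamps are made up for
illustration, and the sketch models only the age test, not whether an
object is still referenced.

```python
GRACE_SECONDS = 10 * 60  # the 10-minute margin described above


def is_old_enough_for_cleanup(object_mtime, ref_mtimes, grace=GRACE_SECONDS):
    """Return True if the object is at least `grace` seconds older than
    the youngest reference. Hypothetical helper, not Got's implementation."""
    youngest_ref = max(ref_mtimes)
    return object_mtime <= youngest_ref - grace


# Timestamps in seconds; the values are arbitrary.
rebase_time = 1_000_000
loose_mtime = rebase_time  # rebase rewrote the loose objects...
refs = {"master": rebase_time - 3600, "llvm19": rebase_time}  # ...and the ref

# Right after a rebase the objects are as young as the youngest ref.
print(is_old_enough_for_cleanup(loose_mtime, refs.values()))  # False

# Half an hour later, a commit on another branch bumps that ref's mtime.
refs["master"] = rebase_time + 1800
print(is_old_enough_for_cleanup(loose_mtime, refs.values()))  # True
```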

In theory we could avoid rewriting existing loose objects on disk, trusting
that they already contain the correct contents.
But considering that a previous commit could have left a partially written
loose object file on disk, I wouldn't blindly trust its contents. I suppose
that writing the entire file again is no slower than trying to figure out
whether the existing file contents are valid.

However, we could try teaching the commit implementation to avoid creating
loose objects if a packed equivalent exists. I cannot tell whether this would
speed up commit and rebase in cases like yours, but it would avoid cluttering
the repository with 15,000 loose objects. We can be reasonably confident that
a successfully indexed pack file contains the expected file contents. We
shouldn't rely on the pack index alone, though. We need to actually open
the packed object and close it again to ensure that the object is readable.
Opening and closing implicitly verifies the stored content's checksum, too.
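
The proposed check could look roughly like the following Python sketch.
Every name here is hypothetical: `pack_index` stands in for a pack index
lookup, `read_packed` for opening and reading a packed object, and
`write_loose` for the existing loose object writer. The explicit hash
comparison stands in for the checksum verification that opening a packed
object would perform implicitly.

```python
import hashlib


def packed_copy_is_usable(obj_id, pack_index, read_packed):
    """Return True only if the object is indexed AND actually readable."""
    # The index alone is not trusted; it only tells us where to look.
    if obj_id not in pack_index:
        return False
    try:
        raw = read_packed(obj_id)  # open the packed object
    except OSError:
        return False  # unreadable pack data: fall back to a loose write
    # Stand-in for the implicit checksum verification.
    return hashlib.sha1(raw).hexdigest() == obj_id


def store_object(raw, pack_index, read_packed, write_loose):
    """Write a loose object only when no usable packed copy exists."""
    obj_id = hashlib.sha1(raw).hexdigest()
    if packed_copy_is_usable(obj_id, pack_index, read_packed):
        return obj_id, "reused-packed"
    write_loose(obj_id, raw)
    return obj_id, "wrote-loose"
```

Whether the extra lookup and read would be faster than writing the loose
file is an open question; the sketch only shows how the clutter could be
avoided.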