Reuse of packed/loose objects?
On Mon, Mar 17, 2025 at 03:08:26PM +0100, Christian Weisgerber wrote:
> There's something I don't understand.
>
> I have an OpenBSD src repository with an llvm19 branch. Say I run
> gotadmin cleanup, after which everything is in a single pack file.
> Then I fetch new upstream commits and rebase the llvm19 branch on
> master. Now I have some 15,000 loose objects. Okay. Later, I
> fetch some more upstream commits, rebase llvm19 again, and have...
> a few more, but still some 15,000 loose objects.
>
> So it seems loose objects are reused--otherwise every rebase would
> add another 15,000--but packed ones aren't. Is this expected?

What is happening is that rebase uses the 'got commit' implementation
to create the rebased commits. This implementation writes loose objects
into the repository, based on files and directories found in the work
tree, and it doesn't first check whether an object is already present
in the repository.

When you rebase again and still have those loose objects on disk, they
will simply be overwritten if their hash didn't change.

These loose objects will eventually be removed by cleaning up the
repository, provided they are 10 minutes older than the youngest
reference created in the repository, based on that ref's filesystem
mtime. We keep rewriting the ref and the loose objects every time you
rebase, bumping their mtimes. So they won't expire until you stop
rebasing this branch for a while, commit to other branches, and then
decide to run cleanup.

In theory we could avoid rewriting existing loose objects on disk,
trusting that they already contain the correct contents. But considering
that a previous commit could have left a partially written loose object
file on disk, I wouldn't blindly trust its contents. I suppose that
writing the entire file again is no slower than trying to figure out
whether the existing file contents are valid.
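To make the expiry rule above concrete, here is a rough sketch in Python. It is not the gotadmin code; the function name, the parameters, and the exact boundary comparison are made up for illustration. Only the 10-minute margin and the comparison against the youngest ref's mtime come from the description above.

```python
GRACE_SECONDS = 10 * 60  # the 10-minute margin mentioned above


def loose_object_expired(object_mtime, youngest_ref_mtime, grace=GRACE_SECONDS):
    """Return True if a loose object is old enough for cleanup to remove it.

    Both arguments are filesystem mtimes in seconds. Treating "exactly
    10 minutes older" as expired is an assumption of this sketch.
    """
    return object_mtime <= youngest_ref_mtime - grace


# A rebase rewrites the ref and the loose objects at about the same time,
# so the objects are never 10 minutes older than the youngest ref.
just_rebased = loose_object_expired(object_mtime=1000, youngest_ref_mtime=1000)

# Once another ref has been written long enough afterwards, they qualify.
after_other_commit = loose_object_expired(object_mtime=1000, youngest_ref_mtime=2000)
```

With these made-up timestamps, just_rebased is false and after_other_commit is true, which matches the behaviour you are seeing: each rebase resets the clock.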

However, we could try teaching the commit implementation to avoid
creating loose objects if a packed equivalent exists. I cannot tell
whether this would speed up commit and rebase in cases like yours,
but it would avoid cluttering the repository with 15,000 loose objects.

We can be reasonably confident that a successfully indexed pack file
contains the expected file contents. We shouldn't rely on the pack
index alone, though. We need to actually open the packed object and
close it again to ensure that the object is readable. Opening and
closing implicitly verifies the stored content's checksum, too.
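
The idea could look roughly like the following sketch. It uses a toy in-memory repository in Python rather than got's real C data structures. Every name here (Repo, open_packed, write_object) is hypothetical, and hashing the raw content with SHA1 is a simplification of how object IDs are really computed.

```python
import hashlib


class PackedObjectUnreadable(Exception):
    pass


class Repo:
    """Toy in-memory stand-in for a repository (hypothetical)."""

    def __init__(self):
        self.pack_index = set()  # object IDs listed in a pack index
        self.pack_data = {}      # object ID -> content stored in the pack
        self.loose = {}          # object ID -> loose object content

    def open_packed(self, obj_id):
        # Stand-in for opening and closing the packed object: the stored
        # bytes are re-hashed, which models the implicit checksum check.
        data = self.pack_data.get(obj_id)
        if data is None or hashlib.sha1(data).hexdigest() != obj_id:
            raise PackedObjectUnreadable(obj_id)
        return data


def write_object(repo, content):
    """Return (object ID, where the object lives) after a commit-style write."""
    obj_id = hashlib.sha1(content).hexdigest()
    if obj_id in repo.pack_index:
        try:
            # Don't trust the index alone; make sure the object is readable.
            repo.open_packed(obj_id)
            return obj_id, "packed"
        except PackedObjectUnreadable:
            pass  # fall back to writing a loose object
    # Loose objects are always rewritten, as the commit code does today.
    repo.loose[obj_id] = content
    return obj_id, "loose"
```

An object that is listed in the index but cannot be opened still gets written as a loose object, so in this sketch a damaged pack never causes a commit to lose data.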