From: Stefan Sperling <stsp@stsp.name>
Subject: Re: Issue with gotd and cloning large repos
To: jrmu@ircnow.org
Cc: gameoftrees@openbsd.org
Date: Mon, 21 Jul 2025 14:54:21 +0200

On Fri, Jul 18, 2025 at 09:50:37AM -0700, jrmu@ircnow.org wrote:
> Greetings,
> 
> > If you can make the bare repo available for download somewhere,
> > e.g. as a tar archive, I would take a look. There seems to be a
> > bug in delta generation which is being triggered. It would be difficult
> > to reproduce this issue without a copy of the repository in question.
> 
> I have put the .tgz of the git repo at
> https://ircnow.org/software/almanack.tgz
> 
> Thanks for helping to take a look.

The problem is that we are reusing a temporary file without seeking
back to the beginning of the file. As a result, the file gets extended,
rather than rewritten, and will contain data from multiple objects
rather than just the object we want to open. Therefore, the file size
on disk reported by stat(2) does not match what is expected, which gets
reported as "raw object has unexpected size".
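
To illustrate the mechanism outside of got, here is a minimal standalone
sketch. It is not code from the got tree: the helper name size_after_reuse()
and the /tmp path template are made up for this example. It relies on the
POSIX behaviour that ftruncate(2) changes the file length but leaves the
file offset where it was, so the next write lands past the new end of file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of got: write a 5-byte "object" to a
 * temporary file, truncate the file for reuse, then write a second
 * 5-byte object. If do_seek is non-zero, seek back to the start before
 * the second write. Returns the size reported by fstat(2), or -1 on error.
 */
long
size_after_reuse(int do_seek)
{
	char path[] = "/tmp/reuse-demo.XXXXXX";	/* assumed temp location */
	struct stat sb;
	long size = -1;
	int fd;

	if ((fd = mkstemp(path)) == -1)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	/* First use of the temp file; the offset is now 5. */
	if (write(fd, "AAAAA", 5) != 5)
		goto done;

	/* Reuse: the length becomes 0, but the offset stays at 5. */
	if (ftruncate(fd, 0) == -1)
		goto done;
	if (do_seek && lseek(fd, 0, SEEK_SET) == -1)
		goto done;

	/* Second use: lands at offset 5 unless we seeked back to 0. */
	if (write(fd, "BBBBB", 5) != 5)
		goto done;

	if (fstat(fd, &sb) == -1)
		goto done;
	size = (long)sb.st_size;
done:
	close(fd);
	return size;
}
```

Calling size_after_reuse(0) should report 10 bytes even though only 5 were
written after the truncate, while size_after_reuse(1) should report 5. That
kind of mismatch between the expected and the actual size on disk is what
the size check trips over.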

This only triggers on large files because smaller files use a different
code path which keeps all the data in memory, avoiding temp files entirely.

I haven't been able to trigger this specific problem by writing a regression
test. Can anyone else manage to do that?

I ended up finding another bug with the regression test I wrote, however.
More on that soon.

M  lib/repository.c  |  2+  0-

1 file changed, 2 insertions(+), 0 deletions(-)

commit - 529f16d393fbab504621b40449967a0a33f4041c
commit + 5344003c15642e6213a5e80e2d1359a9d1c564bf
blob - 8d2c4a28815971860618446c5701073d301b1ee6
blob + 175996e19e6d5ed9c58c09fb0722aef4dab5cd75
--- lib/repository.c
+++ lib/repository.c
@@ -399,6 +399,8 @@ got_repo_temp_fds_get(int *fd, int *idx, struct got_re
 		if (repo->tempfiles[i] != -1) {
 			if (ftruncate(repo->tempfiles[i], 0L) == -1)
 				return got_error_from_errno("ftruncate");
+			if (lseek(repo->tempfiles[i], 0L, SEEK_SET) == -1)
+				return got_error_from_errno("lseek");
 			*fd = repo->tempfiles[i];
 			*idx = i;
 			repo->tempfile_use_mask |= (1 << i);