From: "Omar Polo" <op@omarpolo.com>
Subject: Re: Issue with gotd and cloning large repos
To: Stefan Sperling <stsp@stsp.name>
Cc: jrmu@ircnow.org, gameoftrees@openbsd.org
Date: Mon, 21 Jul 2025 15:32:09 +0200

Stefan Sperling <stsp@stsp.name> wrote:
> On Fri, Jul 18, 2025 at 09:50:37AM -0700, jrmu@ircnow.org wrote:
> [..]
> 
> The problem is that we are reusing a temporary file without seeking
> back to the beginning of the file. As a result, the file gets extended,
> rather than rewritten, and will contain data from multiple objects
> rather than just the object we want to open. Therefore, the file size
> on disk reported by stat(2) does not match what is expected, which gets
> reported as "raw object has unexpected size".
> 
> This only triggers on large files because smaller files use a different
> code path which keeps all the data in memory, avoiding temp files entirely.
> 
> I haven't been able to trigger this specific problem by writing a regression
> test. Can anyone else manage to do that?
> 
> I ended up finding another bug with the regression test I wrote, however.
> More on that soon.

ouch

> M  lib/repository.c  |  2+  0-
> 
> 1 file changed, 2 insertions(+), 0 deletions(-)
> 
> commit - 529f16d393fbab504621b40449967a0a33f4041c
> commit + 5344003c15642e6213a5e80e2d1359a9d1c564bf
> blob - 8d2c4a28815971860618446c5701073d301b1ee6
> blob + 175996e19e6d5ed9c58c09fb0722aef4dab5cd75
> --- lib/repository.c
> +++ lib/repository.c
> @@ -399,6 +399,8 @@ got_repo_temp_fds_get(int *fd, int *idx, struct got_re
>  		if (repo->tempfiles[i] != -1) {
>  			if (ftruncate(repo->tempfiles[i], 0L) == -1)
>  				return got_error_from_errno("ftruncate");
> +			if (lseek(repo->tempfiles[i], 0L, SEEK_SET) == -1)
> +				return got_error_from_errno("lseek");
>  			*fd = repo->tempfiles[i];
>  			*idx = i;
>  			repo->tempfile_use_mask |= (1 << i);

ok op@