Download raw body.
reuse deltas while packing
On Mon, Feb 07, 2022 at 12:49:56PM +0100, Christian Weisgerber wrote: > Christian Weisgerber: > > > I cloned a pristine copy of FreeBSD src.git, and I'm currently > > exploding the pack file with "git unpack-objects" into a few million > > loose objects. That should give me something to test. > > Sorry, I didn't get as far as testing the diff, since the comparison > run with the old code already suffered problems: > > * My first attempt ran out of /tmp space. 4.8G was not enough. > > * Related to that: With more /tmp space, "gotadmin pack" eventually > wrote a 4.7G pack file. The original git clone of the repository > has only a 1.4G pack file. > > * got-index-pack hung. This patch fixes got-index-pack getting stuck. Unfortunately, zlib uses an unsigned int to represent the number of input bytes available in its input buffer, instead of a size_t. Large mapped files can cause truncation during our assignment from size_t to unsigned int when we update zlib's input length counter. This results in z->avail_in being set to zero, even though plenty of bytes are available for zlib to consume. We then loop forever asking zlib to decompress another zero bytes during every iteration. In gdb, the situation looks like this: (gdb) n 252 z->avail_in = len - *consumed; (gdb) p len $29 = 4294967296 (gdb) p *consumed $30 = 0 (gdb) p z->avail_in $31 = 0 Huh ?!? :-O (gdb) p len - *consumed $33 = 4294967296 (gdb) p /x len $36 = 0x100000000 (gdb) p /x (unsigned int)len $37 = 0x0 (gdb) p /x (unsigned long)len $38 = 0x100000000 I have also spotted wrong updates of zb.outlen after growing the output buffer. This is unrelated to the problem at hand but should still be fixed. I would create a separate commit for those changes. ok? diff 0cc5a39811be39e9f2b47098ec3a38e8817ba0bc /home/stsp/src/got blob - 6a151f9c54a8c0f4fc4fea9a4c91ecc7dec71aed file + lib/deflate.c --- lib/deflate.c +++ lib/deflate.c @@ -153,7 +153,10 @@ got_deflate_read_mmap(struct got_deflate_buf *zb, uint size_t last_total_in = z->total_in; if (z->avail_in == 0) { z->next_in = map + offset + *consumed; - z->avail_in = len - *consumed; + if (len < (size_t)UINT_MAX) + z->avail_in = len - *consumed; + else + z->avail_in = UINT_MAX - *consumed; if (z->avail_in == 0) { /* EOF */ ret = deflate(z, Z_FINISH); blob - b0426141094227c79e5dd1d6e3cd0e5c2a0b28f0 file + lib/inflate.c --- lib/inflate.c +++ lib/inflate.c @@ -249,7 +249,10 @@ got_inflate_read_mmap(struct got_inflate_buf *zb, uint break; } z->next_in = map + offset + *consumed; - z->avail_in = len - *consumed; + if (len < (size_t)UINT_MAX) + z->avail_in = len - *consumed; + else + z->avail_in = UINT_MAX - *consumed; } if (zb->csum) { csum_in = z->next_in; @@ -322,7 +325,6 @@ got_inflate_to_mem(uint8_t **outbuf, size_t *outlen, if (zb.flags & GOT_INFLATE_F_HAVE_MORE) { if (outbuf == NULL) continue; - zb.outlen = (nbuf * GOT_INFLATE_BUFSIZE) - *outlen; newbuf = reallocarray(*outbuf, ++nbuf, GOT_INFLATE_BUFSIZE); if (newbuf == NULL) { @@ -334,6 +336,7 @@ got_inflate_to_mem(uint8_t **outbuf, size_t *outlen, } *outbuf = newbuf; zb.outbuf = newbuf + *outlen; + zb.outlen = (nbuf * GOT_INFLATE_BUFSIZE) - *outlen; } } while (zb.flags & GOT_INFLATE_F_HAVE_MORE); @@ -382,7 +385,6 @@ got_inflate_to_mem_fd(uint8_t **outbuf, size_t *outlen if (zb.flags & GOT_INFLATE_F_HAVE_MORE) { if (outbuf == NULL) continue; - zb.outlen = (nbuf * GOT_INFLATE_BUFSIZE) - *outlen; newbuf = reallocarray(*outbuf, ++nbuf, GOT_INFLATE_BUFSIZE); if (newbuf == NULL) { @@ -394,6 +396,7 @@ got_inflate_to_mem_fd(uint8_t **outbuf, size_t *outlen } *outbuf = newbuf; zb.outbuf = newbuf + *outlen; + zb.outlen = (nbuf * GOT_INFLATE_BUFSIZE) - *outlen; } } while (zb.flags & GOT_INFLATE_F_HAVE_MORE);
reuse deltas while packing