From: Stefan Sperling
Subject: Re: bump deltify table increase step
To: Omar Polo
Cc: gameoftrees@openbsd.org
Date: Fri, 28 Jul 2023 20:07:37 +0200

On Fri, Jul 28, 2023 at 10:40:48AM +0200, Omar Polo wrote:
> we're pretty slow at deltifying things. it's OK, git itself is bad
> with big files, but we can be too slow with moderately-sized files. I
> noticed this a couple of days ago when I hit the gotd timeout trying
> to push some changes to a private repository, because got was taking
> too long to deltify ~100mb of data split mostly across 3-4 files.
> 
> (The good news is that the gotd timeout works!)
> 
> Most of this slowness comes from resizing the table. Performance
> degrades when the table is mostly filled, and by growing it by only
> 64 slots at a time we end up reallocating it quite a lot for big
> files.
> 
> There's also the memory usage to keep low, though: since this code is
> also used in gotd, we don't want the table to become 'too big' by,
> e.g., doubling its size at every resize. Note, however, that nothing
> prevents the table from growing arbitrarily large; it just takes
> forever, and we don't have cancellation in place in this code path.
> 
> I've set up a repository to do some tests: it's a standalone binary
> that wraps the deltify code: .
> 
> The main branch is the exact same code we have in got; the branch
> `256' bumps the resize step from 64 to 256.
> 
> Just as-is the improvement is noticeable: on my hardware, deltifying
> 20M of /dev/random takes around 8 seconds, where before it took 75
> seconds.
> 
> Nothing stops us from using an even bigger step size (with 512,
> deltifying 20M takes ~2.5 seconds), but since we're already pretty
> conservative in the amount of memory used, and I'm one of the first
> to have hit this, I decided to stay on the low side.
> 
> The memory usage doesn't grow considerably: the entries of the table
> are 24 bytes long on amd64 (two off_t and one uint32_t, plus
> padding), so in the worst case we end up having to allocate an extra
> 24 * 256 = ~6k of memory.
> 
> Future work on this front includes adding cancellation support.
> 
> thoughts?

Please put this in with my ok. We'll see if anyone runs into issues.

I don't think we can make very educated guesses about the optimal
value. We need to keep experimenting until we feel that we've found a
threshold that works, and/or maybe change the scaling strategy
eventually.