From: Omar Polo <op@omarpolo.com>
Subject: bump deltify table increase step
To: gameoftrees@openbsd.org
Date: Fri, 28 Jul 2023 10:40:48 +0200

we're pretty slow at deltifying things.  it's OK, git itself is bad
with big files, but we can be too slow with moderately-sized files.  I
noticed this when a couple of days ago I hit the gotd timeout trying
to push some changes on a private repository because got was taking
too long to deltify ~100mb of data splitted mostly in 3-4 files.

(The good news is that gotd timeout works!)

Most of this slowness comes from resizing the table.  Performance
degrades when the table is mostly filled, and by incrementing its size
by 64 slot every time we end up reallocating it quite a lot for big
files.

There's also the memory usage to keep low though, as this code is also
used in gotd, we don't want the table to become 'too big' by
e.g. doubling its size at every resize.  Note however that there's
nothing preventing the table to grow arbitrarly large, it just takes
forever and that we don't have cancellation in place in this code path.

I've set up a repository to do some tests: it's a standalone binary
that wraps the deltify code: <https://codeberg.org/op/deltify/>.

The main branch is the exact same code we have in got, the branch
`256' bumps the resize step from 64 to 256.

Just as-is the improvements are noticeable; on my hw deltifying 20M of
/dev/random takes around 8 seconds, when before it was at 75 seconds.

Nothing stops us from using an even bigger step size (with 512
deltifying 20M takes ~2.5 seconds), but since we're already pretty
conservative in the amount of memory used and I'm one of the first to
have hit this I decided to stay on the low side.

The memory usage doesn't grow considerably: the entries of the table
are 24 bytes long on amd64 (two off_t and one uint32_t, plus padding),
so in the worst case we end up having allocate 24 * 256 = ~6k of
memory.

Future work on this front includes adding cancellation support.

thoughts?

-----------------------------------------------
commit c96c4f62e57c75a03c3594fd3a70ab41c1200e76 (256, origin/256)
from: Omar Polo <op@omarpolo.com>
date: Fri Jul 28 08:11:58 2023 UTC
 
 bump the increment size to 256
 
diff 3931424de03023f577cc02f637c60e65ca80ffd3 c96c4f62e57c75a03c3594fd3a70ab41c1200e76
commit - 3931424de03023f577cc02f637c60e65ca80ffd3
commit + c96c4f62e57c75a03c3594fd3a70ab41c1200e76
blob - 904fe2806d161dfaee79db5f0f3c73e691d3f4dc
blob + 99b5d418b8cec83d130d52be4e5022d0310f8c31
--- lib/deltify.c
+++ lib/deltify.c
@@ -140,18 +140,18 @@ addblk(struct got_delta_table *dt, FILE *f, off_t file
 	dt->blocks[i].offset = offset;
 	dt->blocks[i].hash = h;
 	dt->nblocks++;
-	if (dt->nalloc < dt->nblocks + 64) {
+	if (dt->nalloc < dt->nblocks + 256) {
 		struct got_delta_block *db;
 		size_t old_size = dt->nalloc;
 		db = dt->blocks;
-		dt->blocks = calloc(dt->nalloc + 64,
+		dt->blocks = calloc(dt->nalloc + 256,
 		    sizeof(struct got_delta_block));
 		if (dt->blocks == NULL) {
 			err = got_error_from_errno("calloc");
 			dt->blocks = db;
 			return err;
 		}
-		dt->nalloc += 64;
+		dt->nalloc += 256;
 		/*
 		 * Recompute all block positions. Hash-based indices of blocks
 		 * in the array depend on the allocated length of the array.
@@ -204,18 +204,18 @@ addblk_mem(struct got_delta_table *dt, uint8_t *data, 
 	dt->blocks[i].offset = offset;
 	dt->blocks[i].hash = h;
 	dt->nblocks++;
-	if (dt->nalloc < dt->nblocks + 64) {
+	if (dt->nalloc < dt->nblocks + 256) {
 		struct got_delta_block *db;
 		size_t old_size = dt->nalloc;
 		db = dt->blocks;
-		dt->blocks = calloc(dt->nalloc + 64,
+		dt->blocks = calloc(dt->nalloc + 256,
 		    sizeof(struct got_delta_block));
 		if (dt->blocks == NULL) {
 			err = got_error_from_errno("calloc");
 			dt->blocks = db;
 			return err;
 		}
-		dt->nalloc += 64;
+		dt->nalloc += 256;
 		/*
 		 * Recompute all block positions. Hash-based indices of blocks
 		 * in the array depend on the allocated length of the array.