"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

Omar Polo <op@omarpolo.com>
Re: attempt at speeding up the deltification
Stefan Sperling <stsp@stsp.name>
Sun, 25 Feb 2024 12:02:33 +0100


Stefan Sperling <stsp@stsp.name> wrote:
> On Sun, Feb 25, 2024 at 02:30:28AM +0100, Omar Polo wrote:
> > To get an idea of the improvements, I've extracted the code for the
> > deltification as a standalone executable.  For example, to
> > got_deltify_init() the alpine iso (207M)
> > 
> > suzaku% m && time ./obj/d /home/op/Downloads/alpine-standard-3.19.1-x86_64.iso
> > array: 2250886/2251136 99% -- ht: 761991/2097152 36%
> > total memory: 62415872
> >     0m01.81s real     0m01.20s user     0m00.62s system
> > 
> > or for a 1.0G mp4 video:
> > 
> > suzaku% m && time ./obj/d /home/op/Downloads/[...].mp4
> > array: 9967559/9967744 99% -- ht: 4011730/8388608 47%
> > total memory: 272780288
> >     0m09.67s real     0m06.12s user     0m03.49s system
> > 
> > The "total memory" is computed by considering the total memory needed by
> > the array and by the table (so around 62M and 272M respectively.)  IMHO
> > the time are more than decent now.  I have no idea how much it'll take
> > with the current implementation, but given the trend it exibits (~3
> > minutes for 100M of /dev/random), "too much" is a good estimate ;-)
> Could you still run a test on the same set of files using the
> old code please, for comparison?
> And show how well the new code performs on 100MB of /dev/random?
> If this change brings us from an order of minutes down to an order
> of less than 10 seconds, that's very impressive.
> However, comparing deltification of /dev/random to deltification
> of structured files could be an apple vs. oranges comparison.
> There is little chance of finding common blocks within /dev/random.

Yeah, I did all my testing with the same set of files from /dev/random,
but when composing the email I thought of using some more realistic
files I had around.

I've published the code I used for these experiments at:

By the way, while gathering the before/after numbers I noticed a bug in
the rehashing of the hash table that was causing my diff to use more
memory than needed.  I've fixed the insert in my repo and will cook an
updated diff for got later.

Here are some numbers.  With the current code (the `main' branch of my
test repo):

suzaku% ./run.sh
=> 1 mb
hash table: 3811/4224 90%; total memory: 101376
    0m00.01s real     0m00.01s user     0m00.00s system
=> 10 mb
hash table: 38102/38528 98%; total memory: 924672
    0m00.48s real     0m00.41s user     0m00.07s system
=> 20 mb
hash table: 76123/76416 99%; total memory: 1833984
    0m02.57s real     0m02.42s user     0m00.14s system
=> 50 mb
hash table: 190719/191104 99%; total memory: 4586496
    0m28.50s real     0m27.45s user     0m00.94s system
=> 100 mb
hash table: 380718/381056 99%; total memory: 9145344
    3m00.34s real     2m56.99s user     0m03.57s system

Note how the hash table is always pushed to its limits.  Even for the
1MB file, with 4224 entries, the resize step of 256 is small, so the
table is resized over and over, and that ends up dominating the run
time.

With my diff (the `ht' branch):

suzaku% ./run.sh
=> 1 mb
array: 3811/3968 96%; hash table: 3811/8192 46%; total memory: 128000
    0m00.01s real     0m00.01s user     0m00.00s system
=> 10 mb
array: 38102/38272 99%; hash table: 38102/65536 58%; total memory: 1180672
    0m00.10s real     0m00.07s user     0m00.03s system
=> 20 mb
array: 76123/76160 99%; hash table: 76123/131072 58%; total memory: 2352128
    0m00.08s real     0m00.04s user     0m00.04s system
=> 50 mb
array: 190719/190848 99%; hash table: 190719/524288 36%; total memory: 6677504
    0m00.20s real     0m00.07s user     0m00.14s system
=> 100 mb
array: 380718/380800 99%; hash table: 380718/1048576 36%; total memory: 13333504
    0m00.43s real     0m00.20s user     0m00.24s system

The array is almost always full, which is good: otherwise we'd be
wasting memory.  The hash table, on the other hand, keeps a much better
load factor.

These numbers were taken with the run.sh script in the repo, on a
laptop plugged into the charger and otherwise idle.  I did a few
warm-up runs before taking the numbers.  I thought about doing a more
rigorous benchmark, but since the timings differ this much, I'm not
sure it's worth it :)

I should also point out that this is only the timing for
got_deltify_init().  I'm not looking at got_deltify() (yet?), but it
should take less time too, since the hash table has a better load
factor and lookups are again O(1) amortized.