"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

Omar Polo <op@omarpolo.com>
Re: attempt at speeding up the deltification
Stefan Sperling <stsp@stsp.name>
Sun, 25 Feb 2024 12:02:33 +0100


Stefan Sperling <stsp@stsp.name> wrote:
> On Sun, Feb 25, 2024 at 02:30:28AM +0100, Omar Polo wrote:
> > To get an idea of the improvements, I've extracted the code for the
> > deltification as a standalone executable.  For example, to
> > got_deltify_init() the alpine iso (207M)
> > 
> > suzaku% m && time ./obj/d /home/op/Downloads/alpine-standard-3.19.1-x86_64.iso
> > array: 2250886/2251136 99% -- ht: 761991/2097152 36%
> > total memory: 62415872
> >     0m01.81s real     0m01.20s user     0m00.62s system
> > 
> > or for a 1.0G mp4 video:
> > 
> > suzaku% m && time ./obj/d /home/op/Downloads/[...].mp4
> > array: 9967559/9967744 99% -- ht: 4011730/8388608 47%
> > total memory: 272780288
> >     0m09.67s real     0m06.12s user     0m03.49s system
> > 
> > The "total memory" is computed by considering the total memory needed by
> > the array and by the table (so around 62M and 272M respectively.)  IMHO
> > the time are more than decent now.  I have no idea how much it'll take
> > with the current implementation, but given the trend it exibits (~3
> > minutes for 100M of /dev/random), "too much" is a good estimate ;-)
> Could you still run a test on the same set of files using the
> old code please, for comparison?
> And show how well the new code performs on 100MB of /dev/random?
> If this change brings us from an order of minutes down to an order
> of less than 10 seconds, that's very impressive.
> However, comparing deltification of /dev/random to deltification
> of structured files could be an apple vs. oranges comparison.
> There is little chance of finding common blocks within /dev/random.

Yeah, I did all my testing with the same set of files from /dev/random,
but when composing the email I thought of using some more realistic
files I had around.

I've published the code I used for these experiments at:

By the way, while gathering the before/after numbers I noticed a bug in
the rehashing of the hash table that was causing my diff to use more
memory than needed.  I've fixed the insert in my repo and will cook an
updated diff for got later.

Here are some numbers.  With the current code (the `main' branch of my
test repo):

suzaku% ./run.sh
=> 1 mb
hash table: 3811/4224 90%; total memory: 101376
    0m00.01s real     0m00.01s user     0m00.00s system
=> 10 mb
hash table: 38102/38528 98%; total memory: 924672
    0m00.48s real     0m00.41s user     0m00.07s system
=> 20 mb
hash table: 76123/76416 99%; total memory: 1833984
    0m02.57s real     0m02.42s user     0m00.14s system
=> 50 mb
hash table: 190719/191104 99%; total memory: 4586496
    0m28.50s real     0m27.45s user     0m00.94s system
=> 100 mb
hash table: 380718/381056 99%; total memory: 9145344
    3m00.34s real     2m56.99s user     0m03.57s system

Note how the hash table is always pushed to its limits.  Even for the
1MB file, with 4224 entries, the resize step of 256 is small, so the
table is resized over and over, and that ends up dominating the run
time.

With my diff (the `ht' branch):

suzaku% ./run.sh
=> 1 mb
array: 3811/3968 96%; hash table: 3811/8192 46%; total memory: 128000
    0m00.01s real     0m00.01s user     0m00.00s system
=> 10 mb
array: 38102/38272 99%; hash table: 38102/65536 58%; total memory: 1180672
    0m00.10s real     0m00.07s user     0m00.03s system
=> 20 mb
array: 76123/76160 99%; hash table: 76123/131072 58%; total memory: 2352128
    0m00.08s real     0m00.04s user     0m00.04s system
=> 50 mb
array: 190719/190848 99%; hash table: 190719/524288 36%; total memory: 6677504
    0m00.20s real     0m00.07s user     0m00.14s system
=> 100 mb
array: 380718/380800 99%; hash table: 380718/1048576 36%; total memory: 13333504
    0m00.43s real     0m00.20s user     0m00.24s system

The array is almost always full, which is good: otherwise we'd be
wasting memory.  The hash table, on the other hand, keeps a much better
load factor.

These numbers were taken with the run.sh script in the repo, on a
laptop plugged into the charger and otherwise idle.  I did a few
warm-up runs before taking the numbers.  I thought about doing a more
rigorous benchmark, but since the timings differ this much, I'm not
sure it's worth it :)

I should also point out that this is only the timing for
got_deltify_init().  I'm not looking at got_deltify() (yet?), but it
should take less time too, since the hash table has a better load
factor and lookups are again O(1) amortized.