From: Stefan Sperling
Subject: Re: attempt at speeding up the deltification
To: Omar Polo
Cc: gameoftrees@openbsd.org
Date: Sun, 25 Feb 2024 11:09:23 +0100

On Sun, Feb 25, 2024 at 02:30:28AM +0100, Omar Polo wrote:
> To get an idea of the improvements, I've extracted the code for the
> deltification as a standalone executable. For example, to
> got_deltify_init() the alpine iso (207M):
>
> suzaku% m && time ./obj/d /home/op/Downloads/alpine-standard-3.19.1-x86_64.iso
> array: 2250886/2251136 99% -- ht: 761991/2097152 36%
> total memory: 62415872
>     0m01.81s real     0m01.20s user     0m00.62s system
>
> or for a 1.0G mp4 video:
>
> suzaku% m && time ./obj/d /home/op/Downloads/[...].mp4
> array: 9967559/9967744 99% -- ht: 4011730/8388608 47%
> total memory: 272780288
>     0m09.67s real     0m06.12s user     0m03.49s system
>
> The "total memory" figure is the sum of the memory needed by the
> array and by the hash table (so around 62M and 272M, respectively).
> IMHO the times are more than decent now. I have no idea how long it
> would take with the current implementation, but given the trend it
> exhibits (~3 minutes for 100M of /dev/random), "too much" is a good
> estimate ;-)

Could you still run a test on the same set of files using the old code
please, for comparison? And show how well the new code performs on
100MB of /dev/random?

If this change brings us from an order of minutes down to an order of
less than 10 seconds, that's very impressive.

However, comparing deltification of /dev/random to deltification of
structured files could be an apples vs. oranges comparison. There is
little chance of finding common blocks within /dev/random.
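
As a sanity check on the "total memory" figures quoted above: they are
consistent with 24 bytes per slot in the block array plus 4 bytes per
hash table slot. The per-entry sizes here are an assumption (e.g. two
off_t fields plus a uint32_t hash, padded, and a 4-byte index per
table slot), not read off the actual struct definitions:

	#include <stdio.h>
	#include <stdint.h>

	int
	main(void)
	{
		/* Capacities reported for the 207M alpine iso run. */
		size_t blocks = 2251136;	/* array capacity */
		size_t slots = 2097152;		/* hash table capacity */

		/* 24 bytes per block, 4 bytes per table slot (assumed). */
		printf("%zu\n", blocks * 24 + slots * sizeof(uint32_t));
		/* prints 62415872, matching the figure above */
		return 0;
	}

The mp4 run checks out the same way: 9967744 * 24 + 8388608 * 4 =
272780288.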
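
For anyone who wants to reproduce such a measurement, a minimal
standalone driver along the lines described above could look like the
sketch below. The got_deltify_init() signature (table out-param, FILE,
offset, size, hash seed) and got_deltify_free() are assumptions based
on lib/got_lib_deltify.h and may not match the tree exactly; the build
glue for the extracted deltify code is omitted.

	#include <sys/types.h>

	#include <err.h>
	#include <stdio.h>
	#include <stdlib.h>

	#include "got_error.h"
	#include "got_lib_deltify.h"

	int
	main(int argc, char *argv[])
	{
		const struct got_error *error;
		struct got_delta_table *dt = NULL;
		FILE *f;
		off_t filesize;

		if (argc != 2)
			errx(1, "usage: d file");

		if ((f = fopen(argv[1], "r")) == NULL)
			err(1, "fopen %s", argv[1]);
		if (fseeko(f, 0, SEEK_END) == -1)
			err(1, "fseeko");
		filesize = ftello(f);
		if (filesize == -1)
			err(1, "ftello");

		/* Build the delta table over the whole file; this is
		 * the step that time(1) measures in the runs above. */
		error = got_deltify_init(&dt, f, 0, filesize, arc4random());
		if (error)
			errx(1, "got_deltify_init: %s", error->msg);

		got_deltify_free(dt);
		fclose(f);
		return 0;
	}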