From: Omar Polo
Subject: Re: attempt at speeding up the deltification
To: Stefan Sperling
Cc: gameoftrees@openbsd.org
Date: Mon, 26 Feb 2024 13:09:47 +0100

On 2024/02/26 12:28:01 +0100, Stefan Sperling wrote:
> On Mon, Feb 26, 2024 at 11:51:26AM +0100, Omar Polo wrote:
> > Here's the updated diff for got.
> >
> > I did some benchmarking on 'gotadmin pack -a' with hyperfine, on the
> > repo that once prompted me to bump the resize step from 64 to 256.  In
> > practice there I can see some small improvements (around 0.5s), which
> > I'm not sure warrant the added code.
> >
> > I also did some profiling on a dummy repository with some files off
> > /dev/random in there -- 1m, 10m, 20m and 50m -- and the performance gain
> > is much more noticeable, in the x10 range, but I'm not sure it's a good
> > idea to optimize for an edge case.
>
> I guess the tradeoff is worth it. ok stsp@
>
> Large binaries are an edge case in Git for most people.
> But when we run into one and take minutes for processing that will
> always be very annoying. In some cases excising such files from
> history is very difficult (e.g. when related systems like issue
> trackers or CI depend on the hashes not changing).
>
> Did you already try with large zip/tar.gz files?

Not yet, but here's a simple test:

% ls -lah /tmp/a
total 102
drwxr-xr-x   2 op    wheel   512B Feb 26 12:47 ./
drwxrwxrwt  19 root  wheel   1.5K Feb 26 12:56 ../
-rw-r--r--   1 op    wheel  46.0M Oct  8 16:55 ports.tar.gz
-rw-r--r--   1 op    wheel  55.3M Oct 10 19:43 sys.tar.gz
% gotadmin init /tmp/a.git
% cd /tmp/a.git
% got import -m 'import' /tmp/a
A  /tmp/a/ports.tar.gz
A  /tmp/a/sys.tar.gz
Created branch refs/heads/main with commit ce20fe3973f914e44b31a02a459fa1401f741c04
% time git repack -a -d
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), done.
Building bitmaps: 100% (1/1), done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
    0m14.66s real     0m12.45s user     0m02.16s system
% time gotadmin pack -a -q
    0m16.60s real     0m10.52s user     0m06.02s system
% time git repack -a -d -q
    0m01.88s real     0m00.89s user     0m00.93s system
% time /usr/local/bin/gotadmin pack -a
1 commit colored; 4 objects found; 1 tree scanned
packing 2 references; 4 objects; deltify: 100%; writing pack: 96.9M 100%
Wrote f4f358badac15e06da30f60577951ff4f7d92753.pack
96.9M packed; indexing 100%
Indexed f4f358badac15e06da30f60577951ff4f7d92753.pack
    3m34.47s real     3m21.91s user     0m12.45s system

I did a first and an intermediate 'git repack' to make sure gotadmin was
always starting from one packfile alone.
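
For a rough sense of scale, the two gotadmin "real" times above can be
compared with a small sh helper.  This is only a sketch: to_seconds and
ratio are made-up names, and it assumes that the bare 'gotadmin' is the
patched build while /usr/local/bin/gotadmin is the installed, unpatched
one, which the transcript does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helpers, not part of got or git.

# Convert a ksh-style `time` value such as "3m34.47s" into seconds.
to_seconds() {
	echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.2f\n", $1 * 60 + $2 }'
}

# Ratio of two timings (first divided by second), one decimal place.
ratio() {
	awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
	    'BEGIN { printf "%.1f\n", a / b }'
}

# "real" times copied from the transcript:
# /usr/local/bin/gotadmin (assumed unpatched) vs. gotadmin (assumed patched)
ratio 3m34.47s 0m16.60s		# prints 12.9
```

That is about the same order of magnitude as the x10 seen with the
/dev/random files, on a single run and without hyperfine, so take it as
indicative only.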