attempt at speeding up the deltification
On 2024/02/26 12:28:01 +0100, Stefan Sperling <stsp@stsp.name> wrote:
> On Mon, Feb 26, 2024 at 11:51:26AM +0100, Omar Polo wrote:
> > Here's the updated diff for got.
> >
> > I did some benchmarking on 'gotadmin pack -a' with hyperfine, on the
> > repo that once prompted me to bump the resize step from 64 to 256. In
> > practice there I can see some small improvements (around 0.5s), which
> > I'm not sure warrant the added code.
> >
> > I also did some profiling on a dummy repository with some files off
> > /dev/random in there -- 1m, 10m, 20m and 50m -- and the performance gain
> > is much more noticeable, in the 10x range, but I'm not sure it's a good
> > idea to optimize for an edge case.
>
> I guess the tradeoff is worth it. ok stsp@
>
> Large binaries are an edge case in Git for most people.
> But when we run into one and processing takes minutes, that will
> always be very annoying. In some cases excising such files from
> history is very difficult (e.g. when related systems like issue
> trackers or CI depend on the hashes not changing).
>
> Did you already try with large zip/tar.gz files?
Not yet, but here's a simple test:
% ls -lah /tmp/a
total 102
drwxr-xr-x 2 op wheel 512B Feb 26 12:47 ./
drwxrwxrwt 19 root wheel 1.5K Feb 26 12:56 ../
-rw-r--r-- 1 op wheel 46.0M Oct 8 16:55 ports.tar.gz
-rw-r--r-- 1 op wheel 55.3M Oct 10 19:43 sys.tar.gz
% gotadmin init /tmp/a.git
% cd /tmp/a.git
% got import -m 'import' /tmp/a
A /tmp/a/ports.tar.gz
A /tmp/a/sys.tar.gz
Created branch refs/heads/main with commit ce20fe3973f914e44b31a02a459fa1401f741c04
% time git repack -a -d
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), done.
Building bitmaps: 100% (1/1), done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
0m14.66s real 0m12.45s user 0m02.16s system
% time gotadmin pack -a -q
0m16.60s real 0m10.52s user 0m06.02s system
% time git repack -a -d -q
0m01.88s real 0m00.89s user 0m00.93s system
% time /usr/local/bin/gotadmin pack -a
1 commit colored; 4 objects found; 1 tree scanned
packing 2 references; 4 objects; deltify: 100%; writing pack: 96.9M 100%
Wrote f4f358badac15e06da30f60577951ff4f7d92753.pack
96.9M packed; indexing 100%
Indexed f4f358badac15e06da30f60577951ff4f7d92753.pack
3m34.47s real 3m21.91s user 0m12.45s system
I ran an initial and an intermediate 'git repack' to make sure gotadmin
always started from a single packfile.
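
For anyone wanting to repeat the random-file variant of the test mentioned
in the quoted mail, here is a minimal sketch of how such a dummy tree could
be generated. The helper name make_random_files and the rand-*m.bin file
names are hypothetical, and /dev/urandom is used instead of /dev/random on
the assumption that it never blocks on the system at hand; this is not the
exact procedure used for the numbers above.

```shell
#!/bin/sh
# Hypothetical helper, not part of got: fill a directory with files of
# random (hence incompressible) data, one file per size given in MiB.
make_random_files() {
	dir=$1
	shift
	mkdir -p "$dir" || return 1
	for mb in "$@"; do
		# bs is spelled out in bytes so it works with both BSD and GNU dd.
		# Assumption: /dev/urandom exists and does not block.
		dd if=/dev/urandom of="$dir/rand-${mb}m.bin" \
		    bs=1048576 count="$mb" 2>/dev/null || return 1
	done
}

# Example (sizes taken from the quoted mail):
#   make_random_files /tmp/rand 1 10 20 50
```

The resulting directory could then be imported with 'got import' and packed
with 'gotadmin pack -a', following the same steps as in the transcript above.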