"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From: Omar Polo <op@omarpolo.com>
Subject: Re: attempt at speeding up the deltification
To: Stefan Sperling <stsp@stsp.name>
Cc: gameoftrees@openbsd.org
Date: Mon, 26 Feb 2024 13:09:47 +0100

On 2024/02/26 12:28:01 +0100, Stefan Sperling <stsp@stsp.name> wrote:
> On Mon, Feb 26, 2024 at 11:51:26AM +0100, Omar Polo wrote:
> > Here's the updated diff for got.
> > 
> > I did some benchmarking on 'gotadmin pack -a' with hyperfine, on the
> > repo that once prompted me to bump the resize step from 64 to 256.  In
> > practice there I can see some small improvements (around 0.5s), which I'm
> > not sure warrants the added code.
> > 
> > I also did some profiling on a dummy repository with some files off
> > /dev/random in there -- 1m, 10m, 20m and 50m -- and the performance gain
> > is much more noticeable, in the x10 range, but I'm not sure it's a good
> > idea to optimize for an edge case.
> 
> I guess the tradeoff is worth it.  ok stsp@
> 
> Large binaries are an edge case in Git for most people.
> But when we run into one and take minutes for processing that will
> always be very annoying. In some cases excising such files from
> history is very difficult (e.g. when related systems like issue
> trackers or CI depend on the hashes not changing).
> 
> Did you already try with large zip/tar.gz files?

Not yet, but here's a simple test:

% ls -lah /tmp/a
total 102
drwxr-xr-x   2 op    wheel   512B Feb 26 12:47 ./
drwxrwxrwt  19 root  wheel   1.5K Feb 26 12:56 ../
-rw-r--r--   1 op    wheel  46.0M Oct  8 16:55 ports.tar.gz
-rw-r--r--   1 op    wheel  55.3M Oct 10 19:43 sys.tar.gz
% gotadmin init /tmp/a.git
% cd /tmp/a.git
% got import -m 'import' /tmp/a
A  /tmp/a/ports.tar.gz
A  /tmp/a/sys.tar.gz
Created branch refs/heads/main with commit ce20fe3973f914e44b31a02a459fa1401f741c04
% time git repack -a -d
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), done.
Building bitmaps: 100% (1/1), done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
    0m14.66s real     0m12.45s user     0m02.16s system
% time gotadmin pack -a -q
    0m16.60s real     0m10.52s user     0m06.02s system
% time git repack -a -d -q
    0m01.88s real     0m00.89s user     0m00.93s system
% time /usr/local/bin/gotadmin pack -a
1 commit colored; 4 objects found; 1 tree scanned
packing 2 references; 4 objects; deltify: 100%; writing pack: 96.9M 100%
Wrote f4f358badac15e06da30f60577951ff4f7d92753.pack
96.9M packed; indexing 100%
Indexed f4f358badac15e06da30f60577951ff4f7d92753.pack
    3m34.47s real     3m21.91s user     0m12.45s system

I ran 'git repack' before the first gotadmin run and again between the
two, so that gotadmin always started from a single packfile.
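
For reference, the gap between the two gotadmin runs can be worked out
from the "real" columns in the transcript above.  This is only a rough
sketch in sh: it assumes the gotadmin found first in PATH is the build
with the diff applied and that /usr/local/bin/gotadmin is the previously
installed one (the transcript does not label them), and to_seconds and
speedup are helper names made up for this example.

```shell
#!/bin/sh
# Convert a time(1) "real" value such as 3m34.47s into seconds.
# (hypothetical helper, not part of got or git)
to_seconds() {
	echo "$1" | awk -F'[ms]' '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Ratio of two time(1) values: slow run first, fast run second.
# (hypothetical helper, not part of got or git)
speedup() {
	awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
	    'BEGIN { printf "%.1f\n", slow / fast }'
}

# Wall-clock figures copied from the transcript:
#   3m34.47s  /usr/local/bin/gotadmin pack -a  (assumed: without the diff)
#   0m16.60s  gotadmin pack -a -q              (assumed: with the diff)
echo "gotadmin speedup: $(speedup 3m34.47s 0m16.60s)x"
```

With those two figures it prints "gotadmin speedup: 12.9x".  This is a
single run on one machine, not a proper benchmark.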