reuse deltas while packing
Quoth Stefan Sperling <stsp@stsp.name>:
> On Tue, Feb 08, 2022 at 05:38:22PM -0500, ori@eigenstate.org wrote:
> > Quoth Stefan Sperling <stsp@stsp.name>:
> > >
> > > However, the deltification algorithms implemented by Git and Got are not
> > > the same. It is possible that a significant difference will always remain
> > > unless we rewrite code inherited from git9 and use a different approach.
> > >
> >
> > Are there any objects that it performs particularly
> > poorly on? I remember measuring, and it wasn't worse
> > by a huge margin (about 10% in my testing).
> >
> > I'd be happy to look and improve the algorithm.
> >
>
> There is one change we made relative to git9 that could be relevant.
> We only try 3 objects back as delta bases whereas the original code tried
> 10 objects back. This was done to speed up packing without delta-reuse,
> and it did grow our pack files a bit. Relevant discussion with some people
> collecting data points was on IRC and is probably lost by now.
> https://git.gameoftrees.org/gitweb/?p=got.git;a=commit;h=4f4d853e5a672ea469a2532774867305712b418e
>
> I could do a full pack run on the openbsd src repo and log the time it
> takes to deltify each object. That should give us a list of potential
> edge cases. Would that help?
>
> I would not be surprised if some edge cases could be triggered with
> files beneath sys/dev/pci/drm/amd/include/asic_reg/ because these files
> are very slow to unpack during 'got checkout' and have already triggered
> various bugs in our handling of deltas while reading packs.
>
Poked around this a bit:
$ git clone https://github.com/freebsd/freebsd-src.git
Cloning into 'freebsd-src'...
remote: Enumerating objects: 4626509, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 4626509 (delta 5), reused 10 (delta 5), pack-reused 4626499
$ du -sh .git/objects/pack/
2.2G .git/objects/pack/
$ git repack -dFA
Enumerating objects: 4626509, done.
Counting objects: 100% (4626509/4626509), done.
Compressing objects: 100% (4556343/4556343), done.
Writing objects: 100% (4626509/4626509), done.
Total 4626509 (delta 3206504), reused 0 (delta 0), pack-reused 0
$ du -sh .git/objects/pack/
1.6G .git/objects/pack/
Meanwhile with git9, but starting with a repo cloned with
Torvalds git (since we pick a different set of commits):
% dircp /mnt/term/tmp/freebsd-src freebsd-src
% du -sh .git/objects/pack
2.32957G .git/objects/pack
% git/repack
% du -sh .git/objects/pack
2.43166G .git/objects/pack
So, for this repo, Torvalds git is better than us by a larger
margin than expected, but a smaller margin than observed with
Got. I'll look into improving the delta search.
Also: GitHub is doing something different from git, and it
seems to be pretty close to what I'm doing.
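
For scale, here is a rough sketch of the arithmetic behind the du
figures above. It assumes both du implementations report comparable
units, and that the rounded 1.6G figure is close enough for a ballpark
comparison. The helper name percent_larger is made up for illustration.

```python
def percent_larger(size, baseline):
    """Return how much larger `size` is than `baseline`, in percent."""
    return (size - baseline) / baseline * 100.0


# Pack sizes copied from the transcripts above (du -sh output).
git_repacked = 1.6        # after `git repack -dFA`, rounded by du
git9_before = 2.32957     # copied repo, before git/repack
git9_repacked = 2.43166   # after git/repack

# git9's repacked pack relative to Torvalds git's repacked pack.
print(f"git9 vs git: {percent_larger(git9_repacked, git_repacked):.1f}% larger")

# Growth of the pack caused by git/repack itself.
print(f"git9 repack growth: {percent_larger(git9_repacked, git9_before):.1f}%")
```

Since the 1.6G figure is rounded to one decimal place, the first
number is only good to a few percent either way.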