reuse deltas while packing
Quoth Stefan Sperling <stsp@stsp.name>:
> On Tue, Feb 08, 2022 at 05:38:22PM -0500, ori@eigenstate.org wrote:
> > Quoth Stefan Sperling <stsp@stsp.name>:
> > >
> > > However, the deltification algorithms implemented by Git and Got are not
> > > the same. It is possible that a significant difference will always remain
> > > unless we rewrite code inherited from git9 and use a different approach.
> > >
> >
> > Are there any objects that it performs particularly
> > poorly on? I remember measuring, and it wasn't worse
> > by a huge margin (about 10% in my testing).
> >
> > I'd be happy to look and improve the algorithm.
> >
>
> There is one change we made relative to git9 that could be relevant.
> We only try 3 objects back as delta bases whereas the original code tried
> the 10 objects back. This was done to speed up packing without delta-reuse,
> and it did grow our pack files a bit. Relevant discussion with some people
> collecting data points was on IRC and is probably lost by now.
> https://git.gameoftrees.org/gitweb/?p=got.git;a=commit;h=4f4d853e5a672ea469a2532774867305712b418e
>
> I could do a full pack run on the openbsd src repo and log the time it
> takes to deltify each object. That should give us a list of potential
> edge cases. Would that help?
>
> I would not be surprised if some edge cases could be triggered with
> files beneath sys/dev/pci/drm/amd/include/asic_reg/ because these files
> are very slow to unpack during 'got checkout' and have already triggered
> various bugs in our handling of deltas while reading packs.
>

Poked around this a bit:

	$ git clone https://github.com/freebsd/freebsd-src.git
	Cloning into 'freebsd-src'...
	remote: Enumerating objects: 4626509, done.
	remote: Counting objects: 100% (10/10), done.
	remote: Compressing objects: 100% (5/5), done.
	remote: Total 4626509 (delta 5), reused 10 (delta 5), pack-reused 4626499
	$ du -sh .git/objects/pack/
	2.2G	.git/objects/pack/
	$ git repack -dFA
	Enumerating objects: 4626509, done.
	Counting objects: 100% (4626509/4626509), done.
	Compressing objects: 100% (4556343/4556343), done.
	Writing objects: 100% (4626509/4626509), done.
	Total 4626509 (delta 3206504), reused 0 (delta 0), pack-reused 0
	$ du -sh .git/objects/pack/
	1.6G	.git/objects/pack/

Meanwhile with git9, but starting with a repo cloned with
Torvalds git (since we pick a different set of commits):

	% dircp /mnt/term/tmp/freebsd-src freebsd-src
	% du -sh .git/objects/pack
	2.32957G	.git/objects/pack
	% git/repack
	% du -sh .git/objects/pack
	2.43166G	.git/objects/pack

So, for this repo, Torvalds git is better than us by a
larger margin than expected, but a smaller margin than
observed with Got.

I'll look into improving the delta search.

Also: GitHub is doing something different from git, and
it seems to be pretty close to what I'm doing.
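
To make the window trade-off mentioned in the quoted text concrete (trying
3 objects back versus 10), here is a toy sketch in Python. It is not the
code from git9, Got, or Torvalds git. The names delta_cost, pick_bases and
packed_size are made up for illustration, and the "delta" is only a
common-prefix/suffix estimate standing in for a real delta encoder. It only
shows why a smaller window can miss a good base that sits further back in
the candidate list.

```python
def delta_cost(base: bytes, target: bytes) -> int:
    """Toy delta size: bytes of target not covered by a shared prefix/suffix.

    Assumption: a stand-in for a real delta encoder, chosen for simplicity.
    """
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    # Do not let the suffix overlap the bytes already counted as prefix.
    while suffix < limit - prefix and base[-1 - suffix] == target[-1 - suffix]:
        suffix += 1
    return len(target) - prefix - suffix


def pick_bases(objects, window):
    """For each object, try up to `window` preceding objects as delta bases.

    Returns a list of (base_index_or_None, cost) pairs. A base is only
    chosen if the delta is strictly smaller than storing the object whole.
    """
    choices = []
    for i, target in enumerate(objects):
        best_idx, best_cost = None, len(target)
        for j in range(max(0, i - window), i):
            cost = delta_cost(objects[j], target)
            if cost < best_cost:
                best_idx, best_cost = j, cost
        choices.append((best_idx, best_cost))
    return choices


def packed_size(objects, window):
    """Total toy pack size for a given search window."""
    return sum(cost for _, cost in pick_bases(objects, window))


if __name__ == "__main__":
    # The last object is nearly identical to the first, which is 4 back.
    objs = [b"A" * 100 + b"x", b"B" * 100, b"C" * 100, b"D" * 100,
            b"A" * 100 + b"y"]
    for w in (3, 10):
        print(w, packed_size(objs, w))
```

With a window of 3 the last object never sees the first one as a candidate
and is stored whole. With a window of 10 it deltifies against it. Each
object costs up to `window` delta attempts, which is the packing-speed side
of the trade-off.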