From: Stefan Sperling
Subject: Re: reuse deltas while packing
To: ori@eigenstate.org
Cc: naddy@mips.inka.de, gameoftrees@openbsd.org
Date: Wed, 9 Feb 2022 00:31:10 +0100

On Tue, Feb 08, 2022 at 05:38:22PM -0500, ori@eigenstate.org wrote:
> Quoth Stefan Sperling:
> >
> > However, the deltification algorithms implemented by Git and Got are not
> > the same. It is possible that a significant difference will always remain
> > unless we rewrite code inherited from git9 and use a different approach.
> >
>
> Are there any objects that it performs particularly
> poorly on? I remember measuring, and it wasn't worse
> by a huge margin (about 10% in my testing).
>
> I'd be happy to look and improve the algorithm.
>

There is one change we made relative to git9 that could be relevant.
We only try 3 objects back as delta bases, whereas the original code
tried 10 objects back. This was done to speed up packing without
delta-reuse, and it did grow our pack files a bit.
The relevant discussion, with some people collecting data points, was
on IRC and is probably lost by now.
https://git.gameoftrees.org/gitweb/?p=got.git;a=commit;h=4f4d853e5a672ea469a2532774867305712b418e

I could do a full pack run on the openbsd src repo and log the time it
takes to deltify each object. That should give us a list of potential
edge cases. Would that help?

I would not be surprised if some edge cases could be triggered with
files beneath sys/dev/pci/drm/amd/include/asic_reg/, because these files
are very slow to unpack during 'got checkout' and have already triggered
various bugs in our handling of deltas while reading packs.
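
To make the window trade-off above concrete, here is a rough sketch of a
bounded delta-base search that also logs the time spent per object. It is
not Got's or git9's code. The names pack_sizes and delta_cost are
hypothetical, and delta_cost uses zlib with a preset dictionary purely as
a stand-in for a real delta encoder.

```python
import time
import zlib
from collections import deque


def delta_cost(base: bytes, target: bytes) -> int:
    """Hypothetical stand-in for a delta encoder: the size of `target`
    when compressed with `base` as a preset zlib dictionary."""
    comp = zlib.compressobj(level=6, zdict=base)
    return len(comp.compress(target) + comp.flush())


def pack_sizes(objects, window):
    """For each object, try up to `window` preceding objects as delta
    bases and keep the smallest result, or the plain compressed size if
    no base helps. Returns a list of (size, seconds) per object."""
    recent = deque(maxlen=window)  # the last `window` objects seen
    results = []
    for obj in objects:
        start = time.perf_counter()
        best = len(zlib.compress(obj))  # cost without any delta base
        for base in recent:
            best = min(best, delta_cost(base, obj))
        results.append((best, time.perf_counter() - start))
        recent.append(obj)
    return results
```

Running pack_sizes over the same object list with window=3 and window=10
and comparing the summed sizes and times shows the trade-off: the larger
window can never produce a bigger total in this model, but it does more
work per object. Sorting the results by the seconds field would give the
kind of slow-object list mentioned above.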