"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From: Stefan Sperling <stsp@stsp.name>
Subject: Re: reuse deltas while packing
To: ori@eigenstate.org
Cc: naddy@mips.inka.de, gameoftrees@openbsd.org
Date: Wed, 9 Feb 2022 00:31:10 +0100

On Tue, Feb 08, 2022 at 05:38:22PM -0500, ori@eigenstate.org wrote:
> Quoth Stefan Sperling <stsp@stsp.name>:
> > 
> > However, the deltification algorithms implemented by Git and Got are not
> > the same. It is possible that a significant difference will always remain
> > unless we rewrite code inherited from git9 and use a different approach.
> > 
> 
> Are there any objects that it performs particularly
> poorly on? I remember measuring, and it wasn't worse
> by a huge margin (about 10% in my testing).
> 
> I'd be happy to look and improve the algorithm.
> 

There is one change we made relative to git9 that could be relevant.
We only try 3 objects back as delta bases, whereas the original code tried
10 objects back. This was done to speed up packing without delta-reuse,
and it did grow our pack files a bit. The relevant discussion, with some
people collecting data points, was on IRC and is probably lost by now.
https://git.gameoftrees.org/gitweb/?p=got.git;a=commit;h=4f4d853e5a672ea469a2532774867305712b418e
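
To illustrate why a smaller window can grow pack files, here is a minimal
Python sketch of a windowed delta-base search. It is not the code from Got
or git9: the names delta_cost and pick_bases are hypothetical, and the cost
function (bytes of the target that cannot be copied from the base, computed
with difflib) is only a stand-in for a real delta encoder.

```python
from difflib import SequenceMatcher


def delta_cost(base: bytes, target: bytes) -> int:
    """Stand-in cost: number of target bytes that cannot be copied from base."""
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    copied = sum(block.size for block in matcher.get_matching_blocks())
    return len(target) - copied


def pick_bases(objects, window):
    """For each object, pick the cheapest base among the previous `window` objects.

    Returns a list of (base_index, cost) pairs. base_index is None when no
    candidate beats storing the object in full, which costs len(object).
    """
    choices = []
    for i, target in enumerate(objects):
        best_index, best_cost = None, len(target)
        # Only the `window` objects immediately before i are considered.
        for j in range(max(0, i - window), i):
            cost = delta_cost(objects[j], target)
            if cost < best_cost:
                best_index, best_cost = j, cost
        choices.append((best_index, best_cost))
    return choices


if __name__ == "__main__":
    objects = [
        b"int main(void) { return 0; }",
        b"xxxx",
        b"yyyy",
        b"zzzz",
        b"int main(void) { return 1; }",
    ]
    for window in (3, 10):
        print(window, pick_bases(objects, window)[-1])
```

In this toy input the only good base for the last object sits 4 objects
back, so a window of 3 stores it in full while a window of 10 finds the
base. The larger window also performs more comparisons per object, which is
the packing-time cost the change was meant to reduce.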

I could do a full pack run on the openbsd src repo and log the time it
takes to deltify each object. That should give us a list of potential
edge cases. Would that help?
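
A rough sketch of what such a timing log could look like, in Python rather
than Got's C. The name log_deltify_times and the deltify callable are
hypothetical placeholders for whatever routine actually deltifies one
object; the clock parameter exists only so the ordering can be checked with
a fake clock.

```python
import time


def log_deltify_times(objects, deltify, clock=time.perf_counter):
    """Time deltify(data) for each (object_id, data) pair.

    Returns (elapsed_seconds, object_id) tuples, slowest first, so that
    potential edge cases appear at the top of the list.
    """
    timings = []
    for object_id, data in objects:
        start = clock()
        deltify(data)  # hypothetical per-object deltification routine
        timings.append((clock() - start, object_id))
    timings.sort(reverse=True)
    return timings
```

Sorting by elapsed time puts the slowest objects first, and those are the
ones worth inspecting by hand.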

I would not be surprised if some edge cases could be triggered with
files beneath sys/dev/pci/drm/amd/include/asic_reg/ because these files
are very slow to unpack during 'got checkout' and have already triggered
various bugs in our handling of deltas while reading packs.