From: ori@eigenstate.org
Subject: Re: reuse deltas while packing
To: ori@eigenstate.org, stsp@stsp.name
Cc: gameoftrees@openbsd.org, naddy@mips.inka.de
Date: Tue, 08 Feb 2022 18:43:37 -0500

Quoth Stefan Sperling <stsp@stsp.name>:
> On Tue, Feb 08, 2022 at 05:38:22PM -0500, ori@eigenstate.org wrote:
> > Quoth Stefan Sperling <stsp@stsp.name>:
> > > 
> > > However, the deltification algorithms implemented by Git and Got are not
> > > the same. It is possible that a significant difference will always remain
> > > unless we rewrite code inherited from git9 and use a different approach.
> > > 
> > 
> > Are there any objects that it performs particularly
> > poorly on? I remember measuring, and it wasn't worse
> > by a huge margin (about 10% in my testing).
> > 
> > I'd be happy to look and improve the algorithm.
> > 
> 
> There is one change we made relative to git9 that could be relevant.
> We only try 3 objects back as delta bases whereas the original code tried
> 10 objects back. This was done to speed up packing without delta-reuse,
> and it did grow our pack files a bit. Relevant discussion with some people
> collecting data points was on IRC and is probably lost by now.
> https://git.gameoftrees.org/gitweb/?p=got.git;a=commit;h=4f4d853e5a672ea469a2532774867305712b418e
> 
> I could do a full pack run on the openbsd src repo and log the time it
> takes to deltify each object. That should give us a list of potential
> edge cases. Would that help?
> 
> I would not be surprised if some edge cases could be triggered with
> files beneath sys/dev/pci/drm/amd/include/asic_reg/ because these files
> are very slow to unpack during 'got checkout' and have already triggered
> various bugs in our handling of deltas while reading packs.
> 

Ah -- sorry, I meant performance wrt size.

I benchmarked the time it takes too, but I
don't remember the results; I concluded that
if it became a problem, the biggest benefit
would come from delta reuse.

Though, if there are files that are slow to
(un)deltify, that's also worth investigating.
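
The window-size trade-off discussed above (trying 3 versus 10 objects back
as delta bases) can be sketched roughly as follows. This is a minimal Python
illustration and not the code used by Got or git9: the names `delta_cost`
and `pick_bases` are hypothetical, and the difflib-based cost is only a
crude stand-in for a real deltification algorithm.

```python
from difflib import SequenceMatcher


def delta_cost(base: bytes, target: bytes) -> int:
    """Crude stand-in for delta size (an assumption, not Got's algorithm):
    the number of target bytes not covered by matching blocks in base."""
    sm = SequenceMatcher(None, base, target, autojunk=False)
    matched = sum(block.size for block in sm.get_matching_blocks())
    return len(target) - matched


def pick_bases(objects, window):
    """For each object, try up to `window` preceding objects as delta bases.

    Returns a list of (base_index, cost) pairs. base_index is None when no
    candidate beats storing the object undeltified (cost == len(object)).
    """
    choices = []
    for i, target in enumerate(objects):
        best_base, best_cost = None, len(target)
        # Only the last `window` objects are considered; a larger window
        # costs more comparisons but may find a smaller delta.
        for j in range(max(0, i - window), i):
            cost = delta_cost(objects[j], target)
            if cost < best_cost:
                best_base, best_cost = j, cost
        choices.append((best_base, best_cost))
    return choices
```

With a small window, an object whose only good base sits more than `window`
positions back ends up stored undeltified, which is one way a smaller window
can grow the pack while reducing the time spent searching.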