
From: ori@orib.dev
Subject: Re: reject Git repositories using multi-pack-index (MIDX) files
To: me@runxiyu.org, stsp@stsp.name
Cc: gameoftrees@openbsd.org, ori@eigenstate.org, ori@orib.dev
Date: Wed, 20 May 2026 22:23:47 -0400


Quoth Stefan Sperling <stsp@stsp.name>:
> On Wed, May 20, 2026 at 05:59:51PM +0000, Runxi Yu wrote:
> > Did not read in immense detail, but the idea is +1 for me.
> > I'll bikeshed a bit on MIDX itself.
> > 
> > It shouldn't be too hard to implement MIDX,
> > although I removed them from furgit
> > because my initial implementation was faulty
> > and I didn't really bother.
> > Though, personally I'm not a huge fan of MIDX
> > as the approach to searching across multiple packfiles...
> > since it increases complexity a lot
> > while MIDXes are not individually mutable either,
> > so you get chains of MIDXes,
> > which sorta re-create the problem of too many .idx files.
> > Sure, it allows for a slightly more flexible maintenance strategy,
> > but I'm not so sure about it.
> > If anyone has resources regarding
> > the design of multi-pack-indexes, I'd be very happy to read them.
> > 
> > Can we consider this alternative together sometime?
> > I haven't ironed out some of the details yet, but:
> > https://codeberg.org/lindenii/furgit/raw/branch/master/research/packfile_bloom.txt
> 
> I haven't read details about MIDX yet, so no idea.
> 
> > Packfile bloom filter RFC
> > =========================
> > 
> > Problem
> > -------
> > 
> > Especially for server-side use, repacking is extremely expensive, and
> > creating multi-pack-indexes is still rather expensive. Incremental MIDX
> > partially solves this, but defeats the purpose of MIDX when there are too
> > many layers, as Git still has to walk the MIDX layers in order while
> > performing expensive index lookups.
> > 
> > Idea
> > ----
> > 
> > Each MIDX layer, and each non-MIDX index, comes with a bloom filter.
> 
> Your idea should work.
> 
> Game of Trees already uses a bloom filter for regular .idx files.
> 
> 
> commit b343c297c60d4200da952ab5b2843eec39ed42b1
> from: Stefan Sperling <stsp@stsp.name>
> date: Mon Oct 11 18:54:11 2021 UTC
>  
>  use a bloom filter to avoid pointless pack index searches
>  
> M  got/Makefile              |    4+  3-
> M  gotadmin/Makefile         |    3+  3-
> M  gotweb/Makefile           |    3+  2-
> A  lib/bloom.c               |  184+  0-
> A  lib/bloom.h               |  187+  0-
> M  lib/got_lib_repository.h  |   16+  0-
> A  lib/murmurhash2.c         |   61+  0-
> A  lib/murmurhash2.h         |    7+  0-
> M  lib/repository.c          |  100+  0-
> M  regress/fetch/Makefile    |    2+  2-
> M  tog/Makefile              |    3+  3-
> 
> 11 files changed, 570 insertions(+), 13 deletions(-)
> 
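A rough sketch of the layered lookup the quoted RFC proposes: every MIDX layer and every plain .idx carries a Bloom filter, and a lookup only runs the expensive index search in layers whose filter says the object might be there. This is illustrative Python, not furgit or Game of Trees code. `BloomFilter`, `Layer`, and `find_object` are hypothetical names, a dict stands in for the real index search, and the filter derives its bit positions by double hashing a SHA-256 digest (the commit quoted above adds murmurhash2.c alongside bloom.c instead).

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over object IDs (bytes)."""

    def __init__(self, nbits=1024, nhashes=4):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, key):
        # Double hashing (an assumption for this sketch): derive
        # nhashes bit positions from a single SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.nhashes):
            yield (h1 + i * h2) % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key):
        # False means definitely absent; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class Layer:
    """Hypothetical stand-in for one MIDX layer or plain .idx plus its filter."""

    def __init__(self, name, entries):
        self.name = name
        self.entries = dict(entries)  # stands in for the real (expensive) index search
        self.bloom = BloomFilter()
        for oid in self.entries:
            self.bloom.add(oid)


def find_object(layers, oid):
    """Walk layers in order, searching only those whose filter may hold oid."""
    for layer in layers:
        if not layer.bloom.may_contain(oid):
            continue  # definitely not here: skip the index search entirely
        if oid in layer.entries:
            return layer.name, layer.entries[oid]
    return None
```

Because a Bloom filter never gives false negatives, skipping a layer is always safe; a false positive only costs one unnecessary search, so the order-dependent walk over many layers stays cheap as long as the filters are sized sensibly.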

So, while I don't have a ton of repos where packs
are a problem, I think it may be interesting to
experiment with just... appending to packs.
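To make appending to packs concrete, here is a minimal sketch assuming a version 2 or 3 pack whose trailer is a 20-byte SHA-1 checksum: bump the object count in the 12-byte header, place the new entries after the existing ones, and recompute the trailer over everything before it. `append_to_pack` is a hypothetical helper, and the new entries are assumed to be already-encoded object records.

```python
import hashlib
import struct

HEADER = struct.Struct(">4sII")  # "PACK", version, object count (network byte order)
TRAILER_LEN = 20                 # SHA-1 checksum of everything before it


def append_to_pack(pack, new_entries):
    """Return a copy of pack with already-encoded object entries appended."""
    sig, version, count = HEADER.unpack_from(pack, 0)
    if sig != b"PACK" or version not in (2, 3):
        raise ValueError("not a version 2/3 pack")
    if hashlib.sha1(pack[:-TRAILER_LEN]).digest() != pack[-TRAILER_LEN:]:
        raise ValueError("pack checksum mismatch")
    body = pack[HEADER.size:-TRAILER_LEN]  # existing object entries, unchanged
    content = HEADER.pack(sig, version, count + len(new_entries)) + body + b"".join(new_entries)
    # The trailer covers the header too, so it must be recomputed over the whole file.
    return content + hashlib.sha1(content).digest()
```

Since the header has a fixed size, offsets of existing objects don't move and old index entries stay valid; the .idx still has to be rewritten, though, because it also stores a copy of the pack checksum. A real implementation would append in place rather than copy the file in memory, but it would still need to rehash the whole pack, or keep the hash state around, to produce the new trailer.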

It should be easy enough to extend the packfile,
and then either read/expand/rewrite the index or
come up with some sort of appendable format.
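For the read/expand/rewrite option, the core step is merging the appended objects into the sorted entry list and recomputing the 256-entry fanout table, where entry N counts the objects whose first byte is at most N. A minimal in-memory sketch, assuming entries are (object ID, pack offset) pairs and leaving out serialization, CRC32s, and the large-offset table; `rebuild_index` and `lookup` are hypothetical names.

```python
import bisect
import heapq


def rebuild_index(old, new):
    """Merge appended objects into an existing sorted index.

    old: sorted list of (oid, offset) read from the current .idx.
    new: (oid, offset) pairs for objects appended to the pack.
    Returns the merged sorted entries and a cumulative 256-entry fanout.
    """
    existing = {oid for oid, _ in old}
    # Skip objects the pack already had; the original offset is kept.
    fresh = sorted(e for e in new if e[0] not in existing)
    merged = list(heapq.merge(old, fresh))

    fanout = [0] * 256
    for oid, _ in merged:
        fanout[oid[0]] += 1
    running = 0
    for i in range(256):
        running += fanout[i]
        fanout[i] = running  # number of objects whose first byte is <= i
    return merged, fanout


def lookup(merged, fanout, oid):
    """Binary-search only the slice of entries sharing oid's first byte."""
    first = oid[0]
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]
    i = bisect.bisect_left(merged, (oid,), lo, hi)
    if i < hi and merged[i][0] == oid:
        return merged[i][1]
    return None
```

The merge is linear in the size of the existing index, so this is the "rewrite" cost an appendable index format would be trying to avoid.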

Do we have some repos/tarballs we can use as
benchmarks for the various approaches?