From: ori@orib.dev
Subject: Re: reject Git repositories using multi-pack-index (MIDX) files
To: me@runxiyu.org, stsp@stsp.name
Cc: gameoftrees@openbsd.org, ori@eigenstate.org, ori@orib.dev
Date: Wed, 20 May 2026 22:23:47 -0400

Quoth Stefan Sperling:
> On Wed, May 20, 2026 at 05:59:51PM +0000, Runxi Yu wrote:
> > Did not read in immense detail, but the idea is +1 for me.
> > I'll bikeshed a bit on MIDX itself.
> >
> > It shouldn't be too hard to implement MIDX,
> > although I removed them from furgit
> > because my initial implementation was faulty
> > and I didn't really bother.
> > Though, personally I'm not a huge fan of MIDX
> > as the approach to searching across multiple packfiles...
> > since it increase a lot of complexity
> > while MIDX'es are not individually mutable either,
> > so you get chains of MIDX'es,
> > which sorta re-create the problem of too many .idx'es.
> > Sure, it allows for a slightly more flexible maintenance strategy,
> > but I'm not so sure about it.
> > If anyone has resources regarding
> > the design of multi-pack-index'es, I'm very happy to read.
> >
> > Can we consider this alternative together sometime?
> > I haven't ironed out some of the details yet, but:
> > https://codeberg.org/lindenii/furgit/raw/branch/master/research/packfile_bloom.txt
>
> I haven't read details about MIDX yet, so no idea.
>
> > Packfile bloom filter RFC
> > =========================
> >
> > Problem
> > -------
> >
> > Especially for server-side usages, repacking is extremely expensive, and
> > creating multi-pack-indexes is still rather expensive. Incremental MIDX
> > partially solves this, but would defeat the purpose of MIDX when there are too
> > many of them, as Git would still have to walk the MIDXes in order while
> > performing expensive indexing queries.
> >
> > Idea
> > ----
> >
> > Each MIDX layer, and each non-MIDX index, comes with a bloom filter.
>
> Your idea should work.
>
> Game of Trees already uses a bloom filter for regular .idx files.
>
> commit b343c297c60d4200da952ab5b2843eec39ed42b1
> from: Stefan Sperling
> date: Mon Oct 11 18:54:11 2021 UTC
>
> use a bloom filter to avoid pointless pack index searches
>
> M  got/Makefile              |    4+    3-
> M  gotadmin/Makefile         |    3+    3-
> M  gotweb/Makefile           |    3+    2-
> A  lib/bloom.c               |  184+    0-
> A  lib/bloom.h               |  187+    0-
> M  lib/got_lib_repository.h  |   16+    0-
> A  lib/murmurhash2.c         |   61+    0-
> A  lib/murmurhash2.h         |    7+    0-
> M  lib/repository.c          |  100+    0-
> M  regress/fetch/Makefile    |    2+    2-
> M  tog/Makefile              |    3+    3-
>
> 11 files changed, 570 insertions(+), 13 deletions(-)
>

So, while I don't have a ton of repos where packs are a problem,
I think it may be interesting to experiment with just.. appending
to packs. It should be easy enough to extend the packfile, and then
either read/expand/rewrite the index or come up with some sort of
appendable format.

Do we have some repos/tarballs we can use as benchmarks for the
various approaches?
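
For anyone following along, here is a rough sketch of the "bloom filter
in front of each index" idea quoted above. It is only an illustration
under stated assumptions: the names BloomFilter, PackIndex and find_pack
are made up for this example, the hash positions are derived from
SHA-256 via Python's hashlib rather than the murmurhash2 added in the
commit above, and it does not reflect how lib/bloom.c is actually
structured.

```python
import bisect
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over object IDs given as bytes (hypothetical)."""

    def __init__(self, nbits=1 << 16, nhashes=4):
        # nbits is assumed to be a multiple of 8; nhashes must be <= 8 here
        # because each position consumes 4 bytes of a 32-byte digest.
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, oid):
        # Assumption: slice one SHA-256 digest into nhashes 32-bit values.
        digest = hashlib.sha256(oid).digest()
        for i in range(self.nhashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.nbits

    def add(self, oid):
        for pos in self._positions(oid):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, oid):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(oid))


class PackIndex:
    """Stand-in for one .idx file: a sorted ID list plus its filter."""

    def __init__(self, name, oids):
        self.name = name
        self.oids = sorted(oids)
        self.bloom = BloomFilter()
        for oid in self.oids:
            self.bloom.add(oid)

    def lookup(self, oid):
        # Skip the binary search entirely when the filter rules the ID out.
        if not self.bloom.may_contain(oid):
            return False
        i = bisect.bisect_left(self.oids, oid)
        return i < len(self.oids) and self.oids[i] == oid


def find_pack(indexes, oid):
    """Return the name of the first index containing oid, or None."""
    for idx in indexes:
        if idx.lookup(oid):
            return idx.name
    return None
```

A filter can report false positives but never false negatives, so the
index search still has to confirm every hit; the filter only saves the
searches that would have missed anyway.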