"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From: Stefan Sperling <stsp@stsp.name>
Subject: Re: reject Git repositories using multi-pack-index (MIDX) files
To: Runxi Yu <me@runxiyu.org>
Cc: gameoftrees@openbsd.org, ori@eigenstate.org, ori@orib.dev
Date: Wed, 20 May 2026 20:44:51 +0200

On Wed, May 20, 2026 at 05:59:51PM +0000, Runxi Yu wrote:
> Did not read in immense detail, but the idea is +1 for me.
> I'll bikeshed a bit on MIDX itself.
> 
> It shouldn't be too hard to implement MIDX,
> although I removed them from furgit
> because my initial implementation was faulty
> and I didn't really bother.
> Though, personally I'm not a huge fan of MIDX
> as the approach to searching across multiple packfiles...
> since it adds a lot of complexity
> while MIDX'es are not individually mutable either,
> so you get chains of MIDX'es,
> which sorta re-create the problem of too many .idx'es.
> Sure, it allows for a slightly more flexible maintenance strategy,
> but I'm not so sure about it.
> If anyone has resources regarding
> the design of multi-pack-index'es, I'm very happy to read.
> 
> Can we consider this alternative together sometime?
> I haven't ironed out some of the details yet, but:
> https://codeberg.org/lindenii/furgit/raw/branch/master/research/packfile_bloom.txt

I haven't read up on the details of MIDX yet, so I can't comment on that part.

> Packfile bloom filter RFC
> =========================
> 
> Problem
> -------
> 
> Especially for server-side usages, repacking is extremely expensive, and
> creating multi-pack-indexes is still rather expensive. Incremental MIDX
> partially solves this, but would defeat the purpose of MIDX when there are too
> many of them, as Git would still have to walk the MIDXes in order while
> performing expensive indexing queries.
> 
> Idea
> ----
> 
> Each MIDX layer, and each non-MIDX index, comes with a bloom filter.

Your idea should work.

Game of Trees already uses a bloom filter for regular .idx files.


commit b343c297c60d4200da952ab5b2843eec39ed42b1
from: Stefan Sperling <stsp@stsp.name>
date: Mon Oct 11 18:54:11 2021 UTC
 
 use a bloom filter to avoid pointless pack index searches
 
M  got/Makefile              |    4+  3-
M  gotadmin/Makefile         |    3+  3-
M  gotweb/Makefile           |    3+  2-
A  lib/bloom.c               |  184+  0-
A  lib/bloom.h               |  187+  0-
M  lib/got_lib_repository.h  |   16+  0-
A  lib/murmurhash2.c         |   61+  0-
A  lib/murmurhash2.h         |    7+  0-
M  lib/repository.c          |  100+  0-
M  regress/fetch/Makefile    |    2+  2-
M  tog/Makefile              |    3+  3-

11 files changed, 570 insertions(+), 13 deletions(-)
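
For readers who have not seen the technique, here is a minimal sketch of how
a per-index bloom filter can guard pack index searches. It is not the code
from lib/bloom.c: every name and constant below is hypothetical, FNV-1a is
swapped in for the murmurhash2 that the commit adds so that the example stays
self-contained, and a linear scan stands in for the real .idx search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes here are hypothetical, chosen for illustration. */
#define SKETCH_BLOOM_BITS	(1u << 16)	/* filter size in bits */
#define SKETCH_BLOOM_HASHES	4		/* probes per object ID */
#define SKETCH_ID_LEN		20		/* SHA-1 object ID length */

struct sketch_bloom {
	uint8_t bits[SKETCH_BLOOM_BITS / 8];
};

struct sketch_packidx {
	struct sketch_bloom bloom;	/* filled while the index is loaded */
	const uint8_t *ids;		/* nids * SKETCH_ID_LEN bytes */
	size_t nids;
};

/* FNV-1a with a seed; an assumption of this sketch, not what Got uses. */
uint32_t
sketch_hash(const uint8_t *buf, size_t len, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

void
sketch_bloom_init(struct sketch_bloom *bf)
{
	assert(bf != NULL);
	memset(bf->bits, 0, sizeof(bf->bits));
}

/* Derive probe number i from two base hashes (double hashing). */
static uint32_t
sketch_probe(const uint8_t *id, int i)
{
	uint32_t h1 = sketch_hash(id, SKETCH_ID_LEN, 0);
	uint32_t h2 = sketch_hash(id, SKETCH_ID_LEN, 0x9e3779b9u) | 1;

	return (h1 + (uint32_t)i * h2) % SKETCH_BLOOM_BITS;
}

/* Record an object ID in the filter. */
void
sketch_bloom_add(struct sketch_bloom *bf, const uint8_t *id)
{
	int i;

	for (i = 0; i < SKETCH_BLOOM_HASHES; i++) {
		uint32_t bit = sketch_probe(id, i);
		bf->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	}
}

/* Return 0 if id is definitely absent, 1 if it may be present. */
int
sketch_bloom_maybe_contains(const struct sketch_bloom *bf, const uint8_t *id)
{
	int i;

	for (i = 0; i < SKETCH_BLOOM_HASHES; i++) {
		uint32_t bit = sketch_probe(id, i);
		if ((bf->bits[bit / 8] & (1u << (bit % 8))) == 0)
			return 0;
	}
	return 1;
}

/* Return the position of the first index holding id, or -1 if none does. */
int
sketch_find_pack(const struct sketch_packidx *idx, size_t nidx,
    const uint8_t *id)
{
	size_t i, j;

	for (i = 0; i < nidx; i++) {
		if (!sketch_bloom_maybe_contains(&idx[i].bloom, id))
			continue;	/* definitely absent: skip the search */
		for (j = 0; j < idx[i].nids; j++) {	/* stand-in search */
			if (memcmp(idx[i].ids + j * SKETCH_ID_LEN, id,
			    SKETCH_ID_LEN) == 0)
				return (int)i;
		}
	}
	return -1;
}
```

A filter can report false positives but never false negatives, so a "maybe"
still requires the real index search, while a "no" lets the lookup move on to
the next index right away. The same guard would apply per MIDX layer in the
scheme proposed above.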