
From:
Christian Weisgerber <naddy@mips.inka.de>
Subject:
Re: optimise reading the file index
To:
Mark Jamsek <mark@jamsek.com>
Cc:
gameoftrees@openbsd.org
Date:
Thu, 27 Jul 2023 15:11:04 +0200

    Mark Jamsek:
    
    > The main change is that we memory map the file index to avoid lots of
    > small reads that can be costly in large repos, for example, with many
    > files.
    
    I'd think that "got status" in /usr/src is dominated by the 92,300
    fstatat() calls...
    
    I tried this (cd /usr/src; got st) on an APU2, which qualifies as
    slow for amd64 purposes:
    
    0.92-current, three runs:
        0m19.97s real     0m08.82s user     0m11.16s system
        0m19.95s real     0m08.56s user     0m10.93s system
        0m19.99s real     0m08.74s user     0m10.96s system
    
    + patch, three runs:
        0m19.28s real     0m08.21s user     0m10.96s system
        0m19.14s real     0m08.02s user     0m10.76s system
        0m19.33s real     0m07.72s user     0m11.42s system
    
    I think we need more measurements to be certain that there is any
    effect at all. :->
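
As a rough way to compare the two sets of runs, the wall-clock ("real") times above can be averaged. The following is only a sketch: the helper name mean() and the array names are made up for illustration, and three samples per configuration are too few to draw firm conclusions from.

```c
#include <stddef.h>

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
static double
mean(const double *v, size_t n)
{
	double	sum = 0.0;
	size_t	i;

	if (n == 0)
		return 0.0;
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}

/* "real" seconds copied from the three runs of each configuration. */
static const double base_real[] = { 19.97, 19.95, 19.99 };
static const double patch_real[] = { 19.28, 19.14, 19.33 };
```

Calling mean(base_real, 3) and mean(patch_real, 3) gives roughly 19.97 s and 19.25 s, a difference of about 0.7 s on a 20 s run.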
    
    
    I still have a code comment:
    
    > +static const struct got_error *
    > +mread_fileindex_path(char **path, struct got_hash *ctx, const uint8_t *map,
    > +    size_t mapsz, size_t *offset)
    > +{
    > +	const uint8_t	*p, *nul;
    > +	size_t		 len, pad, pathlen;
    > +
    > +	if (mapsz < *offset)
    > +		return got_error(GOT_ERR_FILEIDX_BAD);
    > +
    > +	p = map + *offset;
    > +
    > +	nul = memchr(p, '\0', mapsz);
    > +	len = nul - p;
    > +	pad = 8 - len % 8;
    > +
    > +	pathlen = len + pad;
    > +
    > +	if (mapsz < *offset + pathlen)
    > +		return got_error(GOT_ERR_FILEIDX_BAD);
    
    If the file index is corrupt, the memchr() can overrun the mapped
    area and trigger a segfault.  The max length should be something
    like map + mapsz - p.
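
A minimal sketch of the bounded search suggested above, written as a standalone helper rather than as a change to the patch. The name bounded_pathlen() and its int return convention are hypothetical. The NULL check after memchr() is an extra assumption not spelled out in the review: with a bounded search, a missing terminator makes memchr() return NULL instead of running off the mapping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: compute the padded length
 * of the NUL-terminated path at map[offset], searching only the bytes
 * that remain inside the mapping.
 * Returns 0 on success, -1 if the index looks corrupt.
 */
static int
bounded_pathlen(const uint8_t *map, size_t mapsz, size_t offset,
    size_t *pathlen)
{
	const uint8_t	*p, *nul;
	size_t		 len, pad, remain;

	if (offset >= mapsz)
		return -1;

	p = map + offset;
	remain = mapsz - offset;	/* same value as map + mapsz - p */

	nul = memchr(p, '\0', remain);
	if (nul == NULL)
		return -1;		/* no terminator inside the map */

	len = nul - p;
	pad = 8 - len % 8;		/* same padding rule as the patch */

	if (remain < len + pad)
		return -1;		/* padding would run past the map */

	*pathlen = len + pad;
	return 0;
}
```

In the real function the -1 cases would presumably map to got_error(GOT_ERR_FILEIDX_BAD), as the existing checks in the patch do.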
    
    -- 
    Christian "naddy" Weisgerber                          naddy@mips.inka.de
    
    