deltify addblk() fixes

From:: Christian Weisgerber <naddy@mips.inka.de>
Subject:: Re: deltify addblk() fixes
To:: gameoftrees@openbsd.org
Date:: Wed, 9 Jun 2021 17:15:35 +0200

Download raw body.

Thread

2021-06-05 09:44 Stefan Sperling:
deltify addblk() fixes
- 2021-06-09 15:15 Christian Weisgerber:
  deltify addblk() fixes
- - 2021-06-09 16:46 ori@eigenstate.org:
    deltify addblk() fixes
  - 2021-06-10 13:40 Stefan Sperling:
    deltify addblk() fixes

Stefan Sperling:

> Fix deltify addblk() list size confusion: We only have 'old_size' number
> of blocks to rehash after growing the array.

That fix is correct.

> And only extend the array slightly when we outgrow it, instead of doubling
> the array's size each time

Isn't the underlying idea that the array is somewhat sparsely
populated to that h % dt->nalloc is likely to hit an empty slot on
insert and the right slot on lookup?

Anyway, this doesn't look right:

>  	dt->nblocks++;
> -	if (dt->nalloc < 2 * dt->nblocks) {
> +	if (dt->nalloc < dt->nblocks + 64) {
>  		struct got_delta_block *db;
> -		nalloc = dt->nalloc * 2;
> +		size_t old_size = dt->nalloc;
>  		db = dt->blocks;
> -		dt->blocks = calloc(nalloc, sizeof(struct got_delta_block));
> +		dt->blocks = calloc(dt->nblocks + 64, sizeof(struct got_delta_block));
>  		if (dt->blocks == NULL) {
>  			err = got_error_from_errno("calloc");
>  			dt->blocks = db;
>  			return err;
>  		}
> -		dt->nalloc = nalloc;
> +		dt->nalloc = dt->nblocks + 64;

Assume that dt->nalloc is 100.  Then as soon as dt->nblock hits 37,
the if triggers (100 < 101).  We then allocate 101 blocks...
Therefore, each addblk() call now extends the array by 1 block.

I think you meant
   calloc(dt->nalloc + 64, ...)
   dt->nalloc += 64


While we're here, I'm also puzzled by this part:

                 * Avoid adding duplicate blocks.
                 * NB: A matching hash is insufficient for detecting equality.
                 * The hash can only detect inequality, so only check 'len'.
                 */
                if (len == dt->blocks[i].len) {
                ...
                        if (memcmp(buf, buf2, len) == 0)
                                return NULL;
                }

If the hash can only detect inequality, shouldn't we still check it

                if (len == dt->blocks[i].len && h == dt->blocks[i].hash) {

to skip expensive compares?

-- 
Christian "naddy" Weisgerber                          naddy@mips.inka.de

2021-06-05 09:44 Stefan Sperling:
deltify addblk() fixes
- 2021-06-09 15:15 Christian Weisgerber:
  deltify addblk() fixes
- - 2021-06-09 16:46 ori@eigenstate.org:
    deltify addblk() fixes
  - 2021-06-10 13:40 Stefan Sperling:
    deltify addblk() fixes