From: ori@eigenstate.org
Subject: Re: deltify addblk() fixes
To: gameoftrees@openbsd.org, naddy@mips.inka.de
Date: Wed, 09 Jun 2021 12:46:17 -0400

Quoth Christian Weisgerber:
> > If the hash can only detect inequality, shouldn't we still check it
> >
> > 	if (len == dt->blocks[i].len && h == dt->blocks[i].hash) {
> >
> > to skip expensive compares?

Yes, that would be correct -- I think this is my fault, as a result
of how the code initially evolved. The first iteration used a worse
algorithm with sha1 hashes, and it turned out to be faster to just
compare rather than hashing. I fixed the algorithm, but didn't change
the lookup.

It doesn't seem to matter much in practice, but it's not harmful.

As a side note, it may be worth citing the algorithm for chunking
used with modification: FastCDC, from usenix 2016:

	https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia

The block stretching is an adaptation: FastCDC is concerned with
deduping and appending into a persistent store, while we're just
interested in deltifying two objects against each other.
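
For illustration, here is a minimal sketch of the lookup suggested in the quoted text. It is not the actual got code. Only the `len` and `hash` fields and the `dt->blocks[i]` access come from the quoted condition; `struct block`, `struct delta_table`, the `data` pointer, `nblocks` and `lookup_block()` are hypothetical names. The cheap length and hash comparisons run first, and memcmp() runs only on candidates that pass both, since a matching hash does not prove matching contents.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block record; only len and hash appear in the thread. */
struct block {
	const unsigned char	*data;	/* assumed: pointer to the block bytes */
	size_t			 len;
	uint32_t		 hash;
};

/* Hypothetical table of known blocks. */
struct delta_table {
	struct block	*blocks;
	size_t		 nblocks;
};

/*
 * Return the index of the first block whose contents equal p[0..len),
 * or -1 if there is none. h is the caller-computed hash of p.
 */
static long
lookup_block(const struct delta_table *dt, const unsigned char *p,
    size_t len, uint32_t h)
{
	size_t i;

	for (i = 0; i < dt->nblocks; i++) {
		/* Cheap rejects first: differing length or hash means unequal. */
		if (len != dt->blocks[i].len || h != dt->blocks[i].hash)
			continue;
		/* Equal hashes can still collide, so confirm byte by byte. */
		if (memcmp(p, dt->blocks[i].data, len) == 0)
			return (long)i;
	}
	return -1;
}
```

With this ordering, the expensive compare is reached only when both length and hash agree, which is the saving the quoted question is after. A linear scan is used here purely to keep the sketch short.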