From: Stefan Sperling Subject: Re: diff: detect input file truncation To: Dag-Erling Smørgrav Cc: gameoftrees@openbsd.org Date: Thu, 23 May 2024 11:03:12 +0200 On Thu, May 23, 2024 at 05:25:07AM +0200, Dag-Erling Smørgrav wrote: > A downstream consumer of FreeBSD's GoT-derived diff implementation > reported experiencing SIGBUS crashes in diff. They didn't provide much > context, but from the stack traces I was given the crash appears to > occur while atomizing mmapped files, and I was able to reproduce it with > a test program that creates and mmaps a file, then truncates it before > calling diff_atomize_file() with the original length. The atomizer then > crashes when it reaches the end of the page that contains the new EOF. > This turned out to be easy enough to work around, so here are the > patches if you want them. A regression test can be found here: > > https://cgit.freebsd.org/src/tree/lib/libdiff/tests/libdiff_test.c > > Note that pos must be volatile to prevent the compiler from assuming > that it is unchanged when sigsetjmp() returns non-zero, so I added > line_start as an optimizable copy of pos inside the loop. I'm wondering > if this could have been done differently, e.g. by replacing `end = pos` > with `end = *(const uint8_t * volatile *)&pos` so pos itself doesn't > have to be volatile. I don't think we should try to implement a workaround in the diff code for this. If files being diffed change in any way while the diff code is working with them it is unreasonable to expect a useful result. Applications calling into this code should try to ensure that the passed map + length remains valid. If this is not easy and they would like see an error in regular code flow rather than a signal then there is a compile-time switch to disable mmap: $ grep MMAP diff/Makefile #CPPFLAGS += -DDIFF_NO_MMAP $ The diff code would then use stdio only and not crash the application.