From: Steven McDonald Subject: Re: patch: regress: Work around glitchy clocks in vmm guests To: Stefan Sperling , gameoftrees@openbsd.org Date: Sat, 16 Nov 2019 16:53:07 +0100 Hi Stefan, Apologies for not getting back to you sooner. I do still plan to work on it and I already have a hacky patch from a month ago along the lines you suggested which does seem to fix the problem. I'll hopefully have a chance to clean it up and post it this weekend. Also, you are correct that this is not vmm-specific. I have since observed it on real hardware with OpenBSD while doing a full build of base. It does seem to happen more often in vmm guests even under light load, though. Cheers, Steven On Sat, 9 Nov 2019 17:14:12 +0200 Stefan Sperling wrote: > Hey Steven, are you still planning to work on this issue? > If not, that's OK. I will try to handle it. I just don't want > to stomp on a problem area you would like to work on. > > Cheers, > Stefan > > On Sun, Sep 29, 2019 at 11:53:18AM +0200, Stefan Sperling wrote: > > On Sun, Sep 29, 2019 at 12:33:13AM +0200, Steven McDonald wrote: > > > Some tests fail intermittently in vmm because the clock does not > > > advance between steps, resulting in files that should have a newer > > > mtime than the repo metadata but don't. Example failure: > > > > > > test_status_shows_local_mods_after_update --- > > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout.expected > > > Sun Sep 29 00:18:36 2019 +++ > > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout > > > Sun Sep 29 00:18:36 2019 @@ -1 +0,0 @@ -M numbers test failed; > > > leaving test data in > > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL > > > > > > Comparing the mtime on the 'numbers' file with that in > > > .got/file-index shows that the timestamps are identical, which is > > > why got does not treat this file as modified. > > > > > > This intermittent behaviour can be observed outside the context > > > of got (here, x is a small program that just calls stat(2) on foo > > > and displays the mtime): > > > > > > $ echo a >foo; ./x; echo b >foo; ./x > > > 1569708831.911759175 > > > 1569708831.931539267 > > > $ echo a >foo; ./x; echo b >foo; ./x > > > 1569708832.741842903 > > > 1569708832.741842903 > > > > > > This is unlikely to be a problem in real-world usage because > > > humans aren't as fast as these regression tests, so I think > > > working around it in the tests is sufficient. > > > > > > The below diff makes this work consistently; sleeping for a > > > nanosecond is sufficient because we just need the clock to > > > advance by any amount at all. > > > > > > Better suggestions/OKs? > > > > Subversion handles this race in C code by sleeping for a short > > amount of time to ensure that files modified after it exits have a > > different time stamp from the one which was recorded in meta data. > > > > Docs: > > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/include/svn_io.h?revision=1821224&view=markup#l580 > > > > Implementation: > > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/libsvn_subr/io.c?revision=1865520&view=markup#l1321 > > > > Note that SVN has to worry about filesystems without sub-second > > timestamp resolution, e.g. VFAT which has a modified timestamp > > resolution of 2 seconds. We can probably assume nanosecond > > timestamps on OpenBSD? Note the comments in SVN's code regarding > > Linux ext3/4, though. Apparently 10ms would be safe? > > > > SVN disables sleep for timestamps in its regress test suite so that > > tests can be run on VFAT in a reasonable amount of time. This means > > SVN's tests have to always change the size of modified files to > > avoid the timestamp race, which test authors (myself included) > > occasionally forget to do... > > > > Writing reliable tests should be an easy thing to do, and your > > suggested approach does not make it any easier. Also consider that > > the scope of this problem is wider than our regress tests. Anyone > > writing shell scripts with got will eventually hit this issue and > > would end up debugging it the hard way. > > > > There's already a test which ensures that file size is changed to > > work around spurious test failures, see below. I did this as a > > quick fix but should have written a proper fix instead. This hack > > could be removed if a better solution were implemented in C. > > > > A good place to sleep might be at the bottom of > > worktree.c:sync_fileindex(). Do you want to try this? > > > > FWIW, the problem fixed below was seen on bare iron, not in a VM. > > > > ----------------------------------------------- > > commit 6c6b73bb412843c96a7dca913c41a91827d9018c > > from: Stefan Sperling > > date: Sat Aug 10 15:24:59 2019 UTC > > > > fix race condition in test_revert_patch_one_change > > > > diff ede14c87a98ea1c97ce15353442b532ac77a8770 > > 9acaee36224e921ff553189e5a437768de635992 blob - > > d66beb7f0760252ff97e00a9b4415b40cf50049b blob + > > 170bc3b29f399c15ed51b80c19922e0d5c6ba58b --- > > regress/cmdline/revert.sh +++ regress/cmdline/revert.sh > > @@ -809,7 +809,10 @@ function test_revert_patch_one_change { > > return 1 > > fi > > > > - sed -i -e 's/^2$/a/' $testroot/wt/numbers > > + # Ensure file size is changed. Avoids race condition > > causing test > > + # failures where 'got revert' does not see changes to > > revert if > > + # timestamps and size in stat info remain unchanged. > > + sed -i -e 's/^2$/aa/' $testroot/wt/numbers > > > > # revert change with -p > > printf "y\n" > $testroot/patchscript > > @@ -826,7 +829,7 @@ function test_revert_patch_one_change { > > @@ -1,5 +1,5 @@ > > 1 > > -2 > > -+a > > ++aa > > 3 > > 4 > > 5 > > @@ -834,6 +837,13 @@ function test_revert_patch_one_change { > > M numbers (change 1 of 1) > > revert this change? [y/n/q] y > > EOF > > + ret="$?" > > + if [ "$ret" != "0" ]; then > > + echo "got revert command failed unexpectedly" >&2 > > + test_done "$testroot" "1" > > + return 1 > > + fi > > + > > cmp -s $testroot/stdout.expected $testroot/stdout > > ret="$?" > > if [ "$ret" != "0" ]; then > > > >