Download raw body.
patch: regress: Work around glitchy clocks in vmm guests
Hi Stefan,
Apologies for not getting back to you sooner. I do still plan to work
on it and I already have a hacky patch from a month ago along the lines
you suggested which does seem to fix the problem. I'll hopefully have a
chance to clean it up and post it this weekend.
Also, you are correct that this is not vmm-specific. I have since
observed it on real hardware with OpenBSD while doing a full build of
base. It does seem to happen more often in vmm guests even under light
load, though.
Cheers,
Steven
On Sat, 9 Nov 2019 17:14:12 +0200
Stefan Sperling <stsp@stsp.name> wrote:
> Hey Steven, are you still planning to work on this issue?
> If not, that's OK. I will try to handle it. I just don't want
> to stomp on a problem area you would like to work on.
>
> Cheers,
> Stefan
>
> On Sun, Sep 29, 2019 at 11:53:18AM +0200, Stefan Sperling wrote:
> > On Sun, Sep 29, 2019 at 12:33:13AM +0200, Steven McDonald wrote:
> > > Some tests fail intermittently in vmm because the clock does not
> > > advance between steps, resulting in files that should have a newer
> > > mtime than the repo metadata but don't. Example failure:
> > >
> > > test_status_shows_local_mods_after_update ---
> > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout.expected
> > > Sun Sep 29 00:18:36 2019 +++
> > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout
> > > Sun Sep 29 00:18:36 2019 @@ -1 +0,0 @@ -M numbers test failed;
> > > leaving test data in
> > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL
> > >
> > > Comparing the mtime on the 'numbers' file with that in
> > > .got/file-index shows that the timestamps are identical, which is
> > > why got does not treat this file as modified.
> > >
> > > This intermittent behaviour can be observed outside the context
> > > of got (here, x is a small program that just calls stat(2) on foo
> > > and displays the mtime):
> > >
> > > $ echo a >foo; ./x; echo b >foo; ./x
> > > 1569708831.911759175
> > > 1569708831.931539267
> > > $ echo a >foo; ./x; echo b >foo; ./x
> > > 1569708832.741842903
> > > 1569708832.741842903
> > >
> > > This is unlikely to be a problem in real-world usage because
> > > humans aren't as fast as these regression tests, so I think
> > > working around it in the tests is sufficient.
> > >
> > > The below diff makes this work consistently; sleeping for a
> > > nanosecond is sufficient because we just need the clock to
> > > advance by any amount at all.
> > >
> > > Better suggestions/OKs?
> >
> > Subversion handles this race in C code by sleeping for a short
> > amount of time to ensure that files modified after it exits have a
> > different time stamp from the one which was recorded in meta data.
> >
> > Docs:
> > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/include/svn_io.h?revision=1821224&view=markup#l580
> >
> > Implementation:
> > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/libsvn_subr/io.c?revision=1865520&view=markup#l1321
> >
> > Note that SVN has to worry about filesystems without sub-second
> > timestamp resolution, e.g. VFAT which has a modified timestamp
> > resolution of 2 seconds. We can probably assume nanosecond
> > timestamps on OpenBSD? Note the comments in SVN's code regarding
> > Linux ext3/4, though. Apparently 10ms would be safe?
> >
> > SVN disables sleep for timestamps in its regress test suite so that
> > tests can be run on VFAT in a reasonable amount of time. This means
> > SVN's tests have to always change the size of modified files to
> > avoid the timestamp race, which test authors (myself included)
> > occasionally forget to do...
> >
> > Writing reliable tests should be an easy thing to do, and your
> > suggested approach does not make it any easier. Also consider that
> > the scope of this problem is wider than our regress tests. Anyone
> > writing shell scripts with got will eventually hit this issue and
> > would end up debugging it the hard way.
> >
> > There's already a test which ensures that file size is changed to
> > work around spurious test failures, see below. I did this as a
> > quick fix but should have written a proper fix instead. This hack
> > could be removed if a better solution were implemented in C.
> >
> > A good place to sleep might be at the bottom of
> > worktree.c:sync_fileindex(). Do you want to try this?
> >
> > FWIW, the problem fixed below was seen on bare iron, not in a VM.
> >
> > -----------------------------------------------
> > commit 6c6b73bb412843c96a7dca913c41a91827d9018c
> > from: Stefan Sperling <stsp@stsp.name>
> > date: Sat Aug 10 15:24:59 2019 UTC
> >
> > fix race condition in test_revert_patch_one_change
> >
> > diff ede14c87a98ea1c97ce15353442b532ac77a8770
> > 9acaee36224e921ff553189e5a437768de635992 blob -
> > d66beb7f0760252ff97e00a9b4415b40cf50049b blob +
> > 170bc3b29f399c15ed51b80c19922e0d5c6ba58b ---
> > regress/cmdline/revert.sh +++ regress/cmdline/revert.sh
> > @@ -809,7 +809,10 @@ function test_revert_patch_one_change {
> > return 1
> > fi
> >
> > - sed -i -e 's/^2$/a/' $testroot/wt/numbers
> > + # Ensure file size is changed. Avoids race condition
> > causing test
> > + # failures where 'got revert' does not see changes to
> > revert if
> > + # timestamps and size in stat info remain unchanged.
> > + sed -i -e 's/^2$/aa/' $testroot/wt/numbers
> >
> > # revert change with -p
> > printf "y\n" > $testroot/patchscript
> > @@ -826,7 +829,7 @@ function test_revert_patch_one_change {
> > @@ -1,5 +1,5 @@
> > 1
> > -2
> > -+a
> > ++aa
> > 3
> > 4
> > 5
> > @@ -834,6 +837,13 @@ function test_revert_patch_one_change {
> > M numbers (change 1 of 1)
> > revert this change? [y/n/q] y
> > EOF
> > + ret="$?"
> > + if [ "$ret" != "0" ]; then
> > + echo "got revert command failed unexpectedly" >&2
> > + test_done "$testroot" "1"
> > + return 1
> > + fi
> > +
> > cmp -s $testroot/stdout.expected $testroot/stdout
> > ret="$?"
> > if [ "$ret" != "0" ]; then
> >
> >
patch: regress: Work around glitchy clocks in vmm guests