"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Steven McDonald <steven@steven-mcdonald.id.au>
Subject:
Re: patch: regress: Work around glitchy clocks in vmm guests
To:
Stefan Sperling <stsp@stsp.name>, gameoftrees@openbsd.org
Date:
Sat, 16 Nov 2019 16:53:07 +0100

Download raw body.

Thread
Hi Stefan,

Apologies for not getting back to you sooner. I do still plan to work
on it and I already have a hacky patch from a month ago along the lines
you suggested which does seem to fix the problem. I'll hopefully have a
chance to clean it up and post it this weekend.

Also, you are correct that this is not vmm-specific. I have since
observed it on real hardware with OpenBSD while doing a full build of
base. It does seem to happen more often in vmm guests even under light
load, though.

Cheers,
Steven

On Sat, 9 Nov 2019 17:14:12 +0200
Stefan Sperling <stsp@stsp.name> wrote:

> Hey Steven, are you still planning to work on this issue?
> If not, that's OK. I will try to handle it. I just don't want
> to stomp on a problem area you would like to work on.
> 
> Cheers,
> Stefan
> 
> On Sun, Sep 29, 2019 at 11:53:18AM +0200, Stefan Sperling wrote:
> > On Sun, Sep 29, 2019 at 12:33:13AM +0200, Steven McDonald wrote:  
> > > Some tests fail intermittently in vmm because the clock does not
> > > advance between steps, resulting in files that should have a newer
> > > mtime than the repo metadata but don't. Example failure:
> > > 
> > >   test_status_shows_local_mods_after_update ---
> > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout.expected
> > >       Sun Sep 29 00:18:36 2019 +++
> > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout
> > >  Sun Sep 29 00:18:36 2019 @@ -1 +0,0 @@ -M  numbers test failed;
> > > leaving test data in
> > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL
> > > 
> > > Comparing the mtime on the 'numbers' file with that in
> > > .got/file-index shows that the timestamps are identical, which is
> > > why got does not treat this file as modified.
> > > 
> > > This intermittent behaviour can be observed outside the context
> > > of got (here, x is a small program that just calls stat(2) on foo
> > > and displays the mtime):
> > > 
> > >   $ echo a >foo; ./x; echo b >foo; ./x 
> > >   1569708831.911759175
> > >   1569708831.931539267
> > >   $ echo a >foo; ./x; echo b >foo; ./x 
> > >   1569708832.741842903
> > >   1569708832.741842903
> > > 
> > > This is unlikely to be a problem in real-world usage because
> > > humans aren't as fast as these regression tests, so I think
> > > working around it in the tests is sufficient.
> > > 
> > > The below diff makes this work consistently; sleeping for a
> > > nanosecond is sufficient because we just need the clock to
> > > advance by any amount at all.
> > > 
> > > Better suggestions/OKs?  
> > 
> > Subversion handles this race in C code by sleeping for a short
> > amount of time to ensure that files modified after it exits have a
> > different time stamp from the one which was recorded in meta data.
> > 
> > Docs:
> > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/include/svn_io.h?revision=1821224&view=markup#l580
> > 
> > Implementation:
> > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/libsvn_subr/io.c?revision=1865520&view=markup#l1321
> > 
> > Note that SVN has to worry about filesystems without sub-second
> > timestamp resolution, e.g. VFAT which has a modified timestamp
> > resolution of 2 seconds. We can probably assume nanosecond
> > timestamps on OpenBSD? Note the comments in SVN's code regarding
> > Linux ext3/4, though. Apparently 10ms would be safe?
> > 
> > SVN disables sleep for timestamps in its regress test suite so that
> > tests can be run on VFAT in a reasonable amount of time. This means
> > SVN's tests have to always change the size of modified files to
> > avoid the timestamp race, which test authors (myself included)
> > occasionally forget to do...
> > 
> > Writing reliable tests should be an easy thing to do, and your
> > suggested approach does not make it any easier. Also consider that
> > the scope of this problem is wider than our regress tests. Anyone
> > writing shell scripts with got will eventually hit this issue and
> > would end up debugging it the hard way.
> > 
> > There's already a test which ensures that file size is changed to
> > work around spurious test failures, see below. I did this as a
> > quick fix but should have written a proper fix instead. This hack
> > could be removed if a better solution were implemented in C.
> > 
> > A good place to sleep might be at the bottom of
> > worktree.c:sync_fileindex(). Do you want to try this?
> > 
> > FWIW, the problem fixed below was seen on bare iron, not in a VM.
> > 
> > -----------------------------------------------
> > commit 6c6b73bb412843c96a7dca913c41a91827d9018c
> > from: Stefan Sperling <stsp@stsp.name>
> > date: Sat Aug 10 15:24:59 2019 UTC
> >  
> >  fix race condition in test_revert_patch_one_change
> >  
> > diff ede14c87a98ea1c97ce15353442b532ac77a8770
> > 9acaee36224e921ff553189e5a437768de635992 blob -
> > d66beb7f0760252ff97e00a9b4415b40cf50049b blob +
> > 170bc3b29f399c15ed51b80c19922e0d5c6ba58b ---
> > regress/cmdline/revert.sh +++ regress/cmdline/revert.sh
> > @@ -809,7 +809,10 @@ function test_revert_patch_one_change {
> >  		return 1
> >  	fi
> >  
> > -	sed -i -e 's/^2$/a/' $testroot/wt/numbers
> > +	# Ensure file size is changed. Avoids race condition
> > causing test
> > +	# failures where 'got revert' does not see changes to
> > revert if
> > +	# timestamps and size in stat info remain unchanged.
> > +	sed -i -e 's/^2$/aa/' $testroot/wt/numbers
> >  
> >  	# revert change with -p
> >  	printf "y\n" > $testroot/patchscript
> > @@ -826,7 +829,7 @@ function test_revert_patch_one_change {
> >  @@ -1,5 +1,5 @@
> >   1
> >  -2
> > -+a
> > ++aa
> >   3
> >   4
> >   5
> > @@ -834,6 +837,13 @@ function test_revert_patch_one_change {
> >  M  numbers (change 1 of 1)
> >  revert this change? [y/n/q] y
> >  EOF
> > +	ret="$?"
> > +	if [ "$ret" != "0" ]; then
> > +		echo "got revert command failed unexpectedly" >&2
> > +		test_done "$testroot" "1"
> > +		return 1
> > +	fi
> > +
> >  	cmp -s $testroot/stdout.expected $testroot/stdout
> >  	ret="$?"
> >  	if [ "$ret" != "0" ]; then
> > 
> >