"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: patch: regress: Work around glitchy clocks in vmm guests
To:
Steven McDonald <steven@steven-mcdonald.id.au>
Cc:
gameoftrees@openbsd.org
Date:
Fri, 6 Dec 2019 14:20:43 +0100

Download raw body.

Thread
On Sat, Nov 16, 2019 at 04:53:07PM +0100, Steven McDonald wrote:
> Hi Stefan,
> 
> Apologies for not getting back to you sooner. I do still plan to work
> on it and I already have a hacky patch from a month ago along the lines
> you suggested which does seem to fix the problem. I'll hopefully have a
> chance to clean it up and post it this weekend.
> 
> Also, you are correct that this is not vmm-specific. I have since
> observed it on real hardware with OpenBSD while doing a full build of
> base. It does seem to happen more often in vmm guests even under light
> load, though.
> 
> Cheers,
> Steven

Hey Steven,

Any news on your patch for this problem? :-)

I was just reminded by a regress test run which showed errors that
seem to be related to the problem:

./tag.sh
test_tag_create ok
test_tag_list --- /tmp/got-test-tag_list-Jt1dTziv/stdout.expected       Fri Dec  6 14:15:33 2019
+++ /tmp/got-test-tag_list-Jt1dTziv/stdout      Fri Dec  6 14:15:33 2019
@@ -1,7 +1,7 @@
 -----------------------------------------------
 tag 2.0.0 cb08dbbe7c752cb0190cf75d1cb97e58830b2062
 from: Flan Hacker <flan_hacker@openbsd.org>
-date: Fri Dec  6 13:15:32 2019 UTC
+date: Fri Dec  6 13:15:33 2019 UTC
 object: commit 05ff96777f29246373d7f39af449f8926c03ae4a

  test
@@ -9,7 +9,7 @@
 -----------------------------------------------
 tag 1.0.0 6f42a9187a0f56c14179a7b4910d54d54a777651
 from: Flan Hacker <flan_hacker@openbsd.org>
-date: Fri Dec  6 13:15:33 2019 UTC
+date: Fri Dec  6 13:15:32 2019 UTC
 object: commit 05ff96777f29246373d7f39af449f8926c03ae4a

  test
test failed; leaving test data in /tmp/got-test-tag_list-Jt1dTziv


> On Sat, 9 Nov 2019 17:14:12 +0200
> Stefan Sperling <stsp@stsp.name> wrote:
> 
> > Hey Steven, are you still planning to work on this issue?
> > If not, that's OK. I will try to handle it. I just don't want
> > to stomp on a problem area you would like to work on.
> > 
> > Cheers,
> > Stefan
> > 
> > On Sun, Sep 29, 2019 at 11:53:18AM +0200, Stefan Sperling wrote:
> > > On Sun, Sep 29, 2019 at 12:33:13AM +0200, Steven McDonald wrote:  
> > > > Some tests fail intermittently in vmm because the clock does not
> > > > advance between steps, resulting in files that should have a newer
> > > > mtime than the repo metadata but don't. Example failure:
> > > > 
> > > >   test_status_shows_local_mods_after_update ---
> > > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout.expected
> > > >       Sun Sep 29 00:18:36 2019 +++
> > > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL/stdout
> > > >  Sun Sep 29 00:18:36 2019 @@ -1 +0,0 @@ -M  numbers test failed;
> > > > leaving test data in
> > > > /tmp/got-test-status_shows_local_mods_after_update-duAj87aL
> > > > 
> > > > Comparing the mtime on the 'numbers' file with that in
> > > > .got/file-index shows that the timestamps are identical, which is
> > > > why got does not treat this file as modified.
> > > > 
> > > > This intermittent behaviour can be observed outside the context
> > > > of got (here, x is a small program that just calls stat(2) on foo
> > > > and displays the mtime):
> > > > 
> > > >   $ echo a >foo; ./x; echo b >foo; ./x 
> > > >   1569708831.911759175
> > > >   1569708831.931539267
> > > >   $ echo a >foo; ./x; echo b >foo; ./x 
> > > >   1569708832.741842903
> > > >   1569708832.741842903
> > > > 
> > > > This is unlikely to be a problem in real-world usage because
> > > > humans aren't as fast as these regression tests, so I think
> > > > working around it in the tests is sufficient.
> > > > 
> > > > The below diff makes this work consistently; sleeping for a
> > > > nanosecond is sufficient because we just need the clock to
> > > > advance by any amount at all.
> > > > 
> > > > Better suggestions/OKs?  
> > > 
> > > Subversion handles this race in C code by sleeping for a short
> > > amount of time to ensure that files modified after it exits have a
> > > different time stamp from the one which was recorded in meta data.
> > > 
> > > Docs:
> > > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/include/svn_io.h?revision=1821224&view=markup#l580
> > > 
> > > Implementation:
> > > https://svn.apache.org/viewvc/subversion/tags/1.10.6/subversion/libsvn_subr/io.c?revision=1865520&view=markup#l1321
> > > 
> > > Note that SVN has to worry about filesystems without sub-second
> > > timestamp resolution, e.g. VFAT which has a modified timestamp
> > > resolution of 2 seconds. We can probably assume nanosecond
> > > timestamps on OpenBSD? Note the comments in SVN's code regarding
> > > Linux ext3/4, though. Apparently 10ms would be safe?
> > > 
> > > SVN disables sleep for timestamps in its regress test suite so that
> > > tests can be run on VFAT in a reasonable amount of time. This means
> > > SVN's tests have to always change the size of modified files to
> > > avoid the timestamp race, which test authors (myself included)
> > > occasionally forget to do...
> > > 
> > > Writing reliable tests should be an easy thing to do, and your
> > > suggested approach does not make it any easier. Also consider that
> > > the scope of this problem is wider than our regress tests. Anyone
> > > writing shell scripts with got will eventually hit this issue and
> > > would end up debugging it the hard way.
> > > 
> > > There's already a test which ensures that file size is changed to
> > > work around spurious test failures, see below. I did this as a
> > > quick fix but should have written a proper fix instead. This hack
> > > could be removed if a better solution were implemented in C.
> > > 
> > > A good place to sleep might be at the bottom of
> > > worktree.c:sync_fileindex(). Do you want to try this?
> > > 
> > > FWIW, the problem fixed below was seen on bare iron, not in a VM.
> > > 
> > > -----------------------------------------------
> > > commit 6c6b73bb412843c96a7dca913c41a91827d9018c
> > > from: Stefan Sperling <stsp@stsp.name>
> > > date: Sat Aug 10 15:24:59 2019 UTC
> > >  
> > >  fix race condition in test_revert_patch_one_change
> > >  
> > > diff ede14c87a98ea1c97ce15353442b532ac77a8770
> > > 9acaee36224e921ff553189e5a437768de635992 blob -
> > > d66beb7f0760252ff97e00a9b4415b40cf50049b blob +
> > > 170bc3b29f399c15ed51b80c19922e0d5c6ba58b ---
> > > regress/cmdline/revert.sh +++ regress/cmdline/revert.sh
> > > @@ -809,7 +809,10 @@ function test_revert_patch_one_change {
> > >  		return 1
> > >  	fi
> > >  
> > > -	sed -i -e 's/^2$/a/' $testroot/wt/numbers
> > > +	# Ensure file size is changed. Avoids race condition
> > > causing test
> > > +	# failures where 'got revert' does not see changes to
> > > revert if
> > > +	# timestamps and size in stat info remain unchanged.
> > > +	sed -i -e 's/^2$/aa/' $testroot/wt/numbers
> > >  
> > >  	# revert change with -p
> > >  	printf "y\n" > $testroot/patchscript
> > > @@ -826,7 +829,7 @@ function test_revert_patch_one_change {
> > >  @@ -1,5 +1,5 @@
> > >   1
> > >  -2
> > > -+a
> > > ++aa
> > >   3
> > >   4
> > >   5
> > > @@ -834,6 +837,13 @@ function test_revert_patch_one_change {
> > >  M  numbers (change 1 of 1)
> > >  revert this change? [y/n/q] y
> > >  EOF
> > > +	ret="$?"
> > > +	if [ "$ret" != "0" ]; then
> > > +		echo "got revert command failed unexpectedly" >&2
> > > +		test_done "$testroot" "1"
> > > +		return 1
> > > +	fi
> > > +
> > >  	cmp -s $testroot/stdout.expected $testroot/stdout
> > >  	ret="$?"
> > >  	if [ "$ret" != "0" ]; then
> > > 
> > >   
> 
>