"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: make 'got send' detect closed connections
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Wed, 13 Sep 2023 12:28:53 +0200

Download raw body.

Thread
On Wed, Sep 13, 2023 at 12:18:12AM +0200, Omar Polo wrote:
> On 2023/09/12 21:17:38 +0200, Stefan Sperling <stsp@stsp.name> wrote:
> > I am seeing an issue while trying to send changes to gothub, where
> > the gotsh process on the server runs into an error (cause unknown).
> > 
> > This in turn triggers a problem where 'got send' does not realize
> > that unnecessary work is being done:
> > 
> > $ got send -b qwx gothub
> > Connecting to "gothub" ssh://stsp@gothub.org/src.git
> > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA
> > 16419 commits coloredgotsh: operation timed out
> > 228961 commits colored; 48 objects found; 7581 trees scanned
> > packing 1 reference; 48 objects; deltify: 100%; writing pack:  109K 100%
> > got: privsep peer process closed pipe
> > $ 
> > 
> > There are two errors in the above trace:
> > 
> >  1) gotsh: operation timed out
> >  2) got: privsep peer process closed pipe
> > 
> > The first error happens while we are coloring commits. We do not notice the
> > problem at that point, however. We keep building the pack file, upload it
> > (not sure where uploaded data goes -- perhaps ssh discards it?) and then
> > we finally see a problem.
> > 
> > An ssh debug trace shows that ssh detects the connection being closed
> > when gotsh reports error 1:
> > 
> > [[[
> > got-send-pack: updating refs/heads/qwx c351eb807cdb2a9571bc3dde12654f23949e7c45 -> 903db0c03af248b610c1338fc1340e55bfa9074a
> > got-send-pack: writepkt: 0000
> > 16785 commits coloreddebug2: channel 0: rcvd ext data 27
> > gotsh: operation timed out
> > debug2: channel 0: written 27 to efd 6
> > debug2: channel 0: rcvd eof
> > debug2: channel 0: output open -> drain
> > debug2: channel 0: obuf empty
> > ]]]
> > 
> > The patch below makes 'got send' detect the dead connection while
> > generating the pack file.
> > Is there a better way than regularly polling the file descriptor to
> > detect a POLLHUP? I don't have a better idea. It is effective though:
> > 
> > $ got send -b qwx gothub
> > Connecting to "gothub" ssh://stsp@gothub.org/src.git
> > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA
> > 15062 commits coloredgotsh: operation timed out
> > 18982 commits colored
> > got: server unexpectedly closed the connection
> > $
> > 
> > Tests keep passing with this diff. I don't think it makes sense to add
> > test coverage for this corner case. It is largely a cosmetic problem.
> > 
> > ok?
> 
> No idea why gotd fails like that.

No clue in the server-side logs either :(
>
> Diff reads fine but I'm not sure it's worth to add polling in
> send_progress() for this fringe case to be honest.
> 
> Alternatively, if ssh dies (not sure it does from the trace), maybe we
> can catch the SIGCHLD, set a flag, and treat it like an interrupt to
> abort the operation?

The idea certainly works (see diff below):

$  got send -b qwx gothub
Connecting to "gothub" ssh://stsp@gothub.org/src.git
Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA
15671 commits coloredgotsh: operation timed out

got: server unexpectedly closed the connection
$

However, the patch as I wrote it has unintended side effects.
For some reason I am seeing sporadic errors during test runs once
we start catching sigchld.
There are two chldren involved (got-send-pack and ssh), which might
be relatad to this issue.
I don't have time right now to dig into how we could avoid this.

$ ./send.sh
test_send_basic got: ppoll: Interrupted system call
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_basic-uyH6MHTmyu
test_send_rebase_required ok
test_send_rebase_required_overwrite got: ppoll: Interrupted system call
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
got-index-pack: unexpected end of file
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_rebase_required_overwrite-skJFvpCG11
test_send_merge_commit got: ppoll: Interrupted system call
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_merge_commit-hI8I4r8qPy
test_send_delete got: ppoll: Interrupted system call
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
got-index-pack: unexpected end of file
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_delete-v4hnaX93Bd
test_send_clone_and_send got: ppoll: Interrupted system call
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
got-index-pack: unexpected end of file
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_clone_and_send-kQfi2Kfwi0
test_send_tags got: ppoll: Interrupted system call
got clone command failed unexpectedly
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
got-index-pack: unexpected end of file
got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file
test failed; leaving test data in /tmp/got-test-send_tags-fKXguCIf0P
test_send_tag_of_deleted_branch ok
test_send_new_branch got: ppoll: Interrupted system call
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_new_branch-uHyO79waUP
test_send_all_branches got: ppoll: Interrupted system call
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_all_branches-K5864CWAUk
test_send_to_empty_repo ok
test_send_and_fetch_config ok
test_send_config got: ppoll: Interrupted system call
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_fetch_conf-cfJGaTfVV3
test_send_rejected got: ppoll: Interrupted system call
got clone command failed unexpectedly
test failed; leaving test data in /tmp/got-test-send_rejected-lEwmHRd9Zn
$


-----------------------------------------------
 catch sigchld
 
diff 0ec0e499ac07b3fa388b786eb0cb489bd810df26 d40e5178f3f8ef93a417347f667cf79bdc189f2d
commit - 0ec0e499ac07b3fa388b786eb0cb489bd810df26
commit + d40e5178f3f8ef93a417347f667cf79bdc189f2d
blob - 73812ddbb0f2e3a0fee927fc0ae5b873c04e385a
blob + b87cb9ac7d724604ca27622821e0ebd0ba2af4f2
--- got/got.c
+++ got/got.c
@@ -70,6 +70,7 @@
 
 static volatile sig_atomic_t sigint_received;
 static volatile sig_atomic_t sigpipe_received;
+static volatile sig_atomic_t sigchld_received;
 
 static void
 catch_sigint(int signo)
@@ -83,6 +84,11 @@ catch_sigpipe(int signo)
 	sigpipe_received = 1;
 }
 
+static void
+catch_sigchld(int signo)
+{
+	sigchld_received = 1;
+}
 
 struct got_cmd {
 	const char	*cmd_name;
@@ -246,6 +252,7 @@ main(int argc, char *argv[])
 
 	signal(SIGINT, catch_sigint);
 	signal(SIGPIPE, catch_sigpipe);
+	signal(SIGCHLD, catch_sigchld);
 
 	for (i = 0; i < nitems(got_commands); i++) {
 		const struct got_error *error;
@@ -9537,6 +9544,7 @@ struct got_send_progress_arg {
 	int printed_something;
 	int sent_something;
 	struct got_pathlist_head *delete_branches;
+	pid_t *sendpid;
 };
 
 static const struct got_error *
@@ -9552,7 +9560,19 @@ send_progress(void *arg, int ncolored, int nfound, int
 	int print_colored = 0, print_found = 0, print_trees = 0;
 	int print_searching = 0, print_total = 0;
 	int print_deltify = 0, print_written = 0, print_sent = 0;
+	int sendstatus;
 
+	if (sigchld_received) {
+		sigchld_received = 0;
+		if (waitpid(*a->sendpid, &sendstatus, WNOHANG) == -1)
+			return got_error_from_errno("waitpid");
+		if (WIFEXITED(sendstatus)) {
+			*a->sendpid = -1;
+			return got_error_msg(GOT_ERR_CANCELLED,
+			    "server unexpectedly closed the connection");
+		}
+	}
+
 	if (a->verbosity < 0)
 		return NULL;
 
@@ -10030,6 +10050,7 @@ cmd_send(int argc, char *argv[])
 	spa.last_p_written = -1;
 	spa.verbosity = verbosity;
 	spa.delete_branches = &delete_branches;
+	spa.sendpid = &sendpid;
 	error = got_send_pack(remote_name, &branches, &tags, &delete_branches,
 	    verbosity, overwrite_refs, sendfd, repo, send_progress, &spa,
 	    check_cancelled, NULL);