Download raw body.
make 'got send' detect closed connections
On Wed, Sep 13, 2023 at 12:18:12AM +0200, Omar Polo wrote: > On 2023/09/12 21:17:38 +0200, Stefan Sperling <stsp@stsp.name> wrote: > > I am seeing an issue while trying to send changes to gothub, where > > the gotsh process on the server runs into an error (cause unknown). > > > > This in turn triggers a problem where 'got send' does not realize > > that unnecessary work is being done: > > > > $ got send -b qwx gothub > > Connecting to "gothub" ssh://stsp@gothub.org/src.git > > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA > > 16419 commits coloredgotsh: operation timed out > > 228961 commits colored; 48 objects found; 7581 trees scanned > > packing 1 reference; 48 objects; deltify: 100%; writing pack: 109K 100% > > got: privsep peer process closed pipe > > $ > > > > There are two errors in the above trace: > > > > 1) gotsh: operation timed out > > 2) got: privsep peer process closed pipe > > > > The first error happens while we are coloring commits. We do not notice the > > problem at that point, however. We keep building the pack file, upload it > > (not sure where uploaded data goes -- perhaps ssh discards it?) and then > > we finally see a problem. > > > > An ssh debug trace shows that ssh detects the connection being closed > > when gotsh reports error 1: > > > > [[[ > > got-send-pack: updating refs/heads/qwx c351eb807cdb2a9571bc3dde12654f23949e7c45 -> 903db0c03af248b610c1338fc1340e55bfa9074a > > got-send-pack: writepkt: 0000 > > 16785 commits coloreddebug2: channel 0: rcvd ext data 27 > > gotsh: operation timed out > > debug2: channel 0: written 27 to efd 6 > > debug2: channel 0: rcvd eof > > debug2: channel 0: output open -> drain > > debug2: channel 0: obuf empty > > ]]] > > > > The patch below makes 'got send' detect the dead connection while > > generating the pack file. > > Is there a better way than regularly polling the file descriptor to > > detect a POLLHUP? I don't have a better idea. It is effective though: > > > > $ got send -b qwx gothub > > Connecting to "gothub" ssh://stsp@gothub.org/src.git > > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA > > 15062 commits coloredgotsh: operation timed out > > 18982 commits colored > > got: server unexpectedly closed the connection > > $ > > > > Tests keep passing with this diff. I don't think it makes sense to add > > test coverage for this corner case. It is largely a cosmetic problem. > > > > ok? > > No idea why gotd fails like that. No clue in the server-side logs either :( > > Diff reads fine but I'm not sure it's worth to add polling in > send_progress() for this fringe case to be honest. > > Alternatively, if ssh dies (not sure it does from the trace), maybe we > can catch the SIGCHLD, set a flag, and treat it like an interrupt to > abort the operation? The idea certainly works (see diff below): $ got send -b qwx gothub Connecting to "gothub" ssh://stsp@gothub.org/src.git Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA 15671 commits coloredgotsh: operation timed out got: server unexpectedly closed the connection $ However, the patch as I wrote it has unintended side effects. For some reason I am seeing sporadic errors during test runs once we start catching sigchld. There are two chldren involved (got-send-pack and ssh), which might be relatad to this issue. I don't have time right now to dig into how we could avoid this. $ ./send.sh test_send_basic got: ppoll: Interrupted system call got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_basic-uyH6MHTmyu test_send_rebase_required ok test_send_rebase_required_overwrite got: ppoll: Interrupted system call got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file got-index-pack: unexpected end of file got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_rebase_required_overwrite-skJFvpCG11 test_send_merge_commit got: ppoll: Interrupted system call got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_merge_commit-hI8I4r8qPy test_send_delete got: ppoll: Interrupted system call got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file got-index-pack: unexpected end of file got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_delete-v4hnaX93Bd test_send_clone_and_send got: ppoll: Interrupted system call got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file got-index-pack: unexpected end of file got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_clone_and_send-kQfi2Kfwi0 test_send_tags got: ppoll: Interrupted system call got clone command failed unexpectedly got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file got-index-pack: unexpected end of file got-index-pack: error 7 "unexpected end of file": poll: unexpected end of file test failed; leaving test data in /tmp/got-test-send_tags-fKXguCIf0P test_send_tag_of_deleted_branch ok test_send_new_branch got: ppoll: Interrupted system call got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_new_branch-uHyO79waUP test_send_all_branches got: ppoll: Interrupted system call got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_all_branches-K5864CWAUk test_send_to_empty_repo ok test_send_and_fetch_config ok test_send_config got: ppoll: Interrupted system call got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_fetch_conf-cfJGaTfVV3 test_send_rejected got: ppoll: Interrupted system call got clone command failed unexpectedly test failed; leaving test data in /tmp/got-test-send_rejected-lEwmHRd9Zn $ ----------------------------------------------- catch sigchld diff 0ec0e499ac07b3fa388b786eb0cb489bd810df26 d40e5178f3f8ef93a417347f667cf79bdc189f2d commit - 0ec0e499ac07b3fa388b786eb0cb489bd810df26 commit + d40e5178f3f8ef93a417347f667cf79bdc189f2d blob - 73812ddbb0f2e3a0fee927fc0ae5b873c04e385a blob + b87cb9ac7d724604ca27622821e0ebd0ba2af4f2 --- got/got.c +++ got/got.c @@ -70,6 +70,7 @@ static volatile sig_atomic_t sigint_received; static volatile sig_atomic_t sigpipe_received; +static volatile sig_atomic_t sigchld_received; static void catch_sigint(int signo) @@ -83,6 +84,11 @@ catch_sigpipe(int signo) sigpipe_received = 1; } +static void +catch_sigchld(int signo) +{ + sigchld_received = 1; +} struct got_cmd { const char *cmd_name; @@ -246,6 +252,7 @@ main(int argc, char *argv[]) signal(SIGINT, catch_sigint); signal(SIGPIPE, catch_sigpipe); + signal(SIGCHLD, catch_sigchld); for (i = 0; i < nitems(got_commands); i++) { const struct got_error *error; @@ -9537,6 +9544,7 @@ struct got_send_progress_arg { int printed_something; int sent_something; struct got_pathlist_head *delete_branches; + pid_t *sendpid; }; static const struct got_error * @@ -9552,7 +9560,19 @@ send_progress(void *arg, int ncolored, int nfound, int int print_colored = 0, print_found = 0, print_trees = 0; int print_searching = 0, print_total = 0; int print_deltify = 0, print_written = 0, print_sent = 0; + int sendstatus; + if (sigchld_received) { + sigchld_received = 0; + if (waitpid(*a->sendpid, &sendstatus, WNOHANG) == -1) + return got_error_from_errno("waitpid"); + if (WIFEXITED(sendstatus)) { + *a->sendpid = -1; + return got_error_msg(GOT_ERR_CANCELLED, + "server unexpectedly closed the connection"); + } + } + if (a->verbosity < 0) return NULL; @@ -10030,6 +10050,7 @@ cmd_send(int argc, char *argv[]) spa.last_p_written = -1; spa.verbosity = verbosity; spa.delete_branches = &delete_branches; + spa.sendpid = &sendpid; error = got_send_pack(remote_name, &branches, &tags, &delete_branches, verbosity, overwrite_refs, sendfd, repo, send_progress, &spa, check_cancelled, NULL);
make 'got send' detect closed connections