From: Stefan Sperling Subject: Re: make 'got send' detect closed connections To: Omar Polo Cc: gameoftrees@openbsd.org Date: Wed, 13 Sep 2023 22:19:13 +0200 On Wed, Sep 13, 2023 at 08:08:42PM +0200, Omar Polo wrote: > On 2023/09/13 12:28:53 +0200, Stefan Sperling wrote: > > On Wed, Sep 13, 2023 at 12:18:12AM +0200, Omar Polo wrote: > > > On 2023/09/12 21:17:38 +0200, Stefan Sperling wrote: > > The idea certainly works (see diff below): > > > > $ got send -b qwx gothub > > Connecting to "gothub" ssh://stsp@gothub.org/src.git > > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA > > 15671 commits coloredgotsh: operation timed out > > > > got: server unexpectedly closed the connection > > $ > > > > However, the patch as I wrote it has unintended side effects. > > For some reason I am seeing sporadic errors during test runs once > > we start catching sigchld. > > There are two chldren involved (got-send-pack and ssh), which might > > be relatad to this issue. > > Yeah, I forgot that we have other children other than ssh... sorry. > > It starts to get tricky... > > > I don't have time right now to dig into how we could avoid this. > > > > $ ./send.sh > > test_send_basic got: ppoll: Interrupted system call > > got clone command failed unexpectedly > > test failed; leaving test data in /tmp/got-test-send_basic-uyH6MHTmyu > > test_send_rebase_required ok > > test_send_rebase_required_overwrite got: ppoll: Interrupted system call > > This is SIGCHLD interrupting ppoll() in lib/pollfd.c... > > We'd need something like this on top of your diff but then I'm not > sure. Your initial diff was more self-contained. > > If you really want to stop the coloring early then your original diff > is ok op@. I remembered that the progress callback is ratelimited, so > there's no need to worry about issuing too many poll()s. Thanks! Here is a working diff using SIGCHLD. It contains your pollfd fix and also switches from ERR_CANCELLED to ERR_EOF such that we actually show the error message. (My original draft was already using ERR_EOF but I sent out a version where I had changed this code for some reason.) There were subtle test failures from test_send_delete and test_send_config. In these tests ssh exits early under normal conditions. I have added a check that prevents dead connection detection for pack files that contain no objects. This seems fine since we will never spend a huge amount of time on an empty pack file so there's no need to detect broken connections. In theory we could use sigaction(2) to avoid getting sigchld when a child is stopped rather than exiting. We could ten also use siginfo to find the PID the signal was delivered for, ignoring CHLD signals that are sent for got-send-pack rather than ssh. However, the OpenBSD kernel does not implement siginfo_t for SIGCHLD so all of this would be rather pointless at present. And I don't know how well sigaction would work in -portable. ----------------------------------------------- make 'got send' detect dead ssh connections In case the ssh process dies, stop immediately instead of generating a pack file and uploading it into the void. diff fa9997e790c81002782c0bb2747fa2050576dbb3 971e9c458f41642dc6536996119a20741a2d8e35 commit - fa9997e790c81002782c0bb2747fa2050576dbb3 commit + 971e9c458f41642dc6536996119a20741a2d8e35 blob - 73812ddbb0f2e3a0fee927fc0ae5b873c04e385a blob + 590c12f1e0ba86dee29097b453ac6ea0a789cc4e --- got/got.c +++ got/got.c @@ -70,6 +70,7 @@ static volatile sig_atomic_t sigint_received; static volatile sig_atomic_t sigpipe_received; +static volatile sig_atomic_t sigchld_received; static void catch_sigint(int signo) @@ -83,6 +84,11 @@ catch_sigpipe(int signo) sigpipe_received = 1; } +static void +catch_sigchld(int signo) +{ + sigchld_received = 1; +} struct got_cmd { const char *cmd_name; @@ -246,6 +252,7 @@ main(int argc, char *argv[]) signal(SIGINT, catch_sigint); signal(SIGPIPE, catch_sigpipe); + signal(SIGCHLD, catch_sigchld); for (i = 0; i < nitems(got_commands); i++) { const struct got_error *error; @@ -9537,6 +9544,7 @@ struct got_send_progress_arg { int printed_something; int sent_something; struct got_pathlist_head *delete_branches; + pid_t *sendpid; }; static const struct got_error * @@ -9552,7 +9560,21 @@ send_progress(void *arg, int ncolored, int nfound, int int print_colored = 0, print_found = 0, print_trees = 0; int print_searching = 0, print_total = 0; int print_deltify = 0, print_written = 0, print_sent = 0; + int sendstatus; + if (sigchld_received) { + sigchld_received = 0; + if (nobj_total > 0) { + if (waitpid(*a->sendpid, &sendstatus, WNOHANG) == -1) + return got_error_from_errno("waitpid"); + if (WIFEXITED(sendstatus)) { + *a->sendpid = -1; + return got_error_msg(GOT_ERR_EOF, "server " + "unexpectedly closed the connection"); + } + } + } + if (a->verbosity < 0) return NULL; @@ -10030,6 +10052,7 @@ cmd_send(int argc, char *argv[]) spa.last_p_written = -1; spa.verbosity = verbosity; spa.delete_branches = &delete_branches; + spa.sendpid = &sendpid; error = got_send_pack(remote_name, &branches, &tags, &delete_branches, verbosity, overwrite_refs, sendfd, repo, send_progress, &spa, check_cancelled, NULL); blob - 32efcd7f1a512fe6f1444aaba4ff7e8aa6d051b1 blob + c99cc04285b05fd46dad35c51ef3ac7896672ad4 --- lib/pollfd.c +++ lib/pollfd.c @@ -43,6 +43,8 @@ got_poll_fd(int fd, int events, int timeout) return got_error_from_errno("sigemptyset"); if (sigaddset(&sigset, SIGWINCH) == -1) return got_error_from_errno("sigaddset"); + if (sigaddset(&sigset, SIGCHLD) == -1) + return got_error_from_errno("sigaddset"); n = ppoll(pfd, 1, timeout == INFTIM ? NULL : &ts, &sigset); if (n == -1)