Download raw body.
make 'got send' detect closed connections
On Wed, Sep 13, 2023 at 08:08:42PM +0200, Omar Polo wrote: > On 2023/09/13 12:28:53 +0200, Stefan Sperling <stsp@stsp.name> wrote:
> > On Wed, Sep 13, 2023 at 12:18:12AM +0200, Omar Polo wrote:
> > > On 2023/09/12 21:17:38 +0200, Stefan Sperling <stsp@stsp.name> wrote:
> > The idea certainly works (see diff below):
> >
> > $ got send -b qwx gothub
> > Connecting to "gothub" ssh://stsp@gothub.org/src.git
> > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA
> > 15671 commits coloredgotsh: operation timed out
> >
> > got: server unexpectedly closed the connection
> > $
> >
> > However, the patch as I wrote it has unintended side effects.
> > For some reason I am seeing sporadic errors during test runs once
> > we start catching sigchld.
> > There are two chldren involved (got-send-pack and ssh), which might
> > be relatad to this issue.
>
> Yeah, I forgot that we have other children other than ssh... sorry.
>
> It starts to get tricky...
>
> > I don't have time right now to dig into how we could avoid this.
> >
> > $ ./send.sh
> > test_send_basic got: ppoll: Interrupted system call
> > got clone command failed unexpectedly
> > test failed; leaving test data in /tmp/got-test-send_basic-uyH6MHTmyu
> > test_send_rebase_required ok
> > test_send_rebase_required_overwrite got: ppoll: Interrupted system call
>
> This is SIGCHLD interrupting ppoll() in lib/pollfd.c...
>
> We'd need something like this on top of your diff but then I'm not
> sure. Your initial diff was more self-contained.
>
> If you really want to stop the coloring early then your original diff
> is ok op@. I remembered that the progress callback is ratelimited, so
> there's no need to worry about issuing too many poll()s.
Thanks!
Here is a working diff using SIGCHLD. It contains your pollfd fix
and also switches from ERR_CANCELLED to ERR_EOF such that we actually
show the error message. (My original draft was already using ERR_EOF
but I sent out a version where I had changed this code for some reason.)
There were subtle test failures from test_send_delete and test_send_config.
In these tests ssh exits early under normal conditions. I have added a
check that prevents dead connection detection for pack files that contain
no objects. This seems fine since we will never spend a huge amount of
time on an empty pack file so there's no need to detect broken connections.
In theory we could use sigaction(2) to avoid getting sigchld when a
child is stopped rather than exiting. We could ten also use siginfo to
find the PID the signal was delivered for, ignoring CHLD signals that
are sent for got-send-pack rather than ssh.
However, the OpenBSD kernel does not implement siginfo_t for SIGCHLD so
all of this would be rather pointless at present. And I don't know how
well sigaction would work in -portable.
-----------------------------------------------
make 'got send' detect dead ssh connections
In case the ssh process dies, stop immediately instead of generating a
pack file and uploading it into the void.
diff fa9997e790c81002782c0bb2747fa2050576dbb3 971e9c458f41642dc6536996119a20741a2d8e35
commit - fa9997e790c81002782c0bb2747fa2050576dbb3
commit + 971e9c458f41642dc6536996119a20741a2d8e35
blob - 73812ddbb0f2e3a0fee927fc0ae5b873c04e385a
blob + 590c12f1e0ba86dee29097b453ac6ea0a789cc4e
--- got/got.c
+++ got/got.c
@@ -70,6 +70,7 @@
static volatile sig_atomic_t sigint_received;
static volatile sig_atomic_t sigpipe_received;
+static volatile sig_atomic_t sigchld_received;
static void
catch_sigint(int signo)
@@ -83,6 +84,11 @@ catch_sigpipe(int signo)
sigpipe_received = 1;
}
+static void
+catch_sigchld(int signo)
+{
+ sigchld_received = 1;
+}
struct got_cmd {
const char *cmd_name;
@@ -246,6 +252,7 @@ main(int argc, char *argv[])
signal(SIGINT, catch_sigint);
signal(SIGPIPE, catch_sigpipe);
+ signal(SIGCHLD, catch_sigchld);
for (i = 0; i < nitems(got_commands); i++) {
const struct got_error *error;
@@ -9537,6 +9544,7 @@ struct got_send_progress_arg {
int printed_something;
int sent_something;
struct got_pathlist_head *delete_branches;
+ pid_t *sendpid;
};
static const struct got_error *
@@ -9552,7 +9560,21 @@ send_progress(void *arg, int ncolored, int nfound, int
int print_colored = 0, print_found = 0, print_trees = 0;
int print_searching = 0, print_total = 0;
int print_deltify = 0, print_written = 0, print_sent = 0;
+ int sendstatus;
+ if (sigchld_received) {
+ sigchld_received = 0;
+ if (nobj_total > 0) {
+ if (waitpid(*a->sendpid, &sendstatus, WNOHANG) == -1)
+ return got_error_from_errno("waitpid");
+ if (WIFEXITED(sendstatus)) {
+ *a->sendpid = -1;
+ return got_error_msg(GOT_ERR_EOF, "server "
+ "unexpectedly closed the connection");
+ }
+ }
+ }
+
if (a->verbosity < 0)
return NULL;
@@ -10030,6 +10052,7 @@ cmd_send(int argc, char *argv[])
spa.last_p_written = -1;
spa.verbosity = verbosity;
spa.delete_branches = &delete_branches;
+ spa.sendpid = &sendpid;
error = got_send_pack(remote_name, &branches, &tags, &delete_branches,
verbosity, overwrite_refs, sendfd, repo, send_progress, &spa,
check_cancelled, NULL);
blob - 32efcd7f1a512fe6f1444aaba4ff7e8aa6d051b1
blob + c99cc04285b05fd46dad35c51ef3ac7896672ad4
--- lib/pollfd.c
+++ lib/pollfd.c
@@ -43,6 +43,8 @@ got_poll_fd(int fd, int events, int timeout)
return got_error_from_errno("sigemptyset");
if (sigaddset(&sigset, SIGWINCH) == -1)
return got_error_from_errno("sigaddset");
+ if (sigaddset(&sigset, SIGCHLD) == -1)
+ return got_error_from_errno("sigaddset");
n = ppoll(pfd, 1, timeout == INFTIM ? NULL : &ts, &sigset);
if (n == -1)
make 'got send' detect closed connections