Download raw body.
gotd: Don't use WNOHANG when waiting for child processes
On Sat, Mar 04, 2023 at 11:58:46AM +0100, Omar Polo wrote: > On 2023/03/03 09:35:59 -0500, Josiah Frentsos <jfrent@tilde.team> wrote: > > Fixes this: > > > > 2023-03-03T13:51:15.155Z silicon gotd[7715]: child PID 0 terminated; signal 12 > > 2023-03-03T13:51:15.186Z silicon last message repeated 419 times > > I don't think the diff is wrong per-se, wait_for_child is just a > wrapper around waitpid and we're interested in one child only anyway, > so I *think* WNOHANG is not much important. I'm curious why it's > needed though. > > waitpid(WNOHANG) returns 0 when "there are no stopped or exited > children". Since we call it always after kill_proc() maybe it's a > timing issue: we call waitpid(WNOHANG) before the signal is handled > and the other process exits? Given that wait_for_child() runs in libevent handlers (via client_disconnect) I don't think we want to call waitpid() in blocking mode. We should stay in control over how long we are going to wait for the child to exit. Otherwise, if waitpid() never returns for some reason (for example, because the child is spinning on the CPU and doesn't exit) then all of gotd will hang because further events will not be processed. The diff below gives up after 5 seconds. Good enough? Too long? I did consider retrying with kill -9 upon timeout. But for now I would rather be able to attach gdb to a child that doesn't terminate to figure out why. diff /home/stsp/src/got commit - 9628f36dac5ed5319b482a020f06ff9737a0c1f0 path + /home/stsp/src/got blob - f15597dbd0e7299cbf018874f9d81d6c8dcaf403 file + gotd/gotd.c --- gotd/gotd.c +++ gotd/gotd.c @@ -271,15 +271,23 @@ wait_for_child(pid_t child_pid) { pid_t pid; int status; + int tries = 5; - log_debug("waiting for child PID %ld to terminate", - (long)child_pid); - do { + log_debug("waiting for child PID %ld to terminate", + (long)child_pid); + pid = waitpid(child_pid, &status, WNOHANG); if (pid == -1) { if (errno != EINTR && errno != ECHILD) fatal("wait"); + } else if (pid == 0) { + if (--tries <= 0) { + log_warnx("child PID %ld did not terminate", + (long)child_pid); + break; + } + sleep(1); } else if (WIFSIGNALED(status)) { log_warnx("child PID %ld terminated; signal %d", (long)pid, WTERMSIG(status));
gotd: Don't use WNOHANG when waiting for child processes