From: Stefan Sperling Subject: Re: gotd: wait asynchronously for child termination To: Omar Polo Cc: gameoftrees@openbsd.org Date: Thu, 22 Jun 2023 17:13:38 +0200 On Thu, Jun 22, 2023 at 04:53:14PM +0200, Omar Polo wrote: > diff /home/op/w/got > commit - 9b9bad55e7a411af3e01d9e4a86c127876184726 > path + /home/op/w/got > blob - 39b1d2a54632934de2bfaf127e20f0b0f725f4c8 > file + gotd/gotd.c > --- gotd/gotd.c > +++ gotd/gotd.c > @@ -80,7 +80,11 @@ struct gotd_child_proc { > char repo_path[PATH_MAX]; > int pipe[2]; > struct gotd_imsgev iev; > + struct event tmo; > + > + TAILQ_ENTRY(gotd_child_proc) procs; By convention used so far, the above variable should be called 'entry'. > -static void > proc_done(struct gotd_child_proc *proc) > { > - event_del(&proc->iev.ev); > - msgbuf_clear(&proc->iev.ibuf.w); > - close(proc->iev.ibuf.fd); > - kill_proc(proc, 0); > - wait_for_child(proc->pid); > + struct gotd_client *clt; Could you rename 'clt' to 'client' as used elsewhere? > + > + TAILQ_REMOVE(&procs, proc, procs); > + > + clt = find_client_by_proc_fd(proc->iev.ibuf.fd); > + if (clt != NULL) { > + if (proc == clt->repo) > + clt->repo = NULL; > + if (proc == clt->auth) > + clt->auth = NULL; > + if (proc == clt->session) > + clt->session = NULL; > + disconnect_on_error(clt, got_error(GOT_ERR_PRIVSEP_DIED)); I believe this error might arrive at the client user's terminal, and they would see "unprivileged process died unexpectedly", even in the code path where the child process terminates normally. I would suggest to either use GOT_ERR_EOF which will be logged but not be sent to the user, or simply call disconnect() directly. > @@ -745,6 +735,20 @@ kill_proc(struct gotd_child_proc *proc, int fatal) > static void > kill_proc(struct gotd_child_proc *proc, int fatal) > { > + struct timeval tv = { 5, 0 }; > + > + log_debug("kill -%d %d", fatal ? SIGKILL : SIGTERM, proc->pid); > + > + if (proc->iev.ibuf.fd != -1) { > + event_del(&proc->iev.ev); > + msgbuf_clear(&proc->iev.ibuf.w); > + close(proc->iev.ibuf.fd); > + proc->iev.ibuf.fd = -1; > + } > + > + if (!evtimer_pending(&proc->tmo, NULL)) > + evtimer_add(&proc->tmo, &tv); > + The above check needs && !fatal, otherwise we will schedule the timeout again after it has already fired to trigger kill -9. > static void > +kill_proc_timeout(int fd, short ev, void *d) > +{ > + struct gotd_child_proc *proc = d; > + > + log_warnx("wait timeout for PID %d terminated", proc->pid); > + kill_proc(proc, 1); I would rephrase the above as something like: "timeout waiting for PID %d to terminate; retrying with force" > +static struct gotd_child_proc * > +find_proc_by_pid(pid_t pid) > +{ > + struct gotd_child_proc *proc; Must initialize to NULL above, otherwise we return garbage below if no PID matches. > + > + TAILQ_FOREACH(proc, &procs, procs) > + if (proc->pid == pid) > + break; > + > + return proc; > +} > @@ -1508,6 +1560,13 @@ start_listener(char *argv0, const char *confpath, int > if (proc == NULL) > fatal("calloc"); > > + TAILQ_INSERT_HEAD(&procs, proc, procs); > + > + /* > + * XXX start_listener is called before event_init() so can't > + * initialize proc->tmo here. > + */ > + > proc->type = PROC_LISTEN; I would remove the XXX, I suspect that not calling event_init() before starting the listener was deliberate. At least I vaguely recall having problems with forking and event_init(). Perhaps mention that the timeout will be initialized in main() instead?