From: Stefan Sperling Subject: fix gotd notifications race To: gameoftrees@openbsd.org Date: Wed, 4 Jun 2025 11:38:43 +0200 For some time now I have observed that gotd's notify process exits on got.gameoftrees.org whenever a new release gets tagged. The problem is triggered by creating a new commit which changes the main branch and tagging this commit, then sending both the commit and the tag in the same invocation of 'got send' (e.g. got send -T, or got send -t 0.112). I have reproduced this issue locally using got send -T, which causes ref-updates for all tags (questionable if they haven't actually changed, but that is a separate problem), and a lot of notifications. The errors shown by gotd are as follows: session_write /tmp/test.git: updating ref refs/tags/0.1 for uid 1002 session_write /tmp/test.git: ref-update received session_write /tmp/test.git: update-ref from uid 1002 session_write /tmp/test.git: updating ref refs/heads/main for uid 1002 session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message session_write /tmp/test.git: uid 1002: received unexpected privsep message And the notify process exits with an "unexpected EOF" error. The patch below fixes these "unexpected privsep message" errors, and the notify process stays alive as expected. I used gotd's regression test http-server to receive http notifications. Unfortunately, this server is only able to process a single request before stopping itself. So I could only confirm that the first notification was actually sent. And http-server's behaviour makes it difficult to write a test case for this issue. I will send more about this in a follow-up message. ok? fix a race in gotd notification processing Dequeue notifications from the list only once the notify process has confirmed that the notification has been sent. This should be more robust for the following reason: In some spots were are checking the list head against NULL to see if any notifications are still pending. By removing notifications from this list too early, this check did not cover notifications which were still in flight. We could thus end up deciding to shut down the session process too early, and the last notification of several would fail. Issue observed when sending a new commit on a branch and a new tag which tags this commit to gotd in a single 'got send' operation. Only the tag notification would make it, while the other notification never arrived after an 'unexpected EOF' error in the notify process while trying to send a confirmation to the session process. M gotd/session_write.c | 41+ 23- 1 file changed, 41 insertions(+), 23 deletions(-) commit - d907758cbe632fc02ea32fa9363f1b427ed21c32 commit + 77ebbf498677a798fd1077376cb02a3bfeb7cf88 blob - 2b2318d81f74d2c1bfb5dd384acc1b61b4d31c91 blob + 6a93d428b645fa7449b875471e7683b83ab8a090 --- gotd/session_write.c +++ gotd/session_write.c @@ -662,8 +662,6 @@ forward_notification(struct gotd_session_client *clien if (notif == NULL) return got_error(GOT_ERR_PRIVSEP_MSG); - STAILQ_REMOVE(¬ifications, notif, gotd_session_notif, entry); - if (notif->action != icontent.action || notif->fd == -1 || strcmp(notif->refname, refname) != 0) { err = got_error(GOT_ERR_PRIVSEP_MSG); @@ -746,9 +744,6 @@ forward_notification(struct gotd_session_client *clien imsg_close(&iev->ibuf, wbuf); gotd_imsg_event_add(iev); done: - if (notif->fd != -1) - close(notif->fd); - free(notif); free(refname); free(id_str); return err; @@ -984,6 +979,33 @@ recv_notification_content(struct imsg *imsg) return NULL; } +static const struct got_error * +report_ref_updates(int do_notify) +{ + const struct got_error *err; + struct gotd_session_notif *notif; + + err = send_refs_updated(&gotd_session.parent_iev); + if (err) { + log_warn("%s", err->msg); + return err; + } + + if (do_notify) { + notif = STAILQ_FIRST(¬ifications); + if (notif == NULL) + return NULL; + + err = request_notification(notif); + if (err) { + log_warn("could not send notification: %s", err->msg); + return err; + } + } + + return NULL; +} + static void session_dispatch_repo_child(int fd, short event, void *arg) { @@ -1086,8 +1108,6 @@ session_dispatch_repo_child(int fd, short event, void else disconnect(client); } else { - struct gotd_session_notif *notif; - if (do_packfile_verification) { err = verify_packfile(iev); } else if (do_content_verification) { @@ -1110,27 +1130,17 @@ session_dispatch_repo_child(int fd, short event, void if (do_ref_update && client->nref_updates > 0) { client->nref_updates--; if (client->nref_updates == 0) { - err = send_refs_updated( - &gotd_session.parent_iev); - if (err) { - log_warn("%s", err->msg); + err = report_ref_updates(do_notify); + if (err) shut = 1; - } } } - - notif = STAILQ_FIRST(¬ifications); - if (notif && do_notify) { - /* Request content for next notification. */ - err = request_notification(notif); - if (err) { - log_warn("could not send notification: " - "%s", err->msg); - shut = 1; - } - } } + imsg_free(&imsg); + + if (err) + break; } done: if (!shut) { @@ -1668,7 +1678,15 @@ session_dispatch_notifier(int fd, short event, void *a log_warn("unexpected imsg %d", imsg.hdr.type); break; } + + /* First notification on our list has now been sent. */ notif = STAILQ_FIRST(¬ifications); + STAILQ_REMOVE_HEAD(¬ifications, entry); + if (notif->fd != -1) + close(notif->fd); + free(notif); + + notif = STAILQ_FIRST(¬ifications); if (notif == NULL) { disconnect(client); break; /* NOTREACHED */