"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
fix gotd notifications race
To:
gameoftrees@openbsd.org
Date:
Wed, 4 Jun 2025 11:38:43 +0200

Download raw body.

Thread
For some time now I have observed that gotd's notify process exits
on got.gameoftrees.org whenever a new release gets tagged.

The problem is triggered by creating a new commit which changes the
main branch and tagging this commit, then sending both the commit
and the tag in the same invocation of 'got send' (e.g. got send -T,
or got send -t 0.112).

I have reproduced this issue locally using got send -T, which causes
ref-updates for all tags (questionable if they haven't actually
changed, but that is a separate problem), and a lot of notifications.

The errors shown by gotd are as follows:

session_write /tmp/test.git: updating ref refs/tags/0.1 for uid 1002
session_write /tmp/test.git: ref-update received
session_write /tmp/test.git: update-ref from uid 1002
session_write /tmp/test.git: updating ref refs/heads/main for uid 1002
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message
session_write /tmp/test.git: uid 1002: received unexpected privsep message

And the notify process exits with an "unexpected EOF" error.

The patch below fixes these "unexpected privsep message" errors,
and the notify process stays alive as expected.

I used gotd's regression test http-server to receive http notifications.
Unfortunately, this server is only able to process a single request
before stopping itself. So I could only confirm that the first notification
was actually sent.
And http-server's behaviour makes it difficult to write a test case for
this issue. I will send more about this in a follow-up message.


ok?

fix a race in gotd notification processing

Dequeue notifications from the list only once the notify process has
confirmed that the notification has been sent. This should be more
robust for the following reason:

In some spots were are checking the list head against NULL to see if
any notifications are still pending. By removing notifications from
this list too early, this check did not cover notifications which were
still in flight. We could thus end up deciding to shut down the session
process too early, and the last notification of several would fail.

Issue observed when sending a new commit on a branch and a new tag
which tags this commit to gotd in a single 'got send' operation.
Only the tag notification would make it, while the other notification
never arrived after an 'unexpected EOF' error in the notify process
while trying to send a confirmation to the session process.

M  gotd/session_write.c  |  41+  23-

1 file changed, 41 insertions(+), 23 deletions(-)

commit - d907758cbe632fc02ea32fa9363f1b427ed21c32
commit + 77ebbf498677a798fd1077376cb02a3bfeb7cf88
blob - 2b2318d81f74d2c1bfb5dd384acc1b61b4d31c91
blob + 6a93d428b645fa7449b875471e7683b83ab8a090
--- gotd/session_write.c
+++ gotd/session_write.c
@@ -662,8 +662,6 @@ forward_notification(struct gotd_session_client *clien
 	if (notif == NULL)
 		return got_error(GOT_ERR_PRIVSEP_MSG);
 
-	STAILQ_REMOVE(&notifications, notif, gotd_session_notif, entry);
-
 	if (notif->action != icontent.action || notif->fd == -1 ||
 	    strcmp(notif->refname, refname) != 0) {
 		err = got_error(GOT_ERR_PRIVSEP_MSG);
@@ -746,9 +744,6 @@ forward_notification(struct gotd_session_client *clien
 	imsg_close(&iev->ibuf, wbuf);
 	gotd_imsg_event_add(iev);
 done:
-	if (notif->fd != -1)
-		close(notif->fd);
-	free(notif);
 	free(refname);
 	free(id_str);
 	return err;
@@ -984,6 +979,33 @@ recv_notification_content(struct imsg *imsg)
 	return NULL;
 }
 
+static const struct got_error *
+report_ref_updates(int do_notify)
+{
+	const struct got_error *err;
+	struct gotd_session_notif *notif;
+
+	err = send_refs_updated(&gotd_session.parent_iev);
+	if (err) {
+		log_warn("%s", err->msg);
+		return err;
+	}
+
+	if (do_notify) {
+		notif = STAILQ_FIRST(&notifications);
+		if (notif == NULL)
+			return NULL;
+
+		err = request_notification(notif);
+		if (err) {
+			log_warn("could not send notification: %s", err->msg);
+			return err;
+		}
+	}
+
+	return NULL;
+}
+
 static void
 session_dispatch_repo_child(int fd, short event, void *arg)
 {
@@ -1086,8 +1108,6 @@ session_dispatch_repo_child(int fd, short event, void 
 			else
 				disconnect(client);
 		} else {
-			struct gotd_session_notif *notif;
-
 			if (do_packfile_verification) {
 				err = verify_packfile(iev);
 			} else if (do_content_verification) {
@@ -1110,27 +1130,17 @@ session_dispatch_repo_child(int fd, short event, void 
 			if (do_ref_update && client->nref_updates > 0) {
 				client->nref_updates--;
 				if (client->nref_updates == 0) {
-					err = send_refs_updated(
-					    &gotd_session.parent_iev);
-					if (err) {
-						log_warn("%s", err->msg);
+					err = report_ref_updates(do_notify);
+					if (err)
 						shut = 1;
-					}
 				}
 			}
-
-			notif = STAILQ_FIRST(&notifications);
-			if (notif && do_notify) {
-				/* Request content for next notification. */
-				err = request_notification(notif);
-				if (err) {
-					log_warn("could not send notification: "
-					    "%s", err->msg);
-					shut = 1;
-				}
-			}
 		}
+
 		imsg_free(&imsg);
+
+		if (err)
+			break;
 	}
 done:
 	if (!shut) {
@@ -1668,7 +1678,15 @@ session_dispatch_notifier(int fd, short event, void *a
 				log_warn("unexpected imsg %d", imsg.hdr.type);
 				break;
 			}
+
+			/* First notification on our list has now been sent. */
 			notif = STAILQ_FIRST(&notifications);
+			STAILQ_REMOVE_HEAD(&notifications, entry);
+			if (notif->fd != -1)
+				close(notif->fd);
+			free(notif);
+
+			notif = STAILQ_FIRST(&notifications);
 			if (notif == NULL) {
 				disconnect(client);
 				break; /* NOTREACHED */