From: Stefan Sperling <stsp@stsp.name>
Subject: Re: Loss of original errors
To: Christian Weisgerber <naddy@mips.inka.de>
Cc: gameoftrees@openbsd.org
Date: Wed, 12 Jul 2023 09:47:14 +0200

On Tue, Jul 11, 2023 at 08:22:20PM +0200, Christian Weisgerber wrote:
> There are a few places in got(1) where, if an error occurs, a
> clean-up action is executed, whose potential error is ignored, and
> the original error is returned.
> 
> Or at least that's the intent.  In practice, the original error can
> be lost because the custom_errors[] ring buffer is too small.
> 
> Specifically, there are three instances in cmd_histedit():
> 
>         error = histedit_...(...);
>         if (error) {
>                 got_worktree_histedit_abort(...);
>                 print_merge_progress_stats(...);
>                 goto done;
>         }
> 
> Here got_worktree_histedit_abort() creates a number of ignored
> errors that overrun the ring buffer.  I first tried increasing the
> buffer size, but a mkdir(2) is run on every directory in the worktree,
> filling the buffer with ignored EEXIST errors, so that doesn't
> scale.
> 
> There is similar-ish code in cmd_integrate() with calls to
> got_worktree_integrate_abort(), but that routine probably doesn't
> do enough work to cause problems.
> 
> The obvious solution would be to save *error into a variable before
> calling the clean-up action, but I haven't found a precedent in the
> code, and it can't be a local variable...
> 
> Suggestions?

I don't have good suggestions to solve this fundamental issue
in the error handling design.

The current design avoids allocation on error, and it avoids
pointers between error objects to form chains, as they exist in
the SVN error system which inspired the error system used by Got.
I would really like to get away without extra allocations and pointers.

The downside are scaling issues like this. We can try being more
careful about the way we are using the system. That will hopefully
be good enough.

In this particular case we can stop sending EEXIST up the call chain.
It is not really an error condition since the directory the caller wants
to create already exists. I only found one caller that relies on seeing
EEXIST passed up. That bit of code could easily be convered to expect
success instead.

Does this help? Tests are still passing.


diff /home/stsp/src/got
commit - ad6dd0bb6c5ebeafaa57204a04330df6658c3861
path + /home/stsp/src/got
blob - a43c93bf01b5cb12693933595ad320101db7fc5b
file + lib/path.c
--- lib/path.c
+++ lib/path.c
@@ -329,7 +329,7 @@ got_path_mkdir(const char *abspath)
 				goto done;
 			if (mkdir(abspath, GOT_DEFAULT_DIR_MODE) == -1)
 				err = got_error_from_errno2("mkdir", abspath);
-		} else
+		} else if (errno != EEXIST)
 			err = got_error_from_errno2("mkdir", abspath);
 	}
 
blob - 20e319355137e3ec594f1a28760412c19ab58905
file + lib/worktree.c
--- lib/worktree.c
+++ lib/worktree.c
@@ -396,21 +396,22 @@ add_dir_on_disk(struct got_worktree *worktree, const c
 {
 	const struct got_error *err = NULL;
 	char *abspath;
+	struct stat sb;
 
 	if (asprintf(&abspath, "%s/%s", worktree->root_path, path) == -1)
 		return got_error_from_errno("asprintf");
 
 	err = got_path_mkdir(abspath);
-	if (err && err->code == GOT_ERR_ERRNO && errno == EEXIST) {
-		struct stat sb;
-		err = NULL;
-		if (lstat(abspath, &sb) == -1) {
-			err = got_error_from_errno2("lstat", abspath);
-		} else if (!S_ISDIR(sb.st_mode)) {
-			/* TODO directory is obstructed; do something */
-			err = got_error_path(abspath, GOT_ERR_FILE_OBSTRUCTED);
-		}
+	if (err)
+		goto done;
+
+	if (lstat(abspath, &sb) == -1) {
+		err = got_error_from_errno2("lstat", abspath);
+	} else if (!S_ISDIR(sb.st_mode)) {
+		/* TODO directory is obstructed; do something */
+		err = got_error_path(abspath, GOT_ERR_FILE_OBSTRUCTED);
 	}
+done:
 	free(abspath);
 	return err;
 }