"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: draft: keeping authorship of diffs in commits
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Mon, 18 Jul 2022 15:48:28 +0200

Download raw body.

Thread
Stefan Sperling <stsp@stsp.name> wrote:
> On Sun, Feb 06, 2022 at 02:16:39PM +0100, Stefan Sperling wrote:
> > On Sun, Feb 06, 2022 at 11:20:55AM +0100, Omar Polo wrote:
> > > There's also another one issue I think: nomenclature.  got(1) mention
> > > various time the "commit author", the env variable is called GOT_AUTHOR
> > > and the got.conf setting is `author'.  Adding a "-A author" to got ci
> > > which does a subtly different thing can easily confuse people.
> > 
> > Yes, in hindsight this is bad, because it makes it harder to make
> > a clear distinction between committer and author. This happened
> > because I did not consider this distinction important enough, as
> > this is no problem with the way CVS is being used today. However,
> > expectations may change with a new repository format that is able
> > to store additional meta-data.
> > 
> > The good news is that we can change anything! At this stage, breaking
> > things in incompatible ways is absolutely fine. Anybody who is using
> > Got nowadays must be ready to adjust their setup to changes we make
> > at any moment. Release numbers start with a zero for a reason and I
> > do not expect this to change anytime soon.
> > 
> > Should we just rename all these things to "committer" or is there
> > a better path forward?
> 
> Going back to this: I tought about this some more and I don't think
> it is a big problem. Usually, the person who creates a new commit
> is the author, so the names we have for configuration items are fine.
> 
> We can add specific options to override the "author" in commit objects,
> and simply document that the default value for this comes from GOT_AUTHOR
> or the config file, and document that this default value will always
> be used for the "committer" field. This should be clear enough.

I thought a bit about it too and i've come to agree to this.  GOT_AUTHOR
is more clear than GOT_COMMITTER IMHO, and the case of "author !=
committer" can be considered an exception to the usual rule IMHO.

Below there's a rebased (and slightly improved) diff for got commit -A

> Your patch is a good start. It satisfies the use case of patch contributions.
> Of course, since this requires action on part of the committer, there will
> always be commit objects created from patches which lack attribution of the
> author in the commit object's author field, which means this "meta-data" will
> not be very reliable. Not much we can do about that.
> I don't really like the idea of complicating 'got patch' to the point where
> it tries to figure out author information from patch files. We would need to
> encourage people to save patches as an mbox file instead of saving only the
> body of an email, which is just as unlikely to work as consistently adding
> an option like -A to 'got commit' command lines.

One of my initial goal for 'got patch' was also for it to be able to
handle the common case of applying patches from email preserving the
author.  However, it kept falling down as priority for myself, to the
point of being completely dropped from my mental todo-list.  The more I
use got, the more I dislike how git implicitly creates commit for
various actions, and I'd like for 'git patch' to avoid doing so.  I
prefer doing a 'got patch', maybe tweak something, and then commit it.
(admittedly, I don't receive tons of patches to commit every day :-)

(maybe we could envision something where got patch put us in a situation
similar to 'edit' in a histedit script with the changes applied to the
worktree and a default commit message pre-filled on the next commit?
doesn't seem easy to do in an obvious way for the user though.)

> But because this would interop with output of git-format-patch, perhaps some
> people would use such a feature? If we added this to 'got patch' we should
> probably also extend 'got diff' to produce compatible output, at least as
> an option?

Maybe some sort of output format that's more compatible with git could
be useful.  not sure if we really need/want it, but it happened to me in
the past that I confused some upstream maintainers when sending patches
generated with `got diff'.

(in at least one case part of the issue was also the default behaviour
of GNU' patch i guess)

> Additionally, it would be nice if 'got rebase' and 'got histedit' would
> preserve the original commit's author field as "author", while the
> "committer" field would be reset to the value of GOT_AUTHOR or the
> config file. Once we do this, I think the distinction will become more
> or less self-evident for people who are familiar with these commands.

Don't remember if git does the same, but it seems intuitive and (for
what it's worth) i agree with it at least for `got histedit', not so
sure about rebase still.

 ---

Here's the old diff rebased and tweaked a bit.

diff /home/op/w/got
commit - d42bbff9e9ec0e6d9d64da5c56bbf0c7021bdc00
path + /home/op/w/got
blob - e40b2fea7364f818d1cc038e58b39c5e8b17968b
file + got/got.1
--- got/got.1
+++ got/got.1
@@ -1476,7 +1476,7 @@ will refuse to run if a specified
 is a directory.
 .El
 .Tg ci
-.It Cm commit Oo Fl F Ar path Oc Oo Fl m Ar message Oc Oo Fl N Oc Oo Fl S Oc Op Ar path ...
+.It Cm commit Oo Fl A Ar author Oc Oo Fl F Ar path Oc Oo Fl m Ar message Oc Oo Fl N Oc Oo Fl S Oc Op Ar path ...
 .Dl Pq alias: Cm ci
 Create a new commit in the repository from changes in a work tree
 and use this commit as the new base commit for the work tree.
@@ -1541,6 +1541,12 @@ The options for
 .Cm got commit
 are as follows:
 .Bl -tag -width Ds
+.It Fl A Ar author
+Specify that the commit originates from
+.Ar author .
+The user is still acknowledged as the
+.Dq committer
+of the changeset.
 .It Fl F Ar path
 Use the prepared log message stored in the file found at
 .Ar path
blob - b7c305dd8fba01e7b8b4d08eae548d3f7c53246c
file + got/got.c
--- got/got.c
+++ got/got.c
@@ -548,9 +548,11 @@ import_progress(void *arg, const char *path)
 	return NULL;
 }
 
-static int
+static const struct got_error *
 valid_author(const char *author)
 {
+	const char *email = author;
+
 	/*
 	 * Really dumb email address check; we're only doing this to
 	 * avoid git's object parser breaking on commits we create.
@@ -558,14 +560,19 @@ valid_author(const char *author)
 	while (*author && *author != '<')
 		author++;
 	if (*author != '<')
-		return 0;
+		goto err;
 	while (*author && *author != '@')
 		author++;
 	if (*author != '@')
-		return 0;
+		goto err;
 	while (*author && *author != '>')
 		author++;
-	return *author == '>';
+	if (*author != '>')
+		goto err;
+	return NULL;
+
+err:
+	return got_error_fmt(GOT_ERR_BAD_EMAIL, email);
 }
 
 static const struct got_error *
@@ -625,8 +632,8 @@ get_author(char **author, struct got_repository *repo,
 	if (*author == NULL)
 		return got_error_from_errno("strdup");
 
-	if (!valid_author(*author)) {
-		err = got_error_fmt(GOT_ERR_COMMIT_NO_EMAIL, "%s", *author);
+	err = valid_author(*author);
+	if (err != NULL) {
 		free(*author);
 		*author = NULL;
 	}
@@ -8315,8 +8322,8 @@ done:
 __dead static void
 usage_commit(void)
 {
-	fprintf(stderr, "usage: %s commit [-F path] [-m msg] [-N] [-S] "
-	    "[path ...]\n", getprogname());
+	fprintf(stderr, "usage: %s commit [-A author] [-F path] [-m msg] "
+	    "[-N] [-S] [path ...]\n", getprogname());
 	exit(1);
 }
 
@@ -8476,7 +8483,8 @@ cmd_commit(int argc, char *argv[])
 	const char *logmsg = NULL;
 	char *prepared_logmsg = NULL;
 	struct collect_commit_logmsg_arg cl_arg;
-	char *gitconfig_path = NULL, *editor = NULL, *author = NULL;
+	const char *author = NULL;
+	char *gitconfig_path = NULL, *editor = NULL, *committer = NULL;
 	int ch, rebase_in_progress, histedit_in_progress, preserve_logmsg = 0;
 	int allow_bad_symlinks = 0, non_interactive = 0, merge_in_progress = 0;
 	struct got_pathlist_head paths;
@@ -8485,8 +8493,13 @@ cmd_commit(int argc, char *argv[])
 	TAILQ_INIT(&paths);
 	cl_arg.logmsg_path = NULL;
 
-	while ((ch = getopt(argc, argv, "F:m:NS")) != -1) {
+	while ((ch = getopt(argc, argv, "A:F:m:NS")) != -1) {
 		switch (ch) {
+		case 'A':
+			author = optarg;
+			if ((error = valid_author(author)) != NULL)
+				return error;
+			break;
 		case 'F':
 			if (logmsg != NULL)
 				option_conflict('F', 'm');
@@ -8566,9 +8579,11 @@ cmd_commit(int argc, char *argv[])
 		goto done;
 	}
 
-	error = get_author(&author, repo, worktree);
+	error = get_author(&committer, repo, worktree);
 	if (error)
 		return error;
+	if (author == NULL)
+		author = committer;
 
 	/*
 	 * unveil(2) traverses exec(2); if an editor is used we have
@@ -8600,7 +8615,7 @@ cmd_commit(int argc, char *argv[])
 		cl_arg.branch_name += 11;
 	}
 	cl_arg.repo_path = got_repo_get_path(repo);
-	error = got_worktree_commit(&id, worktree, &paths, author, NULL,
+	error = got_worktree_commit(&id, worktree, &paths, author, committer,
 	    allow_bad_symlinks, collect_commit_logmsg, &cl_arg,
 	    print_status, NULL, repo);
 	if (error) {
@@ -8639,7 +8654,7 @@ done:
 	free(id_str);
 	free(gitconfig_path);
 	free(editor);
-	free(author);
+	free(committer);
 	free(prepared_logmsg);
 	return error;
 }
blob - 9dcd5b85b181ee63d76a75bf16c9ae0dc46aaa26
file + include/got_error.h
--- include/got_error.h
+++ include/got_error.h
@@ -122,7 +122,7 @@
 #define GOT_ERR_FILE_NOT_STAGED 105
 #define GOT_ERR_STAGED_PATHS	106
 #define GOT_ERR_PATCH_CHOICE	107
-#define GOT_ERR_COMMIT_NO_EMAIL	108
+#define GOT_ERR_BAD_EMAIL	108
 #define GOT_ERR_TAG_EXISTS	109
 #define GOT_ERR_GIT_REPO_FORMAT	110
 #define GOT_ERR_REBASE_REQUIRED	111
blob - bde78fabea00a9c132a5ec2ea937fc0665428986
file + lib/error.c
--- lib/error.c
+++ lib/error.c
@@ -159,8 +159,7 @@ static const struct got_error got_errors[] = {
 	{ GOT_ERR_STAGED_PATHS, "work tree contains files with staged "
 	    "changes; these changes must be committed or unstaged first" },
 	{ GOT_ERR_PATCH_CHOICE, "invalid patch choice" },
-	{ GOT_ERR_COMMIT_NO_EMAIL, "commit author's email address is required "
-	    "for compatibility with Git" },
+	{ GOT_ERR_BAD_EMAIL, "invalid email address" },
 	{ GOT_ERR_TAG_EXISTS,"specified tag already exists" },
 	{ GOT_ERR_GIT_REPO_FORMAT,"unknown git repository format version" },
 	{ GOT_ERR_REBASE_REQUIRED,"specified branch must be rebased first" },
blob - 4fe2c68a57b89e39bec69f8c2dbcf24f0f265384
file + regress/cmdline/commit.sh
--- regress/cmdline/commit.sh
+++ regress/cmdline/commit.sh
@@ -653,10 +653,8 @@ test_commit_no_email() {
 		got commit -m 'test no email' > $testroot/stdout \
 		2> $testroot/stderr)
 
-	echo -n "got: :flan_hacker:: commit author's email address " \
+	echo "got: :flan_hacker:: invalid email address" \
 		> $testroot/stderr.expected
-	echo "is required for compatibility with Git" \
-		>> $testroot/stderr.expected
 	cmp -s $testroot/stderr.expected $testroot/stderr
 	ret=$?
 	if [ $ret -ne 0 ]; then
@@ -703,6 +701,46 @@ test_commit_tree_entry_sorting() {
 	test_done "$testroot" "$ret"
 }
 
+test_commit_cmdline_author() {
+	local testroot=`test_init commit_cmdline_author`
+
+	got checkout $testroot/repo $testroot/wt > /dev/null
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		test_done "$testroot" $ret
+		return 1
+	fi
+
+	author="Foo <foo@example.com>"
+	committer="Flan Hacker <flan_hacker@openbsd.org>"
+
+	echo "modified alpha" > $testroot/wt/alpha
+	(cd $testroot/wt && got commit -A "$author" -m 'from foo') \
+		> /dev/null
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		test_done "$testroot" $ret
+		return 1
+	fi
+
+	(cd $testroot/repo && got log -l1 | egrep '^(from|via):') \
+		> $testroot/stdout
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		test_done "$testroot" $ret
+		return 1
+	fi
+
+	echo "from: $author" > $testroot/stdout.expected
+	echo "via: $committer" >> $testroot/stdout.expected
+	cmp -s $testroot/stdout.expected $testroot/stdout
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		diff -u $testroot/stdout.expected $testroot/stdout
+	fi
+	test_done "$testroot" $ret
+}
+
 test_commit_gotconfig_author() {
 	local testroot=`test_init commit_gotconfig_author`
 
@@ -1526,6 +1564,7 @@ run_test test_commit_selected_paths
 run_test test_commit_outside_refs_heads
 run_test test_commit_no_email
 run_test test_commit_tree_entry_sorting
+run_test test_commit_cmdline_author
 run_test test_commit_gotconfig_author
 run_test test_commit_gotconfig_worktree_author
 run_test test_commit_gitconfig_author