"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: fix email parsing
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Tue, 19 Jul 2022 11:44:31 +0200

Download raw body.

Thread
Stefan Sperling <stsp@stsp.name> wrote:
> On Tue, Jul 19, 2022 at 10:55:17AM +0200, Omar Polo wrote:
> > -static int
> > +static const struct got_error *
> >  valid_author(const char *author)
> >  {
> > +	const char *email = author;
> > +
> >  	/*
> >  	 * Really dumb email address check; we're only doing this to
> >  	 * avoid git's object parser breaking on commits we create.
> >  	 */
> 
> The check isn't dumb anymore now, is it? :)

yep :)

> Actually, I think it would help to mention which git man page section
> these rules come from, like you did in your initial email. The comment
> could even quote the whole paragraph in case they move their docs around.
> 
> What I think is important to communicate to the reader (and to the user
> who sees an error message) is that we are raising an error only because
> Git enforces an email address (well, actually "<" and ">" as you found
> out, but that's just an implementation detail). Otherwise, in Got we
> could set GOT_AUTHOR="Flan Hacker" and it would work just fine.

agreed, diff below reword the comment to point to the manpage.

> > -	while (*author && *author != '<')
> > +
> > +	while (*author && *author != '\n' && *author != '<')
> >  		author++;
> > -	if (*author != '<')
> > -		return 0;
> > -	while (*author && *author != '@')
> > +	if (*author++ != '<')
> > +		goto err;
> 
> If a function does not need to free resources before returning then it
> does not really need 'goto'. I don't see an advantage of having two
> 'goto err' cases over seeing this line appear twice:
> 
> 	return got_error_fmt(GOT_ERR_COMMIT_NO_EMAIL, "%s", email);

yep, initially there were a few more returns, but it's ugly.  fixed.

> Alternatively, this function could return of 0/1 and we could keep
> raising an error in the caller.

it was just to shorted a bit a future usage of it in cmd_commit if we
add a -A flag.  I can make it return 0/1 as before though

> Sorry, I missed this in my initial review. 

no problem, and thanks for looking into this :)

diff refs/heads/main refs/heads/fixmail
commit - d2587c5f95c6edb51ccc8d4abfac838b58f3a463
commit + 748d595a995c322543a2ecaae252f8446f4fe391
blob - b7c305dd8fba01e7b8b4d08eae548d3f7c53246c
blob + fc3f426e58ce1bbe4245627188786b1b9227e1c0
--- got/got.c
+++ got/got.c
@@ -548,24 +548,28 @@ import_progress(void *arg, const char *path)
 	return NULL;
 }
 
-static int
+static const struct got_error *
 valid_author(const char *author)
 {
+	const char *email = author;
+
 	/*
-	 * Really dumb email address check; we're only doing this to
-	 * avoid git's object parser breaking on commits we create.
+	 * Git' expects the author (or committer) to be in the form
+	 * "name <email>", which are mostly free form (see the
+	 * "committer" description in git-fast-import(1)).  We're only
+	 * doing this to avoid git's object parser breaking on commits
+	 * we create.
 	 */
-	while (*author && *author != '<')
+
+	while (*author && *author != '\n' && *author != '<' && *author != '>')
 		author++;
-	if (*author != '<')
-		return 0;
-	while (*author && *author != '@')
+	if (*author++ != '<')
+		return got_error_fmt(GOT_ERR_COMMIT_NO_EMAIL, "%s", email);
+	while (*author && *author != '\n' && *author != '<' && *author != '>')
 		author++;
-	if (*author != '@')
-		return 0;
-	while (*author && *author != '>')
-		author++;
-	return *author == '>';
+	if (strcmp(author, ">") != 0)
+		return got_error_fmt(GOT_ERR_COMMIT_NO_EMAIL, "%s", email);
+	return NULL;
 }
 
 static const struct got_error *
@@ -625,8 +629,8 @@ get_author(char **author, struct got_repository *repo,
 	if (*author == NULL)
 		return got_error_from_errno("strdup");
 
-	if (!valid_author(*author)) {
-		err = got_error_fmt(GOT_ERR_COMMIT_NO_EMAIL, "%s", *author);
+	err = valid_author(*author);
+	if (err) {
 		free(*author);
 		*author = NULL;
 	}
blob - 4fe2c68a57b89e39bec69f8c2dbcf24f0f265384
blob + 74b1300abdc8bad6089c22251a9efd9642490d49
--- regress/cmdline/commit.sh
+++ regress/cmdline/commit.sh
@@ -640,7 +640,11 @@ test_commit_outside_refs_heads() {
 
 test_commit_no_email() {
 	local testroot=`test_init commit_no_email`
+	local errmsg=""
 
+	errmsg="commit author's email address is required for"
+	errmsg="$errmsg compatibility with Git"
+
 	got checkout $testroot/repo $testroot/wt > /dev/null
 	ret=$?
 	if [ $ret -ne 0 ]; then
@@ -653,10 +657,7 @@ test_commit_no_email() {
 		got commit -m 'test no email' > $testroot/stdout \
 		2> $testroot/stderr)
 
-	echo -n "got: :flan_hacker:: commit author's email address " \
-		> $testroot/stderr.expected
-	echo "is required for compatibility with Git" \
-		>> $testroot/stderr.expected
+	printf "got: :flan_hacker:: %s\n" "$errmsg" > $testroot/stderr.expected
 	cmp -s $testroot/stderr.expected $testroot/stderr
 	ret=$?
 	if [ $ret -ne 0 ]; then
@@ -670,8 +671,56 @@ test_commit_no_email() {
 	ret=$?
 	if [ $ret -ne 0 ]; then
 		diff -u $testroot/stdout.expected $testroot/stdout
+		test_done "$testroot" $ret
+		return 1
 	fi
-	test_done "$testroot" "$ret"
+
+	# try again with a newline inside the email
+	(cd $testroot/wt \
+		&& FS=' ' env GOT_AUTHOR="$(printf "Flan <hack\ner>")" \
+		got commit -m 'test invalid email' > $testroot/stdout \
+		2> $testroot/stderr)
+
+	printf "got: Flan <hack\ner>: %s\n" "$errmsg" \
+		> $testroot/stderr.expected
+	cmp -s $testroot/stderr.expected $testroot/stderr
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		diff -u $testroot/stderr.expected $testroot/stderr
+		test_done "$testroot" $ret
+		return 1
+	fi
+
+	echo -n > $testroot/stdout.expected
+	cmp -s $testroot/stdout.expected $testroot/stdout
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		diff -u $testroot/stdout.expected $testroot/stdout
+		test_done "$testroot" $ret
+		return 1
+	fi
+
+	# try again with a < inside the email
+	(cd $testroot/wt && env GOT_AUTHOR="$(printf "Flan <ha<ker>")" \
+		got commit -m 'test invalid email' > $testroot/stdout \
+		2> $testroot/stderr)
+
+	printf "got: Flan <ha<ker>: %s\n" "$errmsg" > $testroot/stderr.expected
+	cmp -s $testroot/stderr.expected $testroot/stderr
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		diff -u $testroot/stderr.expected $testroot/stderr
+		test_done "$testroot" $ret
+		return 1
+	fi
+
+	echo -n > $testroot/stdout.expected
+	cmp -s $testroot/stdout.expected $testroot/stdout
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		diff -u $testroot/stdout.expected $testroot/stdout
+	fi
+	test_done "$testroot" $ret
 }
 
 test_commit_tree_entry_sorting() {