"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: got patch: add flag to ignore whitespace?
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Sat, 02 Jul 2022 22:22:30 +0200

Download raw body.

Thread
Stefan Sperling <stsp@stsp.name> wrote:
> On Sat, Jul 02, 2022 at 05:27:30PM +0200, Omar Polo wrote:
> > Omar Polo <op@omarpolo.com> wrote:
> > > The last patch from Mark reminded me that I thought some time ago to add
> > > a "-w" flag for got patch to ignore white spaces.  It's conceptually
> > > similar to patch(1) -l/--ignore-whitespace or git-apply(1)
> > > --ignore-whitespace.
> > > 
> > > Here's a possible usage (using Mark' last diff)
> > > 
> > > 	% mshow | got patch
> > > 	#  tog/tog.1
> > > 	@@ -295,7 +305,7 @@ hunk failed to apply
> > > 	G  tog/tog.c
> > > 	got: patch failed to apply
> > > 	% # woops, try again with -w
> > > 	% mshow | got patch -w
> > > 	G  tog/tog.1
> > > 	G  tog/tog.c
> > > 	% # yay!
> > > 
> > > The diff does exactly what one would expect.  The line comparisons are
> > > now done in linecmp, which first runs strcmp(3) and optionally falls
> > > back to a whitespace-loose matching.
> > > 
> > > There's a bit of churn in apply_hunk now.  Before, I was re-using the
> > > lines saved in the hunk to apply it, now I can't.  If the context lines
> > > of the hunk may have the whitespaces mangled and so I have to re-read
> > > the lines from the original file.  Even if we're using a loose match,
> > > the context line needs not to be changed.
> > 
> > Wops, it introduced a stupid error in apply_hunk: when adding a file we
> > wouldn't add \n at the end of the lines.  The regress suite didn't catch
> > this, I'll add a testcase for it later as it's independent from this
> > diff
> > 
> > The change with the previous is to strip newlines read in apply_hunk and
> > always add them when printing to the temp file.
> 
> Nice idea.
> 
> Recalling that patch(1) is supposed to cope with fuzzyness of changes,
> I wonder if -w as you implemented it here should be the default behaviour?

Yes, it's probably a good idea.  I'm sometimes worried about being too
lax, but on the other hand being strict and refusing to patch files is
not a useful behavior either.

> With an option (maybe -w) to turn strict whitespace matching back on?
>
> In cases where one would need to use the -w option as you have implemented it
> there is no other way to get the diff to apply. So we could make your -w the
> default and provide a visual hint when whitespace differences are being
> ignored:
> 
>  	@@ -295,7 +305,7 @@ hunk contains mangled whitespace

I like the idea!  Diff belows keeps the behavior, drops -w and adds this
visual aid.  I've added it with a lower precedence, so if a hunk is
applied at a different offset _and_ with mangled whitespaces only the
"applied at offset" info will be printed.

We avoid adding a flag, love it :)

> And if anyone is for some reason worried about applying such mangled
> whitespace diffs they could use an option to ask for strict
> whitespace matching instead.

Rather than a flag specific for whitespaces, a "strict mode" that
considers an error a non-perfect application of the diff seems more
useful.  But probably we can get away without it: we already print the
info on the screen and it's even easy to parse (`got patch | grep ^@').
If deemed useful I can add it tho.


P.S.: i noticed now that in the long -> int change I forgot to move the
progress callback to int.  Will do after this lands to avoid conflicts.

diff refs/heads/main refs/heads/pw
commit - f5b0315f0e07bfd36a4eb37d91884fcd8614745a
commit + 393c10527c09d54aa19c1f121b2cc74c31ac4746
blob - d2db3f11b55b0dcd7008e9f4662210887aa2742b
blob + a79c8a428a21ddd30f49aa132dd4457a73135fa1
--- got/got.1
+++ got/got.1
@@ -1347,6 +1347,9 @@ If a change does not match at its exact line number, a
 apply it somewhere else in the file if a good spot can be found.
 Otherwise, the patch will fail to apply.
 .Pp
+Whitespaces may be ignored when trying to match the context of a
+change, as they may have been mangled.
+.Pp
 .Nm
 .Cm patch
 will refuse to apply a patch if certain preconditions are not met.
blob - 5f7f00007937a048564e685aac0a8c6b9e98adab
blob + f547d6ee3387c840c7b759ff4a8675fd0da2dae4
--- got/got.c
+++ got/got.c
@@ -7758,7 +7758,7 @@ static const struct got_error *
 patch_progress(void *arg, const char *old, const char *new,
     unsigned char status, const struct got_error *error, long old_from,
     long old_lines, long new_from, long new_lines, long offset,
-    const struct got_error *hunk_err)
+    int ws_mangled, const struct got_error *hunk_err)
 {
 	const char *path = new == NULL ? old : new;
 
@@ -7771,13 +7771,15 @@ patch_progress(void *arg, const char *old, const char 
 	if (error != NULL)
 		fprintf(stderr, "%s: %s\n", getprogname(), error->msg);
 
-	if (offset != 0 || hunk_err != NULL) {
+	if (offset != 0 || hunk_err != NULL || ws_mangled) {
 		printf("@@ -%ld,%ld +%ld,%ld @@ ", old_from,
 		    old_lines, new_from, new_lines);
 		if (hunk_err != NULL)
 			printf("%s\n", hunk_err->msg);
-		else
+		else if (offset != 0)
 			printf("applied with offset %ld\n", offset);
+		else
+			printf("hunk contains mangled whitespaces\n");
 	}
 
 	return NULL;
blob - bc2e95bb4fa535617bc28f76935a78e36dea8a3e
blob + e72ded72ccf8311606f1c945e5241311de5cc179
--- include/got_patch.h
+++ include/got_patch.h
@@ -22,7 +22,7 @@
  */
 typedef const struct got_error *(*got_patch_progress_cb)(void *,
     const char *, const char *, unsigned char, const struct got_error *,
-    long, long, long, long, long, const struct got_error *);
+    long, long, long, long, long, int, const struct got_error *);
 
 /*
  * Apply the (already opened) patch to the repository and register the
blob - 6b6915f9998d30a1ae5b1846365b299ba2c974fe
blob + 15c33761234e42c0eb7fec5939c05a49aa94a4a1
--- lib/patch.c
+++ lib/patch.c
@@ -26,6 +26,7 @@
 #include <sys/stat.h>
 #include <sys/uio.h>
 
+#include <ctype.h>
 #include <errno.h>
 #include <limits.h>
 #include <sha1.h>
@@ -58,6 +59,7 @@
 struct got_patch_hunk {
 	STAILQ_ENTRY(got_patch_hunk) entries;
 	const struct got_error *err;
+	int	ws_mangled;
 	int	offset;
 	int	old_nonl;
 	int	new_nonl;
@@ -401,6 +403,30 @@ locate_hunk(FILE *orig, struct got_patch_hunk *h, off_
 	return err;
 }
 
+static int
+linecmp(const char *a, const char *b, int *mangled)
+{
+	int c;
+
+	*mangled = 0;
+	c = strcmp(a, b);
+	if (c == 0)
+		return c;
+
+	*mangled = 1;
+	for (;;) {
+		while (isspace(*a))
+			a++;
+		while (isspace(*b))
+			b++;
+		if (*a == '\0' || *b == '\0' || *a != *b)
+			break;
+		a++, b++;
+	}
+
+	return *a - *b;
+}
+
 static const struct got_error *
 test_hunk(FILE *orig, struct got_patch_hunk *h)
 {
@@ -408,6 +434,7 @@ test_hunk(FILE *orig, struct got_patch_hunk *h)
 	char *line = NULL;
 	size_t linesize = 0, i = 0;
 	ssize_t linelen;
+	int mangled;
 
 	for (i = 0; i < h->len; ++i) {
 		switch (*h->lines[i]) {
@@ -426,10 +453,12 @@ test_hunk(FILE *orig, struct got_patch_hunk *h)
 			}
 			if (line[linelen - 1] == '\n')
 				line[linelen - 1] = '\0';
-			if (strcmp(h->lines[i] + 1, line)) {
+			if (linecmp(h->lines[i] + 1, line, &mangled)) {
 				err = got_error(GOT_ERR_HUNK_FAILED);
 				goto done;
 			}
+			if (mangled)
+				h->ws_mangled = 1;
 			break;
 		}
 	}
@@ -440,32 +469,61 @@ done:
 }
 
 static const struct got_error *
-apply_hunk(FILE *tmp, struct got_patch_hunk *h, int *lineno)
+apply_hunk(FILE *orig, FILE *tmp, struct got_patch_hunk *h, int *lineno,
+    off_t from)
 {
-	size_t i, new = 0;
+	const struct got_error *err = NULL;
+	const char *t;
+	size_t linesize = 0, i, new = 0;
+	char *line = NULL;
+	char mode;
+	ssize_t linelen;
 
+	if (orig != NULL && fseeko(orig, from, SEEK_SET) == -1)
+		return got_error_from_errno("fseeko");
+
 	for (i = 0; i < h->len; ++i) {
-		switch (*h->lines[i]) {
-		case ' ':
-			if (fprintf(tmp, "%s\n", h->lines[i] + 1) < 0)
-				return got_error_from_errno("fprintf");
-			/* fallthrough */
+		switch (mode = *h->lines[i]) {
 		case '-':
+		case ' ':
 			(*lineno)++;
+			if (orig != NULL) {
+				linelen = getline(&line, &linesize, orig);
+				if (linelen == -1) {
+					err = got_error_from_errno("getline");
+					goto done;
+				}
+				if (line[linelen - 1] == '\n')
+					line[linelen - 1] = '\0';
+				t = line;
+			} else
+				t = h->lines[i] + 1;
+			if (mode == '-')
+				continue;
+			if (fprintf(tmp, "%s\n", t) < 0) {
+				err = got_error_from_errno("fprintf");
+				goto done;
+			}
 			break;
 		case '+':
 			new++;
-			if (fprintf(tmp, "%s", h->lines[i] + 1) < 0)
-				return got_error_from_errno("fprintf");
+			if (fprintf(tmp, "%s", h->lines[i] + 1) < 0) {
+				err = got_error_from_errno("fprintf");
+				goto done;
+			}
 			if (new != h->new_lines || !h->new_nonl) {
-				if (fprintf(tmp, "\n") < 0)
-					return got_error_from_errno(
-					    "fprintf");
+				if (fprintf(tmp, "\n") < 0) {
+					err = got_error_from_errno("fprintf");
+					goto done;
+				}
 			}
 			break;
 		}
 	}
-	return NULL;
+
+done:
+	free(line);
+	return err;
 }
 
 static const struct got_error *
@@ -484,7 +542,7 @@ patch_file(struct got_patch *p, FILE *orig, FILE *tmp,
 		h = STAILQ_FIRST(&p->head);
 		if (h == NULL || STAILQ_NEXT(h, entries) != NULL)
 			return got_error(GOT_ERR_PATCH_MALFORMED);
-		return apply_hunk(tmp, h, &lineno);
+		return apply_hunk(orig, tmp, h, &lineno, 0);
 	}
 
 	if (fstat(fileno(orig), &sb) == -1)
@@ -524,7 +582,7 @@ patch_file(struct got_patch *p, FILE *orig, FILE *tmp,
 		if (lineno + 1 != h->old_from)
 			h->offset = lineno + 1 - h->old_from;
 
-		err = apply_hunk(tmp, h, &lineno);
+		err = apply_hunk(orig, tmp, h, &lineno, pos);
 		if (err != NULL)
 			return err;
 
@@ -551,17 +609,17 @@ report_progress(struct patch_args *pa, const char *old
 	struct got_patch_hunk *h;
 
 	err = pa->progress_cb(pa->progress_arg, old, new, status,
-	    orig_error, 0, 0, 0, 0, 0, NULL);
+	    orig_error, 0, 0, 0, 0, 0, 0, NULL);
 	if (err)
 		return err;
 
 	STAILQ_FOREACH(h, pa->head, entries) {
-		if (h->offset == 0 && h->err == NULL)
+		if (h->offset == 0 && !h->ws_mangled && h->err == NULL)
 			continue;
 
 		err = pa->progress_cb(pa->progress_arg, old, new, 0, NULL,
 		    h->old_from, h->old_lines, h->new_from, h->new_lines,
-		    h->offset, h->err);
+		    h->offset, h->ws_mangled, h->err);
 		if (err)
 			return err;
 	}
blob - a7ff83ea9847a8fa722fdbfda0cfc471da1d050d
blob + 582587d781c16be8549c3e5369d44a37927d0a4d
--- regress/cmdline/patch.sh
+++ regress/cmdline/patch.sh
@@ -1277,6 +1277,91 @@ EOF
 	test_done $testroot 0
 }
 
+test_patch_whitespace() {
+	local testroot=`test_init patch_whitespace`
+
+	got checkout $testroot/repo $testroot/wt > /dev/null
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		test_done $testroot $ret
+		return 1
+	fi
+
+	trailing="		"
+
+	cat <<EOF > $testroot/wt/hello.c
+#include <stdio.h>
+
+int
+main(void)
+{
+	/* the trailing whitespace is on purpose */
+	printf("hello, world\n");$trailing
+	return 0;
+}
+EOF
+
+	(cd $testroot/wt && got add hello.c && got ci -m '+hello.c') \
+		> /dev/null
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		test_done $testroot $ret
+		return 1
+	fi
+
+	# test with a diff with various whitespace corruptions
+	cat <<EOF > $testroot/wt/patch
+--- hello.c
++++ hello.c
+@@ -5,5 +5,5 @@
+ {
+ /* the trailing whitespace is on purpose */
+	printf("hello, world\n");
+-    return 0;
++    return 5; /* always fails */
+ }
+EOF
+
+	(cd $testroot/wt && got patch patch) \
+		2>$testroot/stderr >$testroot/stdout
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		echo "failed to apply diff" >&2
+		test_done $testroot $ret
+		return 1
+	fi
+
+	echo 'M  hello.c' > $testroot/stdout.expected
+	echo '@@ -5,5 +5,5 @@ hunk contains mangled whitespaces' \
+		>> $testroot/stdout.expected
+	cmp -s $testroot/stdout.expected $testroot/stdout
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		diff -u $testroot/stdout.expected $testroot/stdout
+		test_done $testroot $ret
+		return 1
+	fi
+
+	cat <<EOF > $testroot/wt/hello.c.expected
+#include <stdio.h>
+
+int
+main(void)
+{
+	/* the trailing whitespace is on purpose */
+	printf("hello, world\n");$trailing
+    return 5; /* always fails */
+}
+EOF
+
+	cmp -s $testroot/wt/hello.c.expected $testroot/wt/hello.c
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		diff -u $testroot/wt/hello.c.expected $testroot/wt/hello.c
+	fi
+	test_done $testroot $ret
+}
+
 test_patch_relative_paths() {
 	local testroot=`test_init patch_relative_paths`
 
@@ -1802,6 +1887,7 @@ run_test test_patch_with_offset
 run_test test_patch_prefer_new_path
 run_test test_patch_no_newline
 run_test test_patch_strip
+run_test test_patch_whitespace
 run_test test_patch_relative_paths
 run_test test_patch_with_path_prefix
 run_test test_patch_relpath_with_path_prefix