Download raw body.
got patch: add flag to ignore whitespace?
Stefan Sperling <stsp@stsp.name> wrote: > On Sat, Jul 02, 2022 at 05:27:30PM +0200, Omar Polo wrote: > > Omar Polo <op@omarpolo.com> wrote: > > > The last patch from Mark reminded me that I thought some time ago to add > > > a "-w" flag for got patch to ignore white spaces. It's conceptually > > > similar to patch(1) -l/--ignore-whitespace or git-apply(1) > > > --ignore-whitespace. > > > > > > Here's a possible usage (using Mark' last diff) > > > > > > % mshow | got patch > > > # tog/tog.1 > > > @@ -295,7 +305,7 @@ hunk failed to apply > > > G tog/tog.c > > > got: patch failed to apply > > > % # woops, try again with -w > > > % mshow | got patch -w > > > G tog/tog.1 > > > G tog/tog.c > > > % # yay! > > > > > > The diff does exactly what one would expect. The line comparisons are > > > now done in linecmp, which first runs strcmp(3) and optionally falls > > > back to a whitespace-loose matching. > > > > > > There's a bit of churn in apply_hunk now. Before, I was re-using the > > > lines saved in the hunk to apply it, now I can't. If the context lines > > > of the hunk may have the whitespaces mangled and so I have to re-read > > > the lines from the original file. Even if we're using a loose match, > > > the context line needs not to be changed. > > > > Wops, it introduced a stupid error in apply_hunk: when adding a file we > > wouldn't add \n at the end of the lines. The regress suite didn't catch > > this, I'll add a testcase for it later as it's independent from this > > diff > > > > The change with the previous is to strip newlines read in apply_hunk and > > always add them when printing to the temp file. > > Nice idea. > > Recalling that patch(1) is supposed to cope with fuzzyness of changes, > I wonder if -w as you implemented it here should be the default behaviour? Yes, it's probably a good idea. I'm sometimes worried about being too lax, but on the other hand being strict and refusing to patch files is not a useful behavior either. > With an option (maybe -w) to turn strict whitespace matching back on? > > In cases where one would need to use the -w option as you have implemented it > there is no other way to get the diff to apply. So we could make your -w the > default and provide a visual hint when whitespace differences are being > ignored: > > @@ -295,7 +305,7 @@ hunk contains mangled whitespace I like the idea! Diff belows keeps the behavior, drops -w and adds this visual aid. I've added it with a lower precedence, so if a hunk is applied at a different offset _and_ with mangled whitespaces only the "applied at offset" info will be printed. We avoid adding a flag, love it :) > And if anyone is for some reason worried about applying such mangled > whitespace diffs they could use an option to ask for strict > whitespace matching instead. Rather than a flag specific for whitespaces, a "strict mode" that considers an error a non-perfect application of the diff seems more useful. But probably we can get away without it: we already print the info on the screen and it's even easy to parse (`got patch | grep ^@'). If deemed useful I can add it tho. P.S.: i noticed now that in the long -> int change I forgot to move the progress callback to int. Will do after this lands to avoid conflicts. diff refs/heads/main refs/heads/pw commit - f5b0315f0e07bfd36a4eb37d91884fcd8614745a commit + 393c10527c09d54aa19c1f121b2cc74c31ac4746 blob - d2db3f11b55b0dcd7008e9f4662210887aa2742b blob + a79c8a428a21ddd30f49aa132dd4457a73135fa1 --- got/got.1 +++ got/got.1 @@ -1347,6 +1347,9 @@ If a change does not match at its exact line number, a apply it somewhere else in the file if a good spot can be found. Otherwise, the patch will fail to apply. .Pp +Whitespaces may be ignored when trying to match the context of a +change, as they may have been mangled. +.Pp .Nm .Cm patch will refuse to apply a patch if certain preconditions are not met. blob - 5f7f00007937a048564e685aac0a8c6b9e98adab blob + f547d6ee3387c840c7b759ff4a8675fd0da2dae4 --- got/got.c +++ got/got.c @@ -7758,7 +7758,7 @@ static const struct got_error * patch_progress(void *arg, const char *old, const char *new, unsigned char status, const struct got_error *error, long old_from, long old_lines, long new_from, long new_lines, long offset, - const struct got_error *hunk_err) + int ws_mangled, const struct got_error *hunk_err) { const char *path = new == NULL ? old : new; @@ -7771,13 +7771,15 @@ patch_progress(void *arg, const char *old, const char if (error != NULL) fprintf(stderr, "%s: %s\n", getprogname(), error->msg); - if (offset != 0 || hunk_err != NULL) { + if (offset != 0 || hunk_err != NULL || ws_mangled) { printf("@@ -%ld,%ld +%ld,%ld @@ ", old_from, old_lines, new_from, new_lines); if (hunk_err != NULL) printf("%s\n", hunk_err->msg); - else + else if (offset != 0) printf("applied with offset %ld\n", offset); + else + printf("hunk contains mangled whitespaces\n"); } return NULL; blob - bc2e95bb4fa535617bc28f76935a78e36dea8a3e blob + e72ded72ccf8311606f1c945e5241311de5cc179 --- include/got_patch.h +++ include/got_patch.h @@ -22,7 +22,7 @@ */ typedef const struct got_error *(*got_patch_progress_cb)(void *, const char *, const char *, unsigned char, const struct got_error *, - long, long, long, long, long, const struct got_error *); + long, long, long, long, long, int, const struct got_error *); /* * Apply the (already opened) patch to the repository and register the blob - 6b6915f9998d30a1ae5b1846365b299ba2c974fe blob + 15c33761234e42c0eb7fec5939c05a49aa94a4a1 --- lib/patch.c +++ lib/patch.c @@ -26,6 +26,7 @@ #include <sys/stat.h> #include <sys/uio.h> +#include <ctype.h> #include <errno.h> #include <limits.h> #include <sha1.h> @@ -58,6 +59,7 @@ struct got_patch_hunk { STAILQ_ENTRY(got_patch_hunk) entries; const struct got_error *err; + int ws_mangled; int offset; int old_nonl; int new_nonl; @@ -401,6 +403,30 @@ locate_hunk(FILE *orig, struct got_patch_hunk *h, off_ return err; } +static int +linecmp(const char *a, const char *b, int *mangled) +{ + int c; + + *mangled = 0; + c = strcmp(a, b); + if (c == 0) + return c; + + *mangled = 1; + for (;;) { + while (isspace(*a)) + a++; + while (isspace(*b)) + b++; + if (*a == '\0' || *b == '\0' || *a != *b) + break; + a++, b++; + } + + return *a - *b; +} + static const struct got_error * test_hunk(FILE *orig, struct got_patch_hunk *h) { @@ -408,6 +434,7 @@ test_hunk(FILE *orig, struct got_patch_hunk *h) char *line = NULL; size_t linesize = 0, i = 0; ssize_t linelen; + int mangled; for (i = 0; i < h->len; ++i) { switch (*h->lines[i]) { @@ -426,10 +453,12 @@ test_hunk(FILE *orig, struct got_patch_hunk *h) } if (line[linelen - 1] == '\n') line[linelen - 1] = '\0'; - if (strcmp(h->lines[i] + 1, line)) { + if (linecmp(h->lines[i] + 1, line, &mangled)) { err = got_error(GOT_ERR_HUNK_FAILED); goto done; } + if (mangled) + h->ws_mangled = 1; break; } } @@ -440,32 +469,61 @@ done: } static const struct got_error * -apply_hunk(FILE *tmp, struct got_patch_hunk *h, int *lineno) +apply_hunk(FILE *orig, FILE *tmp, struct got_patch_hunk *h, int *lineno, + off_t from) { - size_t i, new = 0; + const struct got_error *err = NULL; + const char *t; + size_t linesize = 0, i, new = 0; + char *line = NULL; + char mode; + ssize_t linelen; + if (orig != NULL && fseeko(orig, from, SEEK_SET) == -1) + return got_error_from_errno("fseeko"); + for (i = 0; i < h->len; ++i) { - switch (*h->lines[i]) { - case ' ': - if (fprintf(tmp, "%s\n", h->lines[i] + 1) < 0) - return got_error_from_errno("fprintf"); - /* fallthrough */ + switch (mode = *h->lines[i]) { case '-': + case ' ': (*lineno)++; + if (orig != NULL) { + linelen = getline(&line, &linesize, orig); + if (linelen == -1) { + err = got_error_from_errno("getline"); + goto done; + } + if (line[linelen - 1] == '\n') + line[linelen - 1] = '\0'; + t = line; + } else + t = h->lines[i] + 1; + if (mode == '-') + continue; + if (fprintf(tmp, "%s\n", t) < 0) { + err = got_error_from_errno("fprintf"); + goto done; + } break; case '+': new++; - if (fprintf(tmp, "%s", h->lines[i] + 1) < 0) - return got_error_from_errno("fprintf"); + if (fprintf(tmp, "%s", h->lines[i] + 1) < 0) { + err = got_error_from_errno("fprintf"); + goto done; + } if (new != h->new_lines || !h->new_nonl) { - if (fprintf(tmp, "\n") < 0) - return got_error_from_errno( - "fprintf"); + if (fprintf(tmp, "\n") < 0) { + err = got_error_from_errno("fprintf"); + goto done; + } } break; } } - return NULL; + +done: + free(line); + return err; } static const struct got_error * @@ -484,7 +542,7 @@ patch_file(struct got_patch *p, FILE *orig, FILE *tmp, h = STAILQ_FIRST(&p->head); if (h == NULL || STAILQ_NEXT(h, entries) != NULL) return got_error(GOT_ERR_PATCH_MALFORMED); - return apply_hunk(tmp, h, &lineno); + return apply_hunk(orig, tmp, h, &lineno, 0); } if (fstat(fileno(orig), &sb) == -1) @@ -524,7 +582,7 @@ patch_file(struct got_patch *p, FILE *orig, FILE *tmp, if (lineno + 1 != h->old_from) h->offset = lineno + 1 - h->old_from; - err = apply_hunk(tmp, h, &lineno); + err = apply_hunk(orig, tmp, h, &lineno, pos); if (err != NULL) return err; @@ -551,17 +609,17 @@ report_progress(struct patch_args *pa, const char *old struct got_patch_hunk *h; err = pa->progress_cb(pa->progress_arg, old, new, status, - orig_error, 0, 0, 0, 0, 0, NULL); + orig_error, 0, 0, 0, 0, 0, 0, NULL); if (err) return err; STAILQ_FOREACH(h, pa->head, entries) { - if (h->offset == 0 && h->err == NULL) + if (h->offset == 0 && !h->ws_mangled && h->err == NULL) continue; err = pa->progress_cb(pa->progress_arg, old, new, 0, NULL, h->old_from, h->old_lines, h->new_from, h->new_lines, - h->offset, h->err); + h->offset, h->ws_mangled, h->err); if (err) return err; } blob - a7ff83ea9847a8fa722fdbfda0cfc471da1d050d blob + 582587d781c16be8549c3e5369d44a37927d0a4d --- regress/cmdline/patch.sh +++ regress/cmdline/patch.sh @@ -1277,6 +1277,91 @@ EOF test_done $testroot 0 } +test_patch_whitespace() { + local testroot=`test_init patch_whitespace` + + got checkout $testroot/repo $testroot/wt > /dev/null + ret=$? + if [ $ret -ne 0 ]; then + test_done $testroot $ret + return 1 + fi + + trailing=" " + + cat <<EOF > $testroot/wt/hello.c +#include <stdio.h> + +int +main(void) +{ + /* the trailing whitespace is on purpose */ + printf("hello, world\n");$trailing + return 0; +} +EOF + + (cd $testroot/wt && got add hello.c && got ci -m '+hello.c') \ + > /dev/null + ret=$? + if [ $ret -ne 0 ]; then + test_done $testroot $ret + return 1 + fi + + # test with a diff with various whitespace corruptions + cat <<EOF > $testroot/wt/patch +--- hello.c ++++ hello.c +@@ -5,5 +5,5 @@ + { + /* the trailing whitespace is on purpose */ + printf("hello, world\n"); +- return 0; ++ return 5; /* always fails */ + } +EOF + + (cd $testroot/wt && got patch patch) \ + 2>$testroot/stderr >$testroot/stdout + ret=$? + if [ $ret -ne 0 ]; then + echo "failed to apply diff" >&2 + test_done $testroot $ret + return 1 + fi + + echo 'M hello.c' > $testroot/stdout.expected + echo '@@ -5,5 +5,5 @@ hunk contains mangled whitespaces' \ + >> $testroot/stdout.expected + cmp -s $testroot/stdout.expected $testroot/stdout + ret=$? + if [ $ret -ne 0 ]; then + diff -u $testroot/stdout.expected $testroot/stdout + test_done $testroot $ret + return 1 + fi + + cat <<EOF > $testroot/wt/hello.c.expected +#include <stdio.h> + +int +main(void) +{ + /* the trailing whitespace is on purpose */ + printf("hello, world\n");$trailing + return 5; /* always fails */ +} +EOF + + cmp -s $testroot/wt/hello.c.expected $testroot/wt/hello.c + ret=$? + if [ $ret -ne 0 ]; then + diff -u $testroot/wt/hello.c.expected $testroot/wt/hello.c + fi + test_done $testroot $ret +} + test_patch_relative_paths() { local testroot=`test_init patch_relative_paths` @@ -1802,6 +1887,7 @@ run_test test_patch_with_offset run_test test_patch_prefer_new_path run_test test_patch_no_newline run_test test_patch_strip +run_test test_patch_whitespace run_test test_patch_relative_paths run_test test_patch_with_path_prefix run_test test_patch_relpath_with_path_prefix
got patch: add flag to ignore whitespace?