"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: got patch: add flag to ignore whitespace?
To:
Mark Jamsek <mark@jamsek.com>
Cc:
Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date:
Sun, 03 Jul 2022 15:28:55 +0200

Download raw body.

Thread
Mark Jamsek <mark@jamsek.com> wrote:
> On 22-07-03 12:24am, Christian Weisgerber wrote:
> > Omar Polo:
> > 
> > > The last patch from Mark reminded me that I thought some time ago to add
> > > a "-w" flag for got patch to ignore white spaces.  It's conceptually
> > > similar to patch(1) -l/--ignore-whitespace or git-apply(1)
> > > --ignore-whitespace.
> 
> Firstly, I think this option is great. Thanks, op :)
> 
> > > The diff does exactly what one would expect.
> > 
> > Well, I'm dense.  What does "ignore whitespace" mean exactly?
> > Note that diff(1) has two ways to do this, -b and -w.
> 
> I had the same ambiguity when reading the docs for this too.
> Also, regarding "Ignore whitespaces in context lines."; how is "context
> lines" defined?

patching a file is basically a search and replace operation on lines.
you open the "old file", search around the line address mentioned in the
hunk header a range of lines that matches the hunk, and apply it.

For a hunk to match it means that all the lines that start with a - or a
space ' ' should match, in order.  This is called the context of the
diff.  (I'm actually not sure wether "context line" only applies to the
' ' lines or to both ' ' and '-' lines.)

> For instance, -w made my diff apply cleanly, but the whitespace was on
> a changed line.

Your diff failed to apply because the deleted lines were different from
the ones in the file (because of the trailing whitespaces) and so got
patch couldn't find the right offset in the file.  After it learned how
to ignore the whitespaces, the context matched and the lines were
changed (althought a few become no-op.)

> > If I understand linecmp() correctly, it's like diff -w?
> > 
> > Meanwhile patch -l seems to be more like diff -b?
> 
> I like Fossil's semantics for this:
> 
>   -w|--ignore-all-space       Ignore white space when comparing lines
>   -Z|--ignore-trailing-space  Ignore changes to end-of-line whitespace
> 
> but I think -w and -W might be better flags than -w and -Z.

I'm not sure how valuable is to provide both options actually.  I think
that the less flag we have, the better.  Also, users are really
interested in -w vs -Z?  I think they're more interested in got applying
the patch.

I was going with -w with the same semantics as Fossil here until Stefan
convinced me that there's no point in running the same command twice :)

> I'm not sure how I feel about this being the default though. I admit,
> it's nice not having failed patches, but as far as I'm aware, no other
> patch implementations do it by default so it might fail the Rule of
> Least Surprise. And--ironically coming from the person who sent out the
> mangled diff that instigated this--it shouldn't happen often. I can't
> even remember the last time I did it.

I have to agree here, my experience is that there aren't _that_ many
diffs with mangled spaces.  There are more diff around with wrapped
lines and perfectly fine diffs that fails to apply due to other changes
in between (fortunately now we have a diff3 merge for patches too.)

> Plus, with the:
> 
> "@@ -295,7 +305,7 @@ hunk contains mangled whitespace"
> 
> output, users can then rerun with -w.
> 
> That said, I really am ambivalent and can't decide which way I fall on
> this.

I'm still for having it enabled by default and not giving the option to
turn it off :)

An argument that can be made thought is that 'got patch' could try
harder before starting to ignore whitespaces.

At the moment 'got patch' works in a mostly linear fashion: the original
file is traversed from the start and we (almost never) jump back.  (in
reality we do some backward jumps, but we don't try as hard as patch(1)
does.)

I have something in mind for applying diffs with fuzz (i.e. ignoring up
to N lines of context) and I was thinking of doing some changes in the
patch machinery to accomodate multiple passes on the same file.  With
that we could move this whitespace-loose matching so it's only done in a
second pass.  The idea could be:

 first pass: try to apply the patch, eventually at offset
 second pass: try again with a whitespace loose matching
 third pass: whitespaces + fuzz

ideas?  (note that it'll take still a while for this changes, can't
provide an eta as i haven't started working on it yet :-)