"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Mark Jamsek <mark@jamsek.com>
Subject:
Re: got patch: add flag to ignore whitespace?
To:
Omar Polo <op@omarpolo.com>
Cc:
Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date:
Mon, 4 Jul 2022 00:08:55 +1000

Download raw body.

Thread
On 22-07-03 03:28pm, Omar Polo wrote:
> Mark Jamsek <mark@jamsek.com> wrote:
> > On 22-07-03 12:24am, Christian Weisgerber wrote:
> > > Omar Polo:
> > > 
> > Also, regarding "Ignore whitespaces in context lines."; how is "context
> > lines" defined?
> 
> patching a file is basically a search and replace operation on lines.
> you open the "old file", search around the line address mentioned in the
> hunk header a range of lines that matches the hunk, and apply it.
> 
> For a hunk to match it means that all the lines that start with a - or a
> space ' ' should match, in order.  This is called the context of the
> diff.  (I'm actually not sure wether "context line" only applies to the
> ' ' lines or to both ' ' and '-' lines.)
> 
> > For instance, -w made my diff apply cleanly, but the whitespace was on
> > a changed line.
> 
> Your diff failed to apply because the deleted lines were different from
> the ones in the file (because of the trailing whitespaces) and so got
> patch couldn't find the right offset in the file.  After it learned how
> to ignore the whitespaces, the context matched and the lines were
> changed (althought a few become no-op.)

We covered this on irc, but just to recap for posterity: I don't think
it's wrong to say "Ignore whitespace in context lines"--I was just
looking for how we're defining it in got as some implementations suggest
context is just the lines beginning with ' '.

> > > If I understand linecmp() correctly, it's like diff -w?
> > > 
> > > Meanwhile patch -l seems to be more like diff -b?
> > 
> > I like Fossil's semantics for this:
> > 
> >   -w|--ignore-all-space       Ignore white space when comparing lines
> >   -Z|--ignore-trailing-space  Ignore changes to end-of-line whitespace
> > 
> > but I think -w and -W might be better flags than -w and -Z.
> 
> I'm not sure how valuable is to provide both options actually.  I think
> that the less flag we have, the better.  Also, users are really
> interested in -w vs -Z?  I think they're more interested in got applying
> the patch.

I agree with the less flags the better. In fact, this is where I love
got: it provides all the functionality with a simple interface.  That
lack of complexity for the user is a stark contrast with some other
VCSs.

In the case of -w and -Z; I was thinking in terms of how lax we wanted
the default to be. But if we're happy with ignoring all ws, then, yes,
the one option is fine.

> I was going with -w with the same semantics as Fossil here until Stefan
> convinced me that there's no point in running the same command twice :)
> 
> > I'm not sure how I feel about this being the default though. I admit,
> > it's nice not having failed patches, but as far as I'm aware, no other
> > patch implementations do it by default so it might fail the Rule of
> > Least Surprise. And--ironically coming from the person who sent out the
> > mangled diff that instigated this--it shouldn't happen often. I can't
> > even remember the last time I did it.
> 
> I have to agree here, my experience is that there aren't _that_ many
> diffs with mangled spaces.  There are more diff around with wrapped
> lines and perfectly fine diffs that fails to apply due to other changes
> in between (fortunately now we have a diff3 merge for patches too.)
> 
> > Plus, with the:
> > 
> > "@@ -295,7 +305,7 @@ hunk contains mangled whitespace"
> > 
> > output, users can then rerun with -w.
> > 
> > That said, I really am ambivalent and can't decide which way I fall on
> > this.
> 
> I'm still for having it enabled by default and not giving the option to
> turn it off :)

The more thought I've given this, the more I'm coming around to stsp's
and your position. Invariably, the first thing a user will do after
seeing a failed patch with the "hunk contains mangled whitespace" report
is run `got patch -w ...` so why not do it the first time?

I think I'm sold :)

> An argument that can be made thought is that 'got patch' could try
> harder before starting to ignore whitespaces.
> 
> At the moment 'got patch' works in a mostly linear fashion: the original
> file is traversed from the start and we (almost never) jump back.  (in
> reality we do some backward jumps, but we don't try as hard as patch(1)
> does.)
> 
> I have something in mind for applying diffs with fuzz (i.e. ignoring up
> to N lines of context) and I was thinking of doing some changes in the
> patch machinery to accomodate multiple passes on the same file.  With
> that we could move this whitespace-loose matching so it's only done in a
> second pass.  The idea could be:
> 
>  first pass: try to apply the patch, eventually at offset
>  second pass: try again with a whitespace loose matching
>  third pass: whitespaces + fuzz
> 
> ideas?  (note that it'll take still a while for this changes, can't
> provide an eta as i haven't started working on it yet :-)

No rush. I haven't had a `got patch` fail yet (despite being the cause
of perhaps everyone on the list having had it happen to them :)

-- 
Mark Jamsek <fnc.bsdbox.org>
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68