"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: diff vs CRLF
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Tue, 26 Jul 2022 14:34:33 +0200

Download raw body.

Thread
  • Omar Polo:

    diff vs CRLF

    • Omar Polo:

      diff vs CRLF

    • Stefan Sperling:

      diff vs CRLF

On Mon, Jul 25, 2022 at 03:34:29PM +0200, Omar Polo wrote:
> `got diff' has some issues when dealing with CRLF files.  I stumbled on
> this when looking at the history of
> 
> 	https://github.com/fcorbelli/zpaqfranz
> 
> for example:
> 
> 	% got log -l1 -p -c 90eafd8 | head -n30
> 	...
> 	--- zpaqfranz.cpp
> 	+++ zpaqfranz.cpp
> 	@@ -1,87483 +1,45670 @@
> 	-///////// EXPERIMENTAL BUILD
> 	\ No newline at end of file
> 	+///////// EXPERIMENTAL BUILD
> 	...
> 	+///////// https://sourceforge.net/projects/zpaqfranz/
> 	
> 	-///////// The source is a mess
> 	\ No newline at end of file
> 	+///////// I had to reluctantly move parts of the comments ...
> 
> the "\ Now newline at end of file" is spammed very frequently across the
> diff.
> 
> i tracked this down to diff_data_atomize_text_lines_{fd,mmap}: there's
> an error in accounting for \r\n.  it's easy to reproduce with a smaller
> file:
> 
> 	% printf 'test\r\n' > crlf
> 	% got add crlf && got ci -m +crlf
> 	% printf 'test 2\r\n' > crlf
> 	% got di
> 	...
> 	@@ -1,2 +1,2 @@
> 	-test
> 	+test 2
> 
> 	%
> 
> notice the empty line after 'test 2' and the wrong line offsets in the
> hunk header, should be '@@ -1 +1 @@'.
> 
> (not that it matters that much; `got patch', Larry' patch(1) and GNU'
> patch don't work on CRLF files anyway, this is just aesthetic for the
> reader)
> 
> patch below fixes the accounting in `got di' so it doesn't produce these
> strange results on CRLF-delimited files.  Should apply in diff.git too.
> 
> ok?
> 
> P.S.: it's just me or the regress of diff.git is broken?  i can't get it
> to run cleanly after a fresh checkout.  got regress is happy though.

Yes, regress in diff.git is a bit broken.

> P.P.S: we care enough about CRLF to add some regress?

I don't see why not. It's always good to cover additional edge cases in tests.
 
Your patch makes sense, ok stsp@

> diff /home/op/w/got
> commit - 10a16316a03005a07c45b2bbf1b5644b64e846fb
> path + /home/op/w/got
> blob - 0531fabe30530664069034f1c3166fe9d18a6e54
> file + lib/diff_atomize_text.c
> --- lib/diff_atomize_text.c
> +++ lib/diff_atomize_text.c
> @@ -92,7 +92,7 @@ diff_data_atomize_text_lines_fd(struct diff_data *d)
>  			r = fread(buf, sizeof(char), sizeof(buf), d->root->f);
>  			if (r == 0 && ferror(d->root->f))
>  				return errno;
> -			if (r == 1 && buf[0] == '\n' )
> +			if (r > 0 && buf[0] == '\n')
>  				line_end++;
>  		}
>  
> @@ -151,11 +151,9 @@ diff_data_atomize_text_lines_mmap(struct diff_data *d)
>  
>  		/* When not at the end of data, the line ending char ('\r' or
>  		 * '\n') must follow */
> -		if (line_end < end)
> +		if (line_end < end && *line_end == '\r')
>  			line_end++;
> -		/* If that was an '\r', also pull in any following '\n' */
> -		if (line_end < end - 1 && line_end[0] == '\r' &&
> -		    line_end[1] == '\n')
> +		if (line_end < end && *line_end == '\n')
>  			line_end++;
>  
>  		/* Record the found line as diff atom */
> 
>