"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
diff vs CRLF
To:
gameoftrees@openbsd.org
Date:
Mon, 25 Jul 2022 15:34:29 +0200

Download raw body.

Thread
  • Omar Polo:

    diff vs CRLF

`got diff' has some issues when dealing with CRLF files.  I stumbled on
this when looking at the history of

	https://github.com/fcorbelli/zpaqfranz

for example:

	% got log -l1 -p -c 90eafd8 | head -n30
	...
	--- zpaqfranz.cpp
	+++ zpaqfranz.cpp
	@@ -1,87483 +1,45670 @@
	-///////// EXPERIMENTAL BUILD
	\ No newline at end of file
	+///////// EXPERIMENTAL BUILD
	...
	+///////// https://sourceforge.net/projects/zpaqfranz/
	
	-///////// The source is a mess
	\ No newline at end of file
	+///////// I had to reluctantly move parts of the comments ...

the "\ Now newline at end of file" is spammed very frequently across the
diff.

i tracked this down to diff_data_atomize_text_lines_{fd,mmap}: there's
an error in accounting for \r\n.  it's easy to reproduce with a smaller
file:

	% printf 'test\r\n' > crlf
	% got add crlf && got ci -m +crlf
	% printf 'test 2\r\n' > crlf
	% got di
	...
	@@ -1,2 +1,2 @@
	-test
	+test 2

	%

notice the empty line after 'test 2' and the wrong line offsets in the
hunk header, should be '@@ -1 +1 @@'.

(not that it matters that much; `got patch', Larry' patch(1) and GNU'
patch don't work on CRLF files anyway, this is just aesthetic for the
reader)

patch below fixes the accounting in `got di' so it doesn't produce these
strange results on CRLF-delimited files.  Should apply in diff.git too.

ok?

P.S.: it's just me or the regress of diff.git is broken?  i can't get it
to run cleanly after a fresh checkout.  got regress is happy though.

P.P.S: we care enough about CRLF to add some regress?

diff /home/op/w/got
commit - 10a16316a03005a07c45b2bbf1b5644b64e846fb
path + /home/op/w/got
blob - 0531fabe30530664069034f1c3166fe9d18a6e54
file + lib/diff_atomize_text.c
--- lib/diff_atomize_text.c
+++ lib/diff_atomize_text.c
@@ -92,7 +92,7 @@ diff_data_atomize_text_lines_fd(struct diff_data *d)
 			r = fread(buf, sizeof(char), sizeof(buf), d->root->f);
 			if (r == 0 && ferror(d->root->f))
 				return errno;
-			if (r == 1 && buf[0] == '\n' )
+			if (r > 0 && buf[0] == '\n')
 				line_end++;
 		}
 
@@ -151,11 +151,9 @@ diff_data_atomize_text_lines_mmap(struct diff_data *d)
 
 		/* When not at the end of data, the line ending char ('\r' or
 		 * '\n') must follow */
-		if (line_end < end)
+		if (line_end < end && *line_end == '\r')
 			line_end++;
-		/* If that was an '\r', also pull in any following '\n' */
-		if (line_end < end - 1 && line_end[0] == '\r' &&
-		    line_end[1] == '\n')
+		if (line_end < end && *line_end == '\n')
 			line_end++;
 
 		/* Record the found line as diff atom */