Download raw body.
diff vs CRLF
On Mon, Jul 25, 2022 at 03:34:29PM +0200, Omar Polo wrote:
> `got diff' has some issues when dealing with CRLF files. I stumbled on
> this when looking at the history of
>
> https://github.com/fcorbelli/zpaqfranz
>
> for example:
>
> % got log -l1 -p -c 90eafd8 | head -n30
> ...
> --- zpaqfranz.cpp
> +++ zpaqfranz.cpp
> @@ -1,87483 +1,45670 @@
> -///////// EXPERIMENTAL BUILD
> \ No newline at end of file
> +///////// EXPERIMENTAL BUILD
> ...
> +///////// https://sourceforge.net/projects/zpaqfranz/
>
> -///////// The source is a mess
> \ No newline at end of file
> +///////// I had to reluctantly move parts of the comments ...
>
> the "\ Now newline at end of file" is spammed very frequently across the
> diff.
>
> i tracked this down to diff_data_atomize_text_lines_{fd,mmap}: there's
> an error in accounting for \r\n. it's easy to reproduce with a smaller
> file:
>
> % printf 'test\r\n' > crlf
> % got add crlf && got ci -m +crlf
> % printf 'test 2\r\n' > crlf
> % got di
> ...
> @@ -1,2 +1,2 @@
> -test
> +test 2
>
> %
>
> notice the empty line after 'test 2' and the wrong line offsets in the
> hunk header, should be '@@ -1 +1 @@'.
>
> (not that it matters that much; `got patch', Larry' patch(1) and GNU'
> patch don't work on CRLF files anyway, this is just aesthetic for the
> reader)
>
> patch below fixes the accounting in `got di' so it doesn't produce these
> strange results on CRLF-delimited files. Should apply in diff.git too.
>
> ok?
>
> P.S.: it's just me or the regress of diff.git is broken? i can't get it
> to run cleanly after a fresh checkout. got regress is happy though.
Yes, regress in diff.git is a bit broken.
> P.P.S: we care enough about CRLF to add some regress?
I don't see why not. It's always good to cover additional edge cases in tests.
Your patch makes sense, ok stsp@
> diff /home/op/w/got
> commit - 10a16316a03005a07c45b2bbf1b5644b64e846fb
> path + /home/op/w/got
> blob - 0531fabe30530664069034f1c3166fe9d18a6e54
> file + lib/diff_atomize_text.c
> --- lib/diff_atomize_text.c
> +++ lib/diff_atomize_text.c
> @@ -92,7 +92,7 @@ diff_data_atomize_text_lines_fd(struct diff_data *d)
> r = fread(buf, sizeof(char), sizeof(buf), d->root->f);
> if (r == 0 && ferror(d->root->f))
> return errno;
> - if (r == 1 && buf[0] == '\n' )
> + if (r > 0 && buf[0] == '\n')
> line_end++;
> }
>
> @@ -151,11 +151,9 @@ diff_data_atomize_text_lines_mmap(struct diff_data *d)
>
> /* When not at the end of data, the line ending char ('\r' or
> * '\n') must follow */
> - if (line_end < end)
> + if (line_end < end && *line_end == '\r')
> line_end++;
> - /* If that was an '\r', also pull in any following '\n' */
> - if (line_end < end - 1 && line_end[0] == '\r' &&
> - line_end[1] == '\n')
> + if (line_end < end && *line_end == '\n')
> line_end++;
>
> /* Record the found line as diff atom */
>
>
diff vs CRLF