From: Mark Jamsek Subject: Re: got rebase or diff library bug To: Christian Weisgerber Cc: gameoftrees@openbsd.org Date: Mon, 05 Aug 2024 00:32:54 +1000 Christian Weisgerber wrote: > Weird things happen when you commit a change that adds a line without > terminating '\n' at the end of a file, and then rebase or histedit > that change. > > $ cd src # OpenBSD src checkout > $ got br -c master dummy > Switching work tree from refs/heads/master to refs/heads/dummy > $ tar xzvf ~/llvm18.tar.gz '*/GCNRegPressure.cpp' > gnu/llvm/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp > $ hexdump -C gnu/llvm/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp | tail -3 > 000053d0 20 66 61 6c 73 65 3b 0a 0a 23 75 6e 64 65 66 20 | false;..#undef | > 000053e0 50 46 58 0a 7d |PFX.}| > 000053e5 > $ got ci -m whatever > M gnu/llvm/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp > Created commit 28d598d2212490682514aca54d70befb8e86e176 > $ got up -c master > U gnu/llvm/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp > Updated to refs/heads/dummy: 141c93e2a20bfc66591c104e112b9ad964e23e22 > $ got he > [... pick commit...] > $ hexdump -C gnu/llvm/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp | tail -3 > 000053e0 6e 20 66 61 6c 73 65 3b 0a 0a 23 75 6e 64 65 66 |n false;..#undef| > 000053f0 20 50 46 58 0a 7d 2e 0a 34 37 31 63 0a | PFX.}..471c.| > 000053fd > > In this case, ".\n471c\n" is reproducibly appended. Some sort of > internal leakage from the diff library? > > > I just tried a simpler case in a dummy repository with a file "hello" > > hello > world > > then "echo -n fubar >>hello", and after commit and histedit... the > file is empty. > > Commit diff before, from the histedit backup: > > --- hello > +++ hello > @@ -1,2 +1,3 @@ > hello > world > +fubar > \ No newline at end of file > > Commit diff after histedit: > > --- hello > +++ hello > @@ -1,2 +0,0 @@ > -hello > -world > > Ugh. This looks to be caused by how we are currently mishandling malformed text files (i.e., that aren't terminated with a newline) when generating our ed script for the 3-way merge. We don't propagate the error either so the failure goes unreported. I don't think we want to do what ed(1) does and add the terminating newline character to text files while leaving binary files unchanged; we shouldn't be modifying tracked files at all. The below diff is minimally invasive and very simple. We ensure a proper ed script is generated in the case of diffs involving a malformed line but when writing the merged file, we elide the added terminating newline. Regress is happy with the fix, which appears to address naddy's simple repro, but I'd like to confirm that it also addresses the llvm case. I want to add a regress of naddy's repro, too, but have run out of time tonight. diffstat f997e889eb0517a24a28f2f4000d6915683fef86 1f22924926648afd8014664183020e4eea66e823 M lib/diff3.c | 8+ 1- 1 file changed, 8 insertions(+), 1 deletion(-) diff f997e889eb0517a24a28f2f4000d6915683fef86 1f22924926648afd8014664183020e4eea66e823 commit - f997e889eb0517a24a28f2f4000d6915683fef86 commit + 1f22924926648afd8014664183020e4eea66e823 blob - c1ca2fdf44d461cda6adb950263ed6a9775971ac blob + 6787a5d4356eaa79b3f2a6ba81b92a2afbf82ef9 --- lib/diff3.c +++ lib/diff3.c @@ -155,6 +155,8 @@ struct diff3_state { char *buf; + int no_eofnl; /* set if the merged file has no eof newline */ + BUF *diffbuf; }; @@ -417,6 +419,9 @@ out: err = got_error_from_errno("fclose"); } if (err == NULL && diffb) { + if (d3s->no_eofnl == 1 && + diffb->cb_buf[diffb->cb_len - 1] == '\n') + --diffb->cb_len; if (buf_write_fd(diffb, outfd) < 0) err = got_error_from_errno("buf_write_fd"); *overlapcnt = d3s->overlapcnt; @@ -1063,7 +1068,9 @@ edscript(int n, struct diff3_state *d3s) } if (!d3s->overlap[n]) { - err = diff_output(d3s->diffbuf, ".\n"); + d3s->no_eofnl = line[linelen - 1] != '\n'; + err = diff_output(d3s->diffbuf, "%s.\n", + d3s->no_eofnl == 1 ? "\n" : ""); if (err) goto done; } else { -- Mark Jamsek GPG: F2FF 13DE 6A06 C471 CA80 E6E2 2930 DC66 86EE CF68