From: Mark Jamsek Subject: Re: diff -p function prototypes regression To: Stefan Sperling Cc: gameoftrees@openbsd.org, Tom Jones Date: Tue, 22 Aug 2023 21:41:12 +1000 Stefan Sperling wrote: > Since diff.git commit 8993f42562696079866fc2dec7191877b6cc1f18 which > intentionally changed the line index where we start searching upwards > for a function prototype, there is a difference in behaviour compared > to traditional usr.bin/diff -p. Unfortunately this difference can result > in confusing output, such as the following reported by tb@ and sthen@. > > Prototype line appears after changed lines, instead of before: > [[[ > (this blob exists in the OpenBSD src.git repo on Github) > blob - 25333f98510f8805602824e43e841f1df4f95769 > file + btrace.c > --- btrace.c > +++ btrace.c > @@ -117,6 +117,7 @@ uint64_t bt_filtered; /* # of events filtered out */ > struct dtioc_arg_info **dt_args; /* array of probe arguments */ > > struct dt_evt bt_devt; /* fake event for BEGIN/END */ > +#define FOO 0 > uint64_t bt_filtered; /* # of events filtered out */ > > struct syms *kelf, *uelf; > ]]] > > Prototype line appears within changed lines, instead of before: > [[[ > (this blob exists in the OpenBSD ports.git repo on Github) > blob - 62353bfe8ea8bc8fab8208fe41bf34edfdac30d2 > file + Makefile > --- Makefile > +++ Makefile > @@ -2,7 +2,7 @@ DISTNAME = nim-1.6.12 > > COMMENT = statically typed systems programming language > > -DISTNAME = nim-1.6.12 > +DISTNAME = nim-1.6.14 > EXTRACT_SUFX = .tar.xz > > CATEGORIES = lang > ]]] > > My patch below reverts commits 8993f42562696079866fc2dec7191877b6cc1f18, > such that our diff output matches the output of usr/bin/diff -p again: > [[[ > --- btrace.c > +++ btrace.c > @@ -117,6 +117,7 @@ size_t dt_ndtpi; /* # of elements in the array */ > struct dtioc_arg_info **dt_args; /* array of probe arguments */ > > struct dt_evt bt_devt; /* fake event for BEGIN/END */ > +#define FOO 0 > uint64_t bt_filtered; /* # of events filtered out */ > > struct syms *kelf, *uelf; > ]]] > [[[ > --- Makefile > +++ Makefile > @@ -2,7 +2,7 @@ ONLY_FOR_ARCHS = amd64 arm64 i386 > > COMMENT = statically typed systems programming language > > -DISTNAME = nim-1.6.12 > +DISTNAME = nim-1.6.14 > EXTRACT_SUFX = .tar.xz > > CATEGORIES = lang > ]]] > > Of course this re-introduces the original problem which Tom intended > to fix. I don't know if there is a way to prevent the above problematic > cases while still showing the prototype of a newly added function. > Is this possible? I haven't looked at it since around the time of this tweak, but it should be doable. For the first example above, I think it is showing the prototype line after the added line because the line indexes are based on the old file. If we look at what type of line the first change is (i.e., '+' or '-'), we might be able to adjust the index accordingly. For the second example, IIRC, this was one of the motivations behind the change; that is, if the change involves a function protoype, before this tweak, the info we'd show in the hunk header was the function above the change, which was misleading. The tweak was an attempt at ensuring such a change would display the changed function in the hunk header--as is done in the DISTNAME example. Although we would need to confirm this with Tom as I'm not certain that I'm remembering this right. > There is one diff test which needs adjustment to expected results > after fixing our deviation from usr.bin/diff behaviour. > > All Got regress tests keep passing as-is. > > ok? It's a tough trade-off, but I'm inclined to lean toward tb's and your suggestion: revert. Although both options can produce somewhat confusing info, the status quo deviates from other tools providing this feature and would seem to produce confusing output more often. That said, I'd really like to dig into this at some point as there's definitely room for improvement. In the same vein as your reply to tb, when I raised adding this to Fossil, drh ultimately rejected the idea unless we generalise it to more languages because Fossil is not a C-exclusive SCM. While I don't exactly share the same position--I see the advantage in its current limited form--I do agree that it would be good to expand its utility and accuracy. so ok for the diff and I'll add generalising this heuristic to my evergrowing list of items unless someone beats me to it :) > diff /home/stsp/src/diff > commit - f46fa9b5e78156c0360f508336dbe3dfa04b503f > path + /home/stsp/src/diff > blob - 6855889d9612f734106a574e7b7ff947b56823d5 > file + lib/diff_internal.h > --- lib/diff_internal.h > +++ lib/diff_internal.h > @@ -148,8 +148,7 @@ int diff_output_trailing_newline_msg(struct diff_outpu > int diff_output_match_function_prototype(char *prototype, size_t prototype_size, > int *last_prototype_idx, > const struct diff_result *result, > - const struct diff_chunk_context *cc, > - unsigned int ncontext); > + const struct diff_chunk_context *cc); > > struct diff_output_info *diff_output_info_alloc(void); > > blob - f44e799638f771a0588c11e3d5b6b1200b5c64f1 > file + lib/diff_output.c > --- lib/diff_output.c > +++ lib/diff_output.c > @@ -263,19 +263,17 @@ is_function_prototype(unsigned char ch) > int > diff_output_match_function_prototype(char *prototype, size_t prototype_size, > int *last_prototype_idx, const struct diff_result *result, > - const struct diff_chunk_context *cc, unsigned int ncontext) > + const struct diff_chunk_context *cc) > { > struct diff_atom *start_atom, *atom; > const struct diff_data *data; > unsigned char buf[DIFF_FUNCTION_CONTEXT_SIZE]; > const char *state = NULL; > - int rc, i, ch, idx; > + int rc, i, ch; > > - idx = MIN(cc->left.start + (ncontext ? ncontext : 0), cc->left.end - 1); > - > if (result->left->atoms.len > 0 && cc->left.start > 0) { > data = result->left; > - start_atom = &data->atoms.head[idx]; > + start_atom = &data->atoms.head[cc->left.start - 1]; > } else > return DIFF_RC_OK; > > blob - 4757099644a9c1eec21e1af9102ba13ba417d609 > file + lib/diff_output_unidiff.c > --- lib/diff_output_unidiff.c > +++ lib/diff_output_unidiff.c > @@ -237,7 +237,7 @@ output_unidiff_chunk(struct diff_output_info *outinfo, > const struct diff_input_info *info, > const struct diff_result *result, > bool print_header, bool show_function_prototypes, > - const struct diff_chunk_context *cc, unsigned int ncontext) > + const struct diff_chunk_context *cc) > { > int rc, left_start, left_len, right_start, right_len; > off_t outoff = 0, *offp; > @@ -304,7 +304,7 @@ output_unidiff_chunk(struct diff_output_info *outinfo, > if (show_function_prototypes) { > rc = diff_output_match_function_prototype(state->prototype, > sizeof(state->prototype), &state->last_prototype_idx, > - result, cc, ncontext); > + result, cc); > if (rc) > return rc; > } > @@ -433,7 +433,7 @@ diff_output_unidiff_chunk(struct diff_output_info **ou > } > > return output_unidiff_chunk(outinfo, dest, state, info, > - result, false, show_function_prototypes, cc, 0); > + result, false, show_function_prototypes, cc); > } > > int > @@ -586,7 +586,7 @@ diff_output_unidiff(struct diff_output_info **output_i > " print left %d-%d right %d-%d\n", > cc.left.start, cc.left.end, cc.right.start, cc.right.end); > output_unidiff_chunk(outinfo, dest, state, info, result, > - true, show_function_prototypes, &cc, context_lines); > + true, show_function_prototypes, &cc); > cc = next; > debug("new unprinted chunk is left %d-%d right %d-%d\n", > cc.left.start, cc.left.end, cc.right.start, cc.right.end); > @@ -594,7 +594,7 @@ diff_output_unidiff(struct diff_output_info **output_i > > if (!diff_chunk_context_empty(&cc)) > output_unidiff_chunk(outinfo, dest, state, info, result, > - true, show_function_prototypes, &cc, context_lines); > + true, show_function_prototypes, &cc); > diff_output_unidiff_state_free(state); > return DIFF_RC_OK; > } > blob - 2ae051a0a1e04a0e219eb0a362d5d74f2039a76f > file + test/expect124.diff > --- test/expect124.diff > +++ test/expect124.diff > @@ -1,6 +1,6 @@ > --- test124.left-p.txt > +++ test124.right-p.txt > -@@ -11,5 +11,5 @@ return_test(int test) { > +@@ -11,5 +11,5 @@ doSomethingThenPrintHello(int test) > > struct testfile * > return_test(int test) { -- Mark Jamsek GPG: F2FF 13DE 6A06 C471 CA80 E6E2 2930 DC66 86EE CF68