From: Mark Jamsek <mark@jamsek.com>
Subject: Re: diff -p function prototypes regression
To: Stefan Sperling <stsp@stsp.name>
Cc: gameoftrees@openbsd.org, Tom Jones <thj@freebsd.org>
Date: Tue, 22 Aug 2023 21:41:12 +1000

Stefan Sperling <stsp@stsp.name> wrote:
> Since diff.git commit 8993f42562696079866fc2dec7191877b6cc1f18 which
> intentionally changed the line index where we start searching upwards
> for a function prototype, there is a difference in behaviour compared
> to traditional usr.bin/diff -p. Unfortunately this difference can result
> in confusing output, such as the following reported by tb@ and sthen@.
> 
> Prototype line appears after changed lines, instead of before:
> [[[
> (this blob exists in the OpenBSD src.git repo on Github)
> blob - 25333f98510f8805602824e43e841f1df4f95769
> file + btrace.c
> --- btrace.c
> +++ btrace.c
> @@ -117,6 +117,7 @@ uint64_t             bt_filtered;   /* # of events filtered out */
>  struct dtioc_arg_info  **dt_args;      /* array of probe arguments */
> 
>  struct dt_evt           bt_devt;       /* fake event for BEGIN/END */
> +#define FOO    0
>  uint64_t                bt_filtered;   /* # of events filtered out */
> 
>  struct syms            *kelf, *uelf;
> ]]]
> 
> Prototype line appears within changed lines, instead of before:
> [[[
> (this blob exists in the OpenBSD ports.git repo on Github)
> blob - 62353bfe8ea8bc8fab8208fe41bf34edfdac30d2
> file + Makefile
> --- Makefile
> +++ Makefile
> @@ -2,7 +2,7 @@ DISTNAME =		nim-1.6.12
>  
>  COMMENT =		statically typed systems programming language
>  
> -DISTNAME =		nim-1.6.12
> +DISTNAME =		nim-1.6.14
>  EXTRACT_SUFX =		.tar.xz
>  
>  CATEGORIES =		lang
> ]]]
> 
> My patch below reverts commits 8993f42562696079866fc2dec7191877b6cc1f18,
> such that our diff output matches the output of usr/bin/diff -p again:
> [[[
> --- btrace.c
> +++ btrace.c
> @@ -117,6 +117,7 @@ size_t			 dt_ndtpi;	/* # of elements in the array */
>  struct dtioc_arg_info  **dt_args;	/* array of probe arguments */
>  
>  struct dt_evt		 bt_devt;	/* fake event for BEGIN/END */
> +#define FOO	0
>  uint64_t		 bt_filtered;	/* # of events filtered out */
>  
>  struct syms		*kelf, *uelf;
> ]]]
> [[[
> --- Makefile
> +++ Makefile
> @@ -2,7 +2,7 @@ ONLY_FOR_ARCHS =        amd64 arm64 i386
> 
>  COMMENT =              statically typed systems programming language
> 
> -DISTNAME =             nim-1.6.12
> +DISTNAME =             nim-1.6.14
>  EXTRACT_SUFX =         .tar.xz
> 
>  CATEGORIES =           lang
> ]]]
> 
> Of course this re-introduces the original problem which Tom intended
> to fix. I don't know if there is a way to prevent the above problematic
> cases while still showing the prototype of a newly added function.
> Is this possible?

I haven't looked at it since around the time of this tweak, but it
should be doable. For the first example above, I think it is showing the
prototype line after the added line because the line indexes are based
on the old file. If we look at what type of line the first change is
(i.e., '+' or '-'), we might be able to adjust the index accordingly.

For the second example, IIRC, this was one of the motivations behind the
change; that is, if the change involves a function protoype, before this
tweak, the info we'd show in the hunk header was the function above the
change, which was misleading. The tweak was an attempt at ensuring such
a change would display the changed function in the hunk header--as is
done in the DISTNAME example. Although we would need to confirm this
with Tom as I'm not certain that I'm remembering this right.

> There is one diff test which needs adjustment to expected results
> after fixing our deviation from usr.bin/diff behaviour.
> 
> All Got regress tests keep passing as-is.
> 
> ok?

It's a tough trade-off, but I'm inclined to lean toward tb's and your
suggestion: revert. Although both options can produce somewhat confusing
info, the status quo deviates from other tools providing this feature
and would seem to produce confusing output more often. That said, I'd
really like to dig into this at some point as there's definitely room
for improvement. In the same vein as your reply to tb, when I raised
adding this to Fossil, drh ultimately rejected the idea unless we
generalise it to more languages because Fossil is not a C-exclusive SCM.
While I don't exactly share the same position--I see the advantage in
its current limited form--I do agree that it would be good to expand its
utility and accuracy.

so ok for the diff and I'll add generalising this heuristic to my
evergrowing list of items unless someone beats me to it :)

> diff /home/stsp/src/diff
> commit - f46fa9b5e78156c0360f508336dbe3dfa04b503f
> path + /home/stsp/src/diff
> blob - 6855889d9612f734106a574e7b7ff947b56823d5
> file + lib/diff_internal.h
> --- lib/diff_internal.h
> +++ lib/diff_internal.h
> @@ -148,8 +148,7 @@ int diff_output_trailing_newline_msg(struct diff_outpu
>  int diff_output_match_function_prototype(char *prototype, size_t prototype_size,
>  					 int *last_prototype_idx,
>  					 const struct diff_result *result,
> -					 const struct diff_chunk_context *cc,
> -					 unsigned int ncontext);
> +					 const struct diff_chunk_context *cc);
>  
>  struct diff_output_info *diff_output_info_alloc(void);
>  
> blob - f44e799638f771a0588c11e3d5b6b1200b5c64f1
> file + lib/diff_output.c
> --- lib/diff_output.c
> +++ lib/diff_output.c
> @@ -263,19 +263,17 @@ is_function_prototype(unsigned char ch)
>  int
>  diff_output_match_function_prototype(char *prototype, size_t prototype_size,
>      int *last_prototype_idx, const struct diff_result *result,
> -    const struct diff_chunk_context *cc, unsigned int ncontext)
> +    const struct diff_chunk_context *cc)
>  {
>  	struct diff_atom *start_atom, *atom;
>  	const struct diff_data *data;
>  	unsigned char buf[DIFF_FUNCTION_CONTEXT_SIZE];
>  	const char *state = NULL;
> -	int rc, i, ch, idx;
> +	int rc, i, ch;
>  
> -	idx = MIN(cc->left.start + (ncontext ? ncontext : 0), cc->left.end - 1);
> -
>  	if (result->left->atoms.len > 0 && cc->left.start > 0) {
>  		data = result->left;
> -		start_atom = &data->atoms.head[idx];
> +		start_atom = &data->atoms.head[cc->left.start - 1];
>  	} else
>  		return DIFF_RC_OK;
>  
> blob - 4757099644a9c1eec21e1af9102ba13ba417d609
> file + lib/diff_output_unidiff.c
> --- lib/diff_output_unidiff.c
> +++ lib/diff_output_unidiff.c
> @@ -237,7 +237,7 @@ output_unidiff_chunk(struct diff_output_info *outinfo,
>  		     const struct diff_input_info *info,
>  		     const struct diff_result *result,
>  		     bool print_header, bool show_function_prototypes,
> -		     const struct diff_chunk_context *cc, unsigned int ncontext)
> +		     const struct diff_chunk_context *cc)
>  {
>  	int rc, left_start, left_len, right_start, right_len;
>  	off_t outoff = 0, *offp;
> @@ -304,7 +304,7 @@ output_unidiff_chunk(struct diff_output_info *outinfo,
>  	if (show_function_prototypes) {
>  		rc = diff_output_match_function_prototype(state->prototype,
>  		    sizeof(state->prototype), &state->last_prototype_idx,
> -		    result, cc, ncontext);
> +		    result, cc);
>  		if (rc)
>  			return rc;
>  	}
> @@ -433,7 +433,7 @@ diff_output_unidiff_chunk(struct diff_output_info **ou
>  	}
>  
>  	return output_unidiff_chunk(outinfo, dest, state, info,
> -	    result, false, show_function_prototypes, cc, 0);
> +	    result, false, show_function_prototypes, cc);
>  }
>  
>  int
> @@ -586,7 +586,7 @@ diff_output_unidiff(struct diff_output_info **output_i
>  		      " print left %d-%d right %d-%d\n",
>  		      cc.left.start, cc.left.end, cc.right.start, cc.right.end);
>  		output_unidiff_chunk(outinfo, dest, state, info, result,
> -		    true, show_function_prototypes, &cc, context_lines);
> +		    true, show_function_prototypes, &cc);
>  		cc = next;
>  		debug("new unprinted chunk is left %d-%d right %d-%d\n",
>  		      cc.left.start, cc.left.end, cc.right.start, cc.right.end);
> @@ -594,7 +594,7 @@ diff_output_unidiff(struct diff_output_info **output_i
>  
>  	if (!diff_chunk_context_empty(&cc))
>  		output_unidiff_chunk(outinfo, dest, state, info, result,
> -		    true, show_function_prototypes, &cc, context_lines);
> +		    true, show_function_prototypes, &cc);
>  	diff_output_unidiff_state_free(state);
>  	return DIFF_RC_OK;
>  }
> blob - 2ae051a0a1e04a0e219eb0a362d5d74f2039a76f
> file + test/expect124.diff
> --- test/expect124.diff
> +++ test/expect124.diff
> @@ -1,6 +1,6 @@
>  --- test124.left-p.txt
>  +++ test124.right-p.txt
> -@@ -11,5 +11,5 @@ return_test(int test) {
> +@@ -11,5 +11,5 @@ doSomethingThenPrintHello(int test)
>   
>   struct testfile *
>   return_test(int test) {


-- 
Mark Jamsek <https://bsdbox.org>
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68