"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: diff: Function names appearing inside diff context
To:
Mark Jamsek <mark@jamsek.com>
Cc:
Tom Jones <thj@freebsd.org>, gameoftrees@openbsd.org
Date:
Fri, 23 Sep 2022 16:04:30 +0200

Download raw body.

Thread
On Fri, Sep 23, 2022 at 11:25:24PM +1000, Mark Jamsek wrote:
> Nice work! I think you're on the right track in that this is where the
> change is needed; however, we need the actual number of context lines
> because hardcoding +3 assumes the default. And if the user invokes, for
> example, `got diff -C0`, we segfault. And `got diff -C5` returns
> incorrect results, which we can see with your test case. We can't
> extrapolate this number from the cc->left.{start,end} range either
> because of the case where close-together-changes are combined into one
> hunk, so we need to pass the number of context lines to the routine.

Agreed, thanks Tom for tracking this down.

By the way, the left_start index points to the first line of context
because it is set up this way in diff_chunk_context_get(). So the fix
is to start scanning lines backwards from the first -/+ line of the diff
(left_start + ncontext), rather than the first context line (left_start).

> The below diff does this; we start scanning for the enclosing function
> at the first changed line. We could arguably scan from the line
> immediately preceding the first changed line, but that would miss the
> case where the line with the function name is changed.  This works with
> your test case using various numbers of context lines, but if you have
> other test cases, further testing would be good!

I think there is one problem in the proposed change:

> diff refs/heads/main refs/heads/fix/diffp
> commit - 2ad87fffe4c7553e1593ff3f3ece274cf3a48931
> commit + cb2ad2671364d70b673526a58e992791b2acf929
> blob - 396f264aae799c02ce62d47c6762b7dc5bd05331
> blob + be72763dedcb12e2baabb6fda64e06d2fcf5f598
> --- lib/diff_internal.h
> +++ lib/diff_internal.h
> @@ -177,7 +177,8 @@ int diff_output_match_function_prototype(char *prototy
>  int diff_output_match_function_prototype(char *prototype, size_t prototype_size,
>  					 int *last_prototype_idx,
>  					 const struct diff_result *result,
> -					 const struct diff_chunk_context *cc);
> +					 const struct diff_chunk_context *cc,
> +					 unsigned int ncontext);
>  
>  struct diff_output_info *diff_output_info_alloc(void);
>  
> blob - 3047221aa5a15e4fc2a5dea8267e97864da21264
> blob + f166811cc62a9943a09ddc21bb3cdfd9ac510a5d
> --- lib/diff_output.c
> +++ lib/diff_output.c
> @@ -270,17 +270,17 @@ diff_output_match_function_prototype(char *prototype, 
>  int
>  diff_output_match_function_prototype(char *prototype, size_t prototype_size,
>      int *last_prototype_idx, const struct diff_result *result,
> -    const struct diff_chunk_context *cc)
> +    const struct diff_chunk_context *cc, unsigned int ncontext)
>  {
>  	struct diff_atom *start_atom, *atom;
>  	const struct diff_data *data;
>  	unsigned char buf[DIFF_FUNCTION_CONTEXT_SIZE];
>  	const char *state = NULL;
> -	int rc, i, ch;
> +	int rc, i, ch, idx = cc->left.start + (ncontext ? ncontext : 0);

You are adding the number of context lines to left.start without any clamping.

In diff_chunk_get_left_start(), while setting up cc->left.start as ...

  diff_atom_root_idx(r->left, c->left_start) - ncontext

... we clamp any negative result of this subtraction to zero, in case the
desired amount of context lines would otherwise make us search backwards
beyond the beginning of the file.

In case cc->left.start was clamped you might end up with an idx that
points outside the current diff chunk, and potentially even beyond the
end of the diff atoms array.

I would suggest doing something like this instead (untested):

  idx = MIN(cc->left.start + (ncontext ? ncontext : 0), cc->left.end);

>  	if (result->left->atoms.len > 0 && cc->left.start > 0) {
>  		data = result->left;
> -		start_atom = &data->atoms.head[cc->left.start - 1];
> +		start_atom = &data->atoms.head[idx];
>  	} else
>  		return DIFF_RC_OK;