"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Josh Rickmar <joshrickmar@outlook.com>
Subject:
Re: fix 'got blame' segfault
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Thu, 17 Dec 2020 19:59:16 -0500

Download raw body.

Thread
On Thu, Dec 17, 2020 at 09:35:46PM +0100, Stefan Sperling wrote:
> On IRC jrick reported that this command triggers a segfault:
> 
>   $ got blame distrib/miniroot/install.sub
> 
>   assertion "blame->f2" failed: file "/home/jrick/src/got/tog/../lib/blame.c", line 480, function "close_file2_and_reuse_file1"
> 
> In fact, both file pointers in the 'blame' struct are NULL.
> 
> The problem is that the commit graph traverses one commit too far.
> This can be more easily reproduced with 'got log':
> 
>  got log -c 05f568 -P distrib/miniroot/install.sub
> 
> This lists two commits:
>   05f568ecc6aadefa1aff9064a29e798874a71409
>   7c0d87f00e480cdf004324dad6f3e6f4418f8f42 
> 
> commit 05f568ecc6aade... added the file distrib/miniroot/install.sub.
> The above 'got log' command returns not just this commit as expected,
> it additionally returns commit 7c0d87f00e480cd...
> 
> The problem is a bug in the logic of got_object_tree_path_changed(),
> which causes the commit graph to add 7c0d87f00e480...  to the list of
> commits which must still be traversed, even though the history of the
> logged path has already ended.
> 
> Note that "distrib/miniroot" exists in 7c0d87f00e480cdf004324... but not
> in the parent of this commit.
> 
> At the point where are traversing "distrib/miniroot" the tree entry te1
> exists and corresponds to this tree listing the contents of 'miniroot'
> in 7c0d87f00e480cdf004324...:
> 
> $ got cat 180aa33df8d1

Where does object come from?

> 32da652eda37ea59d2f1a37b2a289a95aff7aa2a 0100644 Makefile
> 40d8ab1f48c46b8ff676812ed68cce2de57f5aa4 0100644 list
> 8fce89bffd7b91b1d4e679f05f707c7981744560 0100644 list2sh.awk
> a0af0e56d446e56a5253d7ae25721ba405ac1898 0100644 makeconf.awk
> 27bcebb1e71174a0bbcc6d24495ce0894c02e54c 0100644 mtree.conf
> 1ea9ff121981ba5cf3042227aea3c455924e779d 0100644 runlist.sh
> $
> 
> And now try to open te2, which fails because "distrib/miniroot" does not
> exist in the parent commit of 7c0d87f00e480cdf004324...:
> 
> 		te2 = find_entry_by_name(tree2, seg, seglen);
> 		if (te2 == NULL) {
> 			*changed = 1;
> 			goto done;
> 		}
> 
> Because te2 cannot be opened, the code assumes success.
> We now return with err == NULL and *changed = 1; the commit graph interprets
> this as 'install.sub' having changed. What's obviously missing here is an
> existence check for 'install.sub' in te1! (Alternatively, we could allow
> the caller to differentiate between the cases where the file was 'added'
> and where the file was 'changed', so they can stop asking for more commits
> if the file was added. But that would require changing all callers...)
> 
> The patch below ensures that we traverse the full path in tree1 even if an
> intermediate tree2 cannot be opened. Now we eventually hit this attempt to
> open the file 'install.sub' in tree object 180aa33df8d1 (tree1):
> 
> 		te1 = find_entry_by_name(tree1, seg, seglen);
> 		if (te1 == NULL) {
> 			err = got_error(GOT_ERR_NO_OBJ);
> 			goto done;
> 		}
> 
> The result is now ERR_NO_OBJ and *changed = 0. The commit graph will not
> add this commit to its iteration list. The command:
> 
>  $ got log -c 05f568 -P distrib/miniroot/install.sub
> 
> will only list commit 05f568ecc6aade... which added the file.
> 
> ok?
> 
> diff e8f0f2f60d1eb5651a01b13607ca04014f092753 e8e36d3a9d44ec3db6e43abebe3db47e062f49a9
> blob - 7c0aac05143e35448209231ec4347e3ad6e2252b
> blob + 7bd55922b5b83ac06fbadac4c69364802b1a6179
> --- lib/object.c
> +++ lib/object.c
> @@ -1771,22 +1771,21 @@ got_object_tree_path_changed(int *changed,
>  			goto done;
>  		}
>  
> -		te2 = find_entry_by_name(tree2, seg, seglen);
> -		if (te2 == NULL) {
> -			*changed = 1;
> -			goto done;
> -		}
> +		if (tree2)
> +			te2 = find_entry_by_name(tree2, seg, seglen);
>  
> -		mode1 = normalize_mode_for_comparison(te1->mode);
> -		mode2 = normalize_mode_for_comparison(te2->mode);
> -		if (mode1 != mode2) {
> -			*changed = 1;
> -			goto done;
> -		}
> +		if (te2) {
> +			mode1 = normalize_mode_for_comparison(te1->mode);
> +			mode2 = normalize_mode_for_comparison(te2->mode);
> +			if (mode1 != mode2) {
> +				*changed = 1;
> +				goto done;
> +			}
>  
> -		if (got_object_id_cmp(&te1->id, &te2->id) == 0) {
> -			*changed = 0;
> -			goto done;
> +			if (got_object_id_cmp(&te1->id, &te2->id) == 0) {
> +				*changed = 0;
> +				goto done;
> +			}
>  		}
>  
>  		if (*s == '\0') { /* final path element */
> @@ -1807,14 +1806,20 @@ got_object_tree_path_changed(int *changed,
>  				got_object_tree_close(tree1);
>  			tree1 = next_tree1;
>  
> -			err = got_object_open_as_tree(&next_tree2, repo,
> -			    &te2->id);
> -			te2 = NULL;
> -			if (err)
> -				goto done;
> -			if (tree2 != tree02)
> -				got_object_tree_close(tree2);
> -			tree2 = next_tree2;
> +			if (te2) {
> +				err = got_object_open_as_tree(&next_tree2, repo,
> +				    &te2->id);
> +				te2 = NULL;
> +				if (err)
> +					goto done;
> +				if (tree2 != tree02)
> +					got_object_tree_close(tree2);
> +				tree2 = next_tree2;
> +			} else if (tree2) {
> +				if (tree2 != tree02)
> +					got_object_tree_close(tree2);
> +				tree2 = NULL;
> +			}
>  		}
>  	}
>  done:
> 

Other than the question above (which is just me trying to understand
how this was debugged), this makes sense to me and the patch looks
right.  I confirmed that the segfault and the debug assert i added is
no longer being hit, and got log -P only shows the 05f568ecc6aade
commit.