"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Got sometimes merges unrelated changes
To:
gameoftrees@openbsd.org
Date:
Sat, 22 May 2021 18:21:18 +0200

Download raw body.

Thread
The merge logic used by 'got cherrypick' has a flaw that causes unrelated
changes to show up in merge results sometimes. This also affects 'rebase'
and 'histedit' which rely on the same code to merge changes. It is a very
annoying issue. I expect that some of you will already have seen it.
Though perhaps you've seen it but didn't realize that it is a bug?

I have recently found such a case in Got's own repository. So far I have
not managed to build a small reproduction case. The test that I am adding
below reproduces the case I have observed on the smallest possible set of
files that I could trigger the issue with. I hope there will be a smaller
recipe eventually as this adds more than 200 lines of test data which is
C code to a shell script... But having a working test for this bug is
important. If you know of a smaller test case, please let me know.

In the case I have observed, a file starts out with identical content on
two branches. Now two changes are committed to one branch in sequence,
and the second of those changes will be cherrypicked to the other branch.

First change:
[[[
@@ -89,6 +89,9 @@
 
         *ref = NULL;
 
+        if (!is_valid_ref_name(name))
+                return got_error_path(name, GOT_ERR_BAD_REF_NAME);
+
         if (ref_is_absolute || ref_is_well_known) {
                 if (asprintf(&path, "%s/%s", path_refs, name) == -1)
]]]

Second change:
[[[
@@ -253,6 +253,7 @@
 
         while ((re = TAILQ_FIRST(refs))) {
                 TAILQ_REMOVE(refs, re, entry);
+                got_ref_close(re->ref);
                 free(re);
         }
]]] 

The expected merged diff contains just the second change, except the
hunk header will align this change at line 250 instead of line 253.
The target file of the merge lacks the first change, so it won't have
the 3 additional lines of content above the content touched by the
second change.

Now, here comes the problem:

The result of 'got cherrypick $commit2' contains both of the
changes from $commit1 and $commit2:
[[[
@@ -89,6 +89,9 @@
 
         *ref = NULL;
 
+        if (!is_valid_ref_name(name))
+                return got_error_path(name, GOT_ERR_BAD_REF_NAME);
+
         if (ref_is_absolute || ref_is_well_known) {
                 if (asprintf(&path, "%s/%s", path_refs, name) == -1)
                         return got_error_from_errno("asprintf");
@@ -250,6 +253,7 @@
 
         while ((re = TAILQ_FIRST(refs))) {
                 TAILQ_REMOVE(refs, re, entry);
+                got_ref_close(re->ref);
                 free(re);
         }
]]]

I don't yet understand the issue completely. So far, I've have figured
out the following:

The lines added in the first change also appear elsewhere in the file.
This seems to be necessary to trigger the issue.

There are a lot of other lines between the two changes. I have tried to
remove additional lines from my reproduction recipe below but so far I
have not found a way to reduce the number of lines significantly without
having the problem disappear.

The problem is fixed (and the test added below is passing) if I change the
order of the files passed to the diff3 algorithm with the hack patch below.
I think this means we're not feeding the 3-way merge algorithm with the
proper base version and derived versions of the file during a cherrypick
operation. The current order is correct for 'update' style merges, though,
which is why this issue does not affect 'got update'.

hack patch:
[[[
diff refs/heads/main refs/heads/cherrypick-bug
blob - d2012785f43070b2587e5ab750848b6dbce1f336
blob + ee87a07e237c0cc64f44535f17bf010bbf0becdc
--- lib/worktree.c
+++ lib/worktree.c
@@ -871,9 +871,19 @@ merge_file(int *local_changes_subsumed, struct got_wor
 		}
 	}
 
+#if 1
+	err = got_merge_diff3(&overlapcnt, merged_fd,
+
+	    symlink_path ? symlink_path : ondisk_path,
+	    blob_orig_path,
+	    deriv_path,
+
+	    NULL, label_orig, label_deriv);
+#else
 	err = got_merge_diff3(&overlapcnt, merged_fd, deriv_path,
 	    blob_orig_path, symlink_path ? symlink_path : ondisk_path,
 	    label_deriv, label_orig, NULL);
+#endif
 	if (err)
 		goto done;
 
]]] 

The patch below adds a test case called test_cherrypick_unrelated_changes.
This test fails unless the above worktree.c hack patch is applied.
However, an unrelated test, test_cherrypick_symlink_conflicts, starts to
fail with the above hack patch applied because the expected order of
conflict markers changes.

I still need to find a proper solution to this. We will probably need two
calling conventions for got_merge_diff3 depending on the type of merge we
are performing (update or cherrypick).

As a first step, I'd like to commit the test case below as an 'xfail' test,
even though the embedded test data is C code which looks ugly and confusing
when embedded into a shell script like this...

diff refs/heads/main refs/heads/cherrypick-bug
blob - a11bcd4d9ca6b55a8f744cf2afe8a8c6820e9cc7
blob + a39cf404594745efae2bbb820b20d6b40d446d25
--- regress/cmdline/cherrypick.sh
+++ regress/cmdline/cherrypick.sh
@@ -861,6 +861,343 @@ test_cherrypick_conflict_no_eol2() {
 	test_done "$testroot" "$ret"
 }
 
+test_cherrypick_unrelated_changes() {
+	local testroot=`test_init cherrypick_unrelated_changes`
+
+	# Sorry about the large HERE document but I have not found
+	# a smaller reproduction recipe yet...
+	cat > $testroot/repo/reference.c <<EOF
+const struct got_error *
+got_ref_alloc(struct got_reference **ref, const char *name,
+    struct got_object_id *id)
+{
+        if (!is_valid_ref_name(name))
+                return got_error_path(name, GOT_ERR_BAD_REF_NAME);
+
+        return alloc_ref(ref, name, id, 0);
+}
+
+static const struct got_error *
+parse_packed_ref_line(struct got_reference **ref, const char *abs_refname,
+    const char *line)
+{
+        struct got_object_id id;
+        const char *name;
+
+        *ref = NULL;
+
+        if (line[0] == '#' || line[0] == '^')
+                return NULL;
+
+        if (!got_parse_sha1_digest(id.sha1, line))
+                return got_error(GOT_ERR_BAD_REF_DATA);
+
+        if (abs_refname) {
+                if (strcmp(line + SHA1_DIGEST_STRING_LENGTH, abs_refname) != 0)
+                        return NULL;
+                name = abs_refname;
+        } else
+                name = line + SHA1_DIGEST_STRING_LENGTH;
+
+        return alloc_ref(ref, name, &id, GOT_REF_IS_PACKED);
+}
+
+static const struct got_error *
+open_packed_ref(struct got_reference **ref, FILE *f, const char **subdirs,
+    int nsubdirs, const char *refname)
+{
+        const struct got_error *err = NULL;
+        char *abs_refname;
+        char *line = NULL;
+        size_t linesize = 0;
+        ssize_t linelen;
+        int i, ref_is_absolute = (strncmp(refname, "refs/", 5) == 0);
+
+        *ref = NULL;
+
+        if (ref_is_absolute)
+                abs_refname = (char *)refname;
+        do {
+                linelen = getline(&line, &linesize, f);
+                if (linelen == -1) {
+                        if (feof(f))
+                                break;
+                        err = got_ferror(f, GOT_ERR_BAD_REF_DATA);
+                        break;
+                }
+                if (linelen > 0 && line[linelen - 1] == '\n')
+                        line[linelen - 1] = '\0';
+                for (i = 0; i < nsubdirs; i++) {
+                        if (!ref_is_absolute &&
+                            asprintf(&abs_refname, "refs/%s/%s", subdirs[i],
+                            refname) == -1)
+                                return got_error_from_errno("asprintf");
+                        err = parse_packed_ref_line(ref, abs_refname, line);
+                        if (!ref_is_absolute)
+                                free(abs_refname);
+                        if (err || *ref != NULL)
+                                break;
+                }
+                if (err)
+                        break;
+        } while (*ref == NULL);
+        free(line);
+
+        return err;
+}
+
+static const struct got_error *
+open_ref(struct got_reference **ref, const char *path_refs, const char *subdir,
+    const char *name, int lock)
+{
+        const struct got_error *err = NULL;
+        char *path = NULL;
+        char *absname = NULL;
+        int ref_is_absolute = (strncmp(name, "refs/", 5) == 0);
+        int ref_is_well_known = (subdir[0] == '\0' && is_well_known_ref(name));
+
+        *ref = NULL;
+
+        if (ref_is_absolute || ref_is_well_known) {
+                if (asprintf(&path, "%s/%s", path_refs, name) == -1)
+                        return got_error_from_errno("asprintf");
+                absname = (char *)name;
+        } else {
+                if (asprintf(&path, "%s/%s%s%s", path_refs, subdir,
+                    subdir[0] ? "/" : "", name) == -1)
+                        return got_error_from_errno("asprintf");
+
+                if (asprintf(&absname, "refs/%s%s%s",
+                    subdir, subdir[0] ? "/" : "", name) == -1) {
+                        err = got_error_from_errno("asprintf");
+                        goto done;
+                }
+        }
+
+        err = parse_ref_file(ref, name, absname, path, lock);
+done:
+        if (!ref_is_absolute && !ref_is_well_known)
+                free(absname);
+        free(path);
+        return err;
+}
+
+const struct got_error *
+got_ref_open(struct got_reference **ref, struct got_repository *repo,
+   const char *refname, int lock)
+{
+        const struct got_error *err = NULL;
+        char *path_refs = NULL;
+        const char *subdirs[] = {
+            GOT_REF_HEADS, GOT_REF_TAGS, GOT_REF_REMOTES
+        };
+        size_t i;
+        int well_known = is_well_known_ref(refname);
+        struct got_lockfile *lf = NULL;
+
+        *ref = NULL;
+
+        path_refs = get_refs_dir_path(repo, refname);
+        if (path_refs == NULL) {
+                err = got_error_from_errno2("get_refs_dir_path", refname);
+                goto done;
+        }
+
+        if (well_known) {
+                err = open_ref(ref, path_refs, "", refname, lock);
+        } else {
+                char *packed_refs_path;
+                FILE *f;
+
+                /* Search on-disk refs before packed refs! */
+                for (i = 0; i < nitems(subdirs); i++) {
+                        err = open_ref(ref, path_refs, subdirs[i], refname,
+                            lock);
+                        if ((err && err->code != GOT_ERR_NOT_REF) || *ref)
+                                goto done;
+                }
+
+                packed_refs_path = got_repo_get_path_packed_refs(repo);
+                if (packed_refs_path == NULL) {
+                        err = got_error_from_errno(
+                            "got_repo_get_path_packed_refs");
+                        goto done;
+                }
+
+                if (lock) {
+                        err = got_lockfile_lock(&lf, packed_refs_path);
+                        if (err)
+                                goto done;
+                }
+                f = fopen(packed_refs_path, "rb");
+                free(packed_refs_path);
+                if (f != NULL) {
+                        err = open_packed_ref(ref, f, subdirs, nitems(subdirs),
+                            refname);
+                        if (!err) {
+                                if (fclose(f) == EOF) {
+                                        err = got_error_from_errno("fclose");
+                                        got_ref_close(*ref);
+                                        *ref = NULL;
+                                } else if (*ref)
+                                        (*ref)->lf = lf;
+                        }
+                }
+        }
+done:
+        if (!err && *ref == NULL)
+                err = got_error_not_ref(refname);
+        if (err && lf)
+                got_lockfile_unlock(lf);
+        free(path_refs);
+        return err;
+}
+
+struct got_reference *
+got_ref_dup(struct got_reference *ref)
+{
+        struct got_reference *ret;
+
+        ret = calloc(1, sizeof(*ret));
+        if (ret == NULL)
+                return NULL;
+
+        ret->flags = ref->flags;
+        if (ref->flags & GOT_REF_IS_SYMBOLIC) {
+                ret->ref.symref.name = strdup(ref->ref.symref.name);
+                if (ret->ref.symref.name == NULL) {
+                        free(ret);
+                        return NULL;
+                }
+                ret->ref.symref.ref = strdup(ref->ref.symref.ref);
+                if (ret->ref.symref.ref == NULL) {
+                        free(ret->ref.symref.name);
+                        free(ret);
+                        return NULL;
+                }
+        } else {
+                ret->ref.ref.name = strdup(ref->ref.ref.name);
+                if (ret->ref.ref.name == NULL) {
+                        free(ret);
+                        return NULL;
+                }
+                memcpy(ret->ref.ref.sha1, ref->ref.ref.sha1,
+                    sizeof(ret->ref.ref.sha1));
+        }
+
+        return ret;
+}
+
+const struct got_error *
+got_reflist_entry_dup(struct got_reflist_entry **newp,
+    struct got_reflist_entry *re)
+{
+        const struct got_error *err = NULL;
+        struct got_reflist_entry *new;
+
+        *newp = NULL;
+
+        new = malloc(sizeof(*new));
+        if (new == NULL)
+                return got_error_from_errno("malloc");
+
+        new->ref = got_ref_dup(re->ref);
+        if (new->ref == NULL) {
+                err = got_error_from_errno("got_ref_dup");
+                free(new);
+                return err;
+        }
+
+        *newp = new;
+        return NULL;
+}
+
+void
+got_ref_list_free(struct got_reflist_head *refs)
+{
+        struct got_reflist_entry *re;
+
+        while ((re = TAILQ_FIRST(refs))) {
+                TAILQ_REMOVE(refs, re, entry);
+                free(re);
+        }
+
+}
+EOF
+	(cd $testroot/repo && git add reference.c)
+	git_commit $testroot/repo -m "added reference.c file"
+	local base_commit=`git_show_head $testroot/repo`
+
+	got checkout $testroot/repo $testroot/wt > /dev/null
+	ret="$?"
+	if [ "$ret" != "0" ]; then
+		test_done "$testroot" "$ret"
+		return 1
+	fi
+
+	(cd $testroot/repo && git checkout -q -b newbranch)
+	ed -s $testroot/repo/reference.c <<EOF
+91a
+        if (!is_valid_ref_name(name))
+                return got_error_path(name, GOT_ERR_BAD_REF_NAME);
+
+.
+w
+q
+EOF
+	git_commit $testroot/repo -m "added lines on newbranch"
+	local branch_rev1=`git_show_head $testroot/repo`
+
+	ed -s $testroot/repo/reference.c <<EOF
+255a
+                got_ref_close(re->ref);
+.
+w
+q
+EOF
+	git_commit $testroot/repo -m "more lines on newbranch"
+
+	local branch_rev2=`git_show_head $testroot/repo`
+
+	(cd $testroot/wt && got cherrypick $branch_rev2 > $testroot/stdout)
+
+	echo "G  reference.c" > $testroot/stdout.expected
+	echo "Merged commit $branch_rev2" >> $testroot/stdout.expected
+
+	cmp -s $testroot/stdout.expected $testroot/stdout
+	ret="$?"
+	if [ "$ret" != "0" ]; then
+		diff -u $testroot/stdout.expected $testroot/stdout
+		test_done "$testroot" "$ret"
+		return 1
+	fi
+
+	cat > $testroot/diff.expected <<EOF
+--- reference.c
++++ reference.c
+@@ -250,6 +250,7 @@ got_ref_list_free(struct got_reflist_head *refs)
+ 
+         while ((re = TAILQ_FIRST(refs))) {
+                 TAILQ_REMOVE(refs, re, entry);
++                got_ref_close(re->ref);
+                 free(re);
+         }
+ 
+EOF
+	(cd $testroot/wt && got diff |
+		egrep -v '^(diff|blob|file)' > $testroot/diff)
+	cmp -s $testroot/diff.expected $testroot/diff
+	ret="$?"
+	if [ "$ret" != "0" ]; then
+		diff -u $testroot/diff.expected $testroot/diff
+		ret="xfail cherrypick results in unexpected diff"
+		test_done "$testroot" "$ret"
+		return 1
+	fi
+
+	test_done "$testroot" "$ret"
+}
+
 test_parseargs "$@"
 run_test test_cherrypick_basic
 run_test test_cherrypick_root_commit
@@ -873,3 +1210,4 @@ run_test test_cherrypick_symlink_conflicts
 run_test test_cherrypick_with_path_prefix_and_empty_tree
 run_test test_cherrypick_conflict_no_eol
 run_test test_cherrypick_conflict_no_eol2
+run_test test_cherrypick_unrelated_changes