From: Stefan Sperling Subject: Got sometimes merges unrelated changes To: gameoftrees@openbsd.org Date: Sat, 22 May 2021 18:21:18 +0200 The merge logic used by 'got cherrypick' has a flaw that causes unrelated changes to show up in merge results sometimes. This also affects 'rebase' and 'histedit' which rely on the same code to merge changes. It is a very annoying issue. I expect that some of you will already have seen it. Though perhaps you've seen it but didn't realize that it is a bug? I have recently found such a case in Got's own repository. So far I have not managed to build a small reproduction case. The test that I am adding below reproduces the case I have observed on the smallest possible set of files that I could trigger the issue with. I hope there will be a smaller recipe eventually as this adds more than 200 lines of test data which is C code to a shell script... But having a working test for this bug is important. If you know of a smaller test case, please let me know. In the case I have observed, a file starts out with identical content on two branches. Now two changes are committed to one branch in sequence, and the second of those changes will be cherrypicked to the other branch. First change: [[[ @@ -89,6 +89,9 @@ *ref = NULL; + if (!is_valid_ref_name(name)) + return got_error_path(name, GOT_ERR_BAD_REF_NAME); + if (ref_is_absolute || ref_is_well_known) { if (asprintf(&path, "%s/%s", path_refs, name) == -1) ]]] Second change: [[[ @@ -253,6 +253,7 @@ while ((re = TAILQ_FIRST(refs))) { TAILQ_REMOVE(refs, re, entry); + got_ref_close(re->ref); free(re); } ]]] The expected merged diff contains just the second change, except the hunk header will align this change at line 250 instead of line 253. The target file of the merge lacks the first change, so it won't have the 3 additional lines of content above the content touched by the second change. Now, here comes the problem: The result of 'got cherrypick $commit2' contains both of the changes from $commit1 and $commit2: [[[ @@ -89,6 +89,9 @@ *ref = NULL; + if (!is_valid_ref_name(name)) + return got_error_path(name, GOT_ERR_BAD_REF_NAME); + if (ref_is_absolute || ref_is_well_known) { if (asprintf(&path, "%s/%s", path_refs, name) == -1) return got_error_from_errno("asprintf"); @@ -250,6 +253,7 @@ while ((re = TAILQ_FIRST(refs))) { TAILQ_REMOVE(refs, re, entry); + got_ref_close(re->ref); free(re); } ]]] I don't yet understand the issue completely. So far, I've have figured out the following: The lines added in the first change also appear elsewhere in the file. This seems to be necessary to trigger the issue. There are a lot of other lines between the two changes. I have tried to remove additional lines from my reproduction recipe below but so far I have not found a way to reduce the number of lines significantly without having the problem disappear. The problem is fixed (and the test added below is passing) if I change the order of the files passed to the diff3 algorithm with the hack patch below. I think this means we're not feeding the 3-way merge algorithm with the proper base version and derived versions of the file during a cherrypick operation. The current order is correct for 'update' style merges, though, which is why this issue does not affect 'got update'. hack patch: [[[ diff refs/heads/main refs/heads/cherrypick-bug blob - d2012785f43070b2587e5ab750848b6dbce1f336 blob + ee87a07e237c0cc64f44535f17bf010bbf0becdc --- lib/worktree.c +++ lib/worktree.c @@ -871,9 +871,19 @@ merge_file(int *local_changes_subsumed, struct got_wor } } +#if 1 + err = got_merge_diff3(&overlapcnt, merged_fd, + + symlink_path ? symlink_path : ondisk_path, + blob_orig_path, + deriv_path, + + NULL, label_orig, label_deriv); +#else err = got_merge_diff3(&overlapcnt, merged_fd, deriv_path, blob_orig_path, symlink_path ? symlink_path : ondisk_path, label_deriv, label_orig, NULL); +#endif if (err) goto done; ]]] The patch below adds a test case called test_cherrypick_unrelated_changes. This test fails unless the above worktree.c hack patch is applied. However, an unrelated test, test_cherrypick_symlink_conflicts, starts to fail with the above hack patch applied because the expected order of conflict markers changes. I still need to find a proper solution to this. We will probably need two calling conventions for got_merge_diff3 depending on the type of merge we are performing (update or cherrypick). As a first step, I'd like to commit the test case below as an 'xfail' test, even though the embedded test data is C code which looks ugly and confusing when embedded into a shell script like this... diff refs/heads/main refs/heads/cherrypick-bug blob - a11bcd4d9ca6b55a8f744cf2afe8a8c6820e9cc7 blob + a39cf404594745efae2bbb820b20d6b40d446d25 --- regress/cmdline/cherrypick.sh +++ regress/cmdline/cherrypick.sh @@ -861,6 +861,343 @@ test_cherrypick_conflict_no_eol2() { test_done "$testroot" "$ret" } +test_cherrypick_unrelated_changes() { + local testroot=`test_init cherrypick_unrelated_changes` + + # Sorry about the large HERE document but I have not found + # a smaller reproduction recipe yet... + cat > $testroot/repo/reference.c < 0 && line[linelen - 1] == '\n') + line[linelen - 1] = '\0'; + for (i = 0; i < nsubdirs; i++) { + if (!ref_is_absolute && + asprintf(&abs_refname, "refs/%s/%s", subdirs[i], + refname) == -1) + return got_error_from_errno("asprintf"); + err = parse_packed_ref_line(ref, abs_refname, line); + if (!ref_is_absolute) + free(abs_refname); + if (err || *ref != NULL) + break; + } + if (err) + break; + } while (*ref == NULL); + free(line); + + return err; +} + +static const struct got_error * +open_ref(struct got_reference **ref, const char *path_refs, const char *subdir, + const char *name, int lock) +{ + const struct got_error *err = NULL; + char *path = NULL; + char *absname = NULL; + int ref_is_absolute = (strncmp(name, "refs/", 5) == 0); + int ref_is_well_known = (subdir[0] == '\0' && is_well_known_ref(name)); + + *ref = NULL; + + if (ref_is_absolute || ref_is_well_known) { + if (asprintf(&path, "%s/%s", path_refs, name) == -1) + return got_error_from_errno("asprintf"); + absname = (char *)name; + } else { + if (asprintf(&path, "%s/%s%s%s", path_refs, subdir, + subdir[0] ? "/" : "", name) == -1) + return got_error_from_errno("asprintf"); + + if (asprintf(&absname, "refs/%s%s%s", + subdir, subdir[0] ? "/" : "", name) == -1) { + err = got_error_from_errno("asprintf"); + goto done; + } + } + + err = parse_ref_file(ref, name, absname, path, lock); +done: + if (!ref_is_absolute && !ref_is_well_known) + free(absname); + free(path); + return err; +} + +const struct got_error * +got_ref_open(struct got_reference **ref, struct got_repository *repo, + const char *refname, int lock) +{ + const struct got_error *err = NULL; + char *path_refs = NULL; + const char *subdirs[] = { + GOT_REF_HEADS, GOT_REF_TAGS, GOT_REF_REMOTES + }; + size_t i; + int well_known = is_well_known_ref(refname); + struct got_lockfile *lf = NULL; + + *ref = NULL; + + path_refs = get_refs_dir_path(repo, refname); + if (path_refs == NULL) { + err = got_error_from_errno2("get_refs_dir_path", refname); + goto done; + } + + if (well_known) { + err = open_ref(ref, path_refs, "", refname, lock); + } else { + char *packed_refs_path; + FILE *f; + + /* Search on-disk refs before packed refs! */ + for (i = 0; i < nitems(subdirs); i++) { + err = open_ref(ref, path_refs, subdirs[i], refname, + lock); + if ((err && err->code != GOT_ERR_NOT_REF) || *ref) + goto done; + } + + packed_refs_path = got_repo_get_path_packed_refs(repo); + if (packed_refs_path == NULL) { + err = got_error_from_errno( + "got_repo_get_path_packed_refs"); + goto done; + } + + if (lock) { + err = got_lockfile_lock(&lf, packed_refs_path); + if (err) + goto done; + } + f = fopen(packed_refs_path, "rb"); + free(packed_refs_path); + if (f != NULL) { + err = open_packed_ref(ref, f, subdirs, nitems(subdirs), + refname); + if (!err) { + if (fclose(f) == EOF) { + err = got_error_from_errno("fclose"); + got_ref_close(*ref); + *ref = NULL; + } else if (*ref) + (*ref)->lf = lf; + } + } + } +done: + if (!err && *ref == NULL) + err = got_error_not_ref(refname); + if (err && lf) + got_lockfile_unlock(lf); + free(path_refs); + return err; +} + +struct got_reference * +got_ref_dup(struct got_reference *ref) +{ + struct got_reference *ret; + + ret = calloc(1, sizeof(*ret)); + if (ret == NULL) + return NULL; + + ret->flags = ref->flags; + if (ref->flags & GOT_REF_IS_SYMBOLIC) { + ret->ref.symref.name = strdup(ref->ref.symref.name); + if (ret->ref.symref.name == NULL) { + free(ret); + return NULL; + } + ret->ref.symref.ref = strdup(ref->ref.symref.ref); + if (ret->ref.symref.ref == NULL) { + free(ret->ref.symref.name); + free(ret); + return NULL; + } + } else { + ret->ref.ref.name = strdup(ref->ref.ref.name); + if (ret->ref.ref.name == NULL) { + free(ret); + return NULL; + } + memcpy(ret->ref.ref.sha1, ref->ref.ref.sha1, + sizeof(ret->ref.ref.sha1)); + } + + return ret; +} + +const struct got_error * +got_reflist_entry_dup(struct got_reflist_entry **newp, + struct got_reflist_entry *re) +{ + const struct got_error *err = NULL; + struct got_reflist_entry *new; + + *newp = NULL; + + new = malloc(sizeof(*new)); + if (new == NULL) + return got_error_from_errno("malloc"); + + new->ref = got_ref_dup(re->ref); + if (new->ref == NULL) { + err = got_error_from_errno("got_ref_dup"); + free(new); + return err; + } + + *newp = new; + return NULL; +} + +void +got_ref_list_free(struct got_reflist_head *refs) +{ + struct got_reflist_entry *re; + + while ((re = TAILQ_FIRST(refs))) { + TAILQ_REMOVE(refs, re, entry); + free(re); + } + +} +EOF + (cd $testroot/repo && git add reference.c) + git_commit $testroot/repo -m "added reference.c file" + local base_commit=`git_show_head $testroot/repo` + + got checkout $testroot/repo $testroot/wt > /dev/null + ret="$?" + if [ "$ret" != "0" ]; then + test_done "$testroot" "$ret" + return 1 + fi + + (cd $testroot/repo && git checkout -q -b newbranch) + ed -s $testroot/repo/reference.c <ref); +. +w +q +EOF + git_commit $testroot/repo -m "more lines on newbranch" + + local branch_rev2=`git_show_head $testroot/repo` + + (cd $testroot/wt && got cherrypick $branch_rev2 > $testroot/stdout) + + echo "G reference.c" > $testroot/stdout.expected + echo "Merged commit $branch_rev2" >> $testroot/stdout.expected + + cmp -s $testroot/stdout.expected $testroot/stdout + ret="$?" + if [ "$ret" != "0" ]; then + diff -u $testroot/stdout.expected $testroot/stdout + test_done "$testroot" "$ret" + return 1 + fi + + cat > $testroot/diff.expected <ref); + free(re); + } + +EOF + (cd $testroot/wt && got diff | + egrep -v '^(diff|blob|file)' > $testroot/diff) + cmp -s $testroot/diff.expected $testroot/diff + ret="$?" + if [ "$ret" != "0" ]; then + diff -u $testroot/diff.expected $testroot/diff + ret="xfail cherrypick results in unexpected diff" + test_done "$testroot" "$ret" + return 1 + fi + + test_done "$testroot" "$ret" +} + test_parseargs "$@" run_test test_cherrypick_basic run_test test_cherrypick_root_commit @@ -873,3 +1210,4 @@ run_test test_cherrypick_symlink_conflicts run_test test_cherrypick_with_path_prefix_and_empty_tree run_test test_cherrypick_conflict_no_eol run_test test_cherrypick_conflict_no_eol2 +run_test test_cherrypick_unrelated_changes