"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: merge chokes, creates bogus conflicts
To:
Mark Jamsek <mark@jamsek.com>
Cc:
Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date:
Sat, 18 Feb 2023 14:47:24 +0100

Download raw body.

Thread
On Sat, Feb 18, 2023 at 11:50:16PM +1100, Mark Jamsek wrote:
> There was a problem I found in the original -C diff where after forcing
> a commit with unresolved conflicts using 'got commit -C', the diff of
> that commit with 'got diff -c', 'got log -p', or tog, presented
> a completely empty diff!

Oops. The test didn't catch this, so I guess it should be expanded?

> This made me look at other similar checks where I have also added
> GOT_STATUS_CONFLICT; however, I'm not as certain about these so I want
> to bring them to your attention.

See my comments inline in the diff below.

> Another artifact of this conflict
> management tweak that I'm not certain about is the change in update.sh
> where a binary file that already had conflict markers is now reported as
> "M  foo" instead of "C  foo", but I think this comports with the desired
> behaviour?

Yes, it is in line with the new expected behaviour.
 
> Otherwise, I think the diff implements your suggestion for the commit,
> histedit, merge, and rebase commands, and regress is passing.  I also
> took the liberty to tweak the output slightly from:
> 
>   got: cannot commit file in conflicted status
> 
> to something admittedly more verbose, but also potentially more accurate
> and informing as we now can commit files in conflicted status, albeit
> with an extra flag:
> 
>   C  conflicted/file/path
>   resolve conflicts or use -C to commit with unresolved conflicts
>   got: cannot commit file in conflicted status
> 
> That said, I really don't mind whether we keep or nix it, or report
> something else.

I would rather avoid suggesting use of -C in error messages.

This could lead to people using -C as a lazy alternative instead
of resolving conflicts. It is better to bury the existence of -C
in the man page where it is documented with an appropriate warning.
It will be discovered by users who have a real need for it and search
the documentation for clues.

If we are going to adjust the message then I would suggest we change
the error message to explcitly suggest that conflicts must be resolved.
This would guide users towards the correct way out.

  got: cannot commit file in conflicted status; conflicts must be resolved before changes can be committed


> +Conflicted files committed with
> +.Fl C
> +will always appear to be in conflicted status once modified in a work tree.
> +This prevents automatic merging of changes to such files during
> +.Cm got update ,
> +.Cm got rebase ,
> +.Cm got histedit ,
> +.Cm got merge ,
> +.Cm got cherrypick ,
> +.Cm got backout ,
> +and
> +.Cm got patch .

The sentence above is now wrong and should be deleted.

>  If merge conflicts occur, the rebase operation is interrupted and may
>  be continued once conflicts have been resolved.
> +If for some legitimate reason the conflicts cannot be resolved, the
> +.Fl C
> +option can be used to allow the rebase operation to continue despite
> +unresolved conflicts.
> +Such cases are exceedingly rare and should only be used as a last resort.

I would drop the above parapraph to avoid promoting use of -C, and
also drop it from the histedit and merge sections.

It is technically correct to document all possible features and
behaviours. But the more we advertise this particular feature the
higher the probability becomes that -C will be misused.

> +.It Fl C
> +Allow a rebase operation to continue with files in conflicted status.
> +This option can only be used with the
> +.Fl c
> +option.

Let's try to discourage use a bit more:

This option should generally be avoided, and can only be used with the
.Fl c
option.

Same for histedit and merge.

>  done:
> +	if (error && error->code == GOT_ERR_COMMIT_CONFLICT)
> +		printf("resolve conflicts or use -C to commit with "
> +		    "unresolved conflicts\n");

I would drop the above chunk as explained above.

Two related nits:

Errors should go to stderr.

When amending an error with static information like this, consider
whether editing the string constant in lib/error.c would do the job.

> -/* Upgrade STATUS_MODIFY to STATUS_CONFLICT if a conflict marker is found. */
> +/*
> + * Upgrade STATUS_MODIFY to STATUS_CONFLICT if a
> + * conflict marker is found in new added lines only.

Grammar: Should this say "newly added lines"?

> +	err = got_diff_blob_file(blob, f1, sz1, NULL, ondisk_file, 1, sb,
> +	    path, GOT_DIFF_ALGORITHM_MYERS, 0, 0, 0, NULL, f);

Myers is the right choice in this case indeed. Patience diff would
imply some additional performane overhead we don't want to pay
during the status crawl.

> +	if (err)
> +		goto done;
> +
> +	if (fflush(f) == EOF) {
> +		err = got_error_from_errno("fflush");
> +		goto done;
> +	}
> +	if (fseeko(f, 0L, SEEK_SET) == -1) {
> +		err = got_error_from_errno("fseek");
> +		goto done;
> +	}
> +
>  	while (*status == GOT_STATUS_MODIFY) {
>  		linelen = getline(&line, &linesize, f);
>  		if (linelen == -1) {
> @@ -1537,7 +1574,8 @@ get_modified_file_content_status(unsigned char *status
>  			break;
>  		}
>  
> -		if (strncmp(line, markers[i], strlen(markers[i])) == 0) {
> +		if (*line == '+' &&
> +		    strncmp(line + 1, markers[i], strlen(markers[i])) == 0) {

I suspect using the diff_result structure would be more efficient here.

It should be possible to skip the unidiff output step entirely by exposing
got_diffreg_result and jumping to relevant lines in the working version of
the file, available in diffreg_result->diff_result's right side.
But we can parse the text diff for now and experiment with diff_result later.

> @@ -5018,12 +5064,14 @@ append_ct_diff(struct got_commitable *ct, int *diff_he
>  	if (diff_staged) {
>  		if (ct->staged_status != GOT_STATUS_MODIFY &&
>  		    ct->staged_status != GOT_STATUS_ADD &&
> -		    ct->staged_status != GOT_STATUS_DELETE)
> +		    ct->staged_status != GOT_STATUS_DELETE &&
> +		    ct->staged_status != GOT_STATUS_CONFLICT)

We do not allow STATUS_CONFLICT for staged files. I see no good reason
to allow staging of changes with conflict markers, that would just
complicate things even more.

> @@ -5051,6 +5099,7 @@ append_ct_diff(struct got_commitable *ct, int *diff_he
>  		const char *label1 = NULL, *label2 = NULL;
>  		switch (ct->staged_status) {
>  		case GOT_STATUS_MODIFY:
> +		case GOT_STATUS_CONFLICT:

As above, staged_status should only be one of A, D, or M.
We should not allow files in 'C' status to be staged in any way.

> @@ -5217,7 +5269,8 @@ collect_commitables(void *arg, unsigned char status,
>  	}
>  
>  	if (staged_status == GOT_STATUS_ADD ||
> -	    staged_status == GOT_STATUS_MODIFY) {
> +	    staged_status == GOT_STATUS_MODIFY ||
> +	    staged_status == GOT_STATUS_CONFLICT) {

Same.

> @@ -5297,7 +5350,8 @@ collect_commitables(void *arg, unsigned char status,
>  		}
>  	}
>  	if (ct->staged_status == GOT_STATUS_ADD ||
> -	    ct->staged_status == GOT_STATUS_MODIFY) {
> +	    ct->staged_status == GOT_STATUS_MODIFY ||
> +	    ct->staged_status == GOT_STATUS_CONFLICT) {

Same.

> @@ -5533,14 +5587,21 @@ match_deleted_or_modified_ct(struct got_commitable **c
>  		char *ct_name = NULL;
>  		int path_matches;
>  
> +		/*
> +		 * XXX Files with GOT_STATUS_CONFLICT must be allowed else
> +		 * the caller write_tree() will fail to add a new tree entry
> +		 * and subsequent diffs of the file will show up empty.
> +		 */

Yes, this is fine. Above XXX comment can be removed.

>  		if (ct->staged_status == GOT_STATUS_NO_CHANGE) {
>  			if (ct->status != GOT_STATUS_MODIFY &&
>  			    ct->status != GOT_STATUS_MODE_CHANGE &&
> -			    ct->status != GOT_STATUS_DELETE)
> +			    ct->status != GOT_STATUS_DELETE &&
> +			    ct->status != GOT_STATUS_CONFLICT)
>  				continue;
>  		} else {
>  			if (ct->staged_status != GOT_STATUS_MODIFY &&
> -			    ct->staged_status != GOT_STATUS_DELETE)
> +			    ct->staged_status != GOT_STATUS_DELETE &&
> +			    ct->staged_status != GOT_STATUS_CONFLICT)

But the rules for commit of staged files should not be changed.

>  				continue;
>  		}
>  
> @@ -5744,7 +5805,9 @@ write_tree(struct got_object_id **new_tree_id, int *ne
>  				/* NB: Deleted entries get dropped here. */
>  				if (ct->status == GOT_STATUS_MODIFY ||
>  				    ct->status == GOT_STATUS_MODE_CHANGE ||
> -				    ct->staged_status == GOT_STATUS_MODIFY) {
> +				    ct->status == GOT_STATUS_CONFLICT ||
> +				    ct->staged_status == GOT_STATUS_MODIFY ||
> +				    ct->staged_status == GOT_STATUS_CONFLICT) {

Same.

>  					err = alloc_modified_blob_tree_entry(
>  					    &new_te, te, ct);
>  					if (err)
> @@ -5804,7 +5867,8 @@ update_fileindex_after_commit(struct got_worktree *wor
>  			    ct->staged_status == GOT_STATUS_DELETE) {
>  				got_fileindex_entry_remove(fileindex, ie);
>  			} else if (ct->staged_status == GOT_STATUS_ADD ||
> -			    ct->staged_status == GOT_STATUS_MODIFY) {
> +			    ct->staged_status == GOT_STATUS_MODIFY ||
> +			    ct->staged_status == GOT_STATUS_CONFLICT) {

Same.

>  				got_fileindex_entry_stage_set(ie,
>  				    GOT_FILEIDX_STAGE_NONE);
>  				got_fileindex_entry_staged_filetype_set(ie, 0);
> @@ -5948,12 +6012,14 @@ commit_worktree(struct got_object_id **new_commit_id,
>  
>  		/* Blobs for staged files already exist. */
>  		if (ct->staged_status == GOT_STATUS_ADD ||
> -		    ct->staged_status == GOT_STATUS_MODIFY)
> +		    ct->staged_status == GOT_STATUS_MODIFY ||
> +		    ct->staged_status == GOT_STATUS_CONFLICT)

Same.