From: Stefan Sperling Subject: Re: [rfc] got diff -v|--verbose To: Mark Jamsek Cc: Omar Polo , gameoftrees@openbsd.org Date: Mon, 31 Oct 2022 13:40:22 +0100 On Sun, Jun 19, 2022 at 02:07:20AM +1000, Mark Jamsek wrote: > On 22-06-16 06:18pm, Stefan Sperling wrote: > > On Thu, Jun 16, 2022 at 05:43:24PM +0200, Omar Polo wrote: > > > Mark Jamsek wrote: > > > > On 22-06-17 12:59am, Mark Jamsek wrote: > > > > > I habitually use {fossil,git} diff -v to see the diff I'm committing in > > > > > vim when editing the commit message. It saves me all the time from > > > > > missing things (see my previous commits for example). > > > > > > > > > > Any objections to adding this? > > > > > > > > > > > > > s/diff/commit > > > > > > > > got commit -v > > > > > > I didn't know about `fossil/git diff -v', but in general I'd like the > > > idea. > > > > I am a bit afraid that such a feature could lead to people accidentally > > committing pf.c as their log message. The Git documentation claims that > > the diff won't be part of the committed log message, but how can we filter > > the diff out of the file again with 100% accuracy? > > I don't think there is a reliable way. Once the user has had access to the > > file and gives it back to us, we have no idea what the file contents mean > > and we cannot assume that any meta-data we have about the log message is > > still valid. > > You're right, we can't control what happens in the editor. If the user > decides to delete whatever string we use to demarcate log message from > diff, the diff will end up in the message. In that sense, I guess it's > the same as deleting the leading '#' for the commit changeset that is > currently displayed; but with greater consequences. That said, we added > this to Fossil about 18 months ago, and we haven't had a reported > incident yet. (Now I've said that, I bet we get one :) The only > protection we have is a hash line, and everything after that is ignored. > > ############################################################################## > # The following diff is excluded from the commit message: > > ... > > > > what i'd really like, even more than putting the diff in the log edit > > > buffer with all the consequences it has (i.e. that special line to > > > differentiate between the log message and the diff), would be for `got > > > diff' to work while `got commit' is waiting for the editor. > > > > > > (blatantly a feature request since I haven't looked into how hard would > > > be and which consequences it would have.) > > > > The locks we use for the work tree are very simple exclusive locks. > > As soon as a lock exists, everyone else is locked out. flock(2) does > > not support a concurrent multi-reader single-writer model. > > > > And there is no separate staging area like in Git. Staged changes just > > add fileindex entries which point at blobs already written to the repo. > > > > If we don't lock the work tree during commit we are effectively running > > a transaction without protecting the integrity of transaction state. > > Which could lead to file index corruption or unexpected changes creeping > > into commits (e.g. a 'got add' command could modify the work tree while > > the log message is being written). > > > > Unlocking earlier than we do now is not an option because the work tree > > needs to remain locked until we have updated its base commit ID to that > > of the newly created commit. Which we obviously won't know until the > > very end of the commit process. > > This is precisely why I implemented this in Fossil: the checkout db is > locked and you can't run 'fossil diff' in this state. I don't think > trying to circumvent the locks is the answer either. > > > > to be honest thought I got used to open an xterm with `got di | less' in > > > it before doing a commit from the command line. > > > > This is what I do as well. Or save the diff to a file, and load it > > into a separate editor buffer to view log message and diff side-by-side. > > This is what I'll get in the habit of doing too. Today, landry@ asked about a 'git commit -v' style feature as well. We rehashed the above discussion in person, and came up with another idea to implement something like this. With the patch below, a temporary file which contains the diff will be created by 'got commit' while collecting changes to include in the commit (meaning the extra overhead is minimal, since we avoid a second status crawl to gather changed files for diffing). In the log message buffer we add a comment below the list of changed paths which displays the path to this file: # changes to be committed on branch main: # M alpha # D beta # detailed changes can be viewed in /tmp/got-diff-YOHkcUe0 This way, users can load the diff into a separate editor buffer to review their changes, without risk of cluttering the log message buffer and potentially ending up with diff text in their log message. The file is deleted automatically once the log message editor exits. If needed, a new commit -n flag disables this behaviour. This could be useful when the diff to be committed is very large, for example. ok? diff 21f077265038a0c30c857035d2ab286f293573b1 a56fb2ebf8ca172097aadf16a3fff6e1485e528c commit - 21f077265038a0c30c857035d2ab286f293573b1 commit + a56fb2ebf8ca172097aadf16a3fff6e1485e528c blob - 7aaa7e057f73895b5da6fc8f9f74a6dc5e45b4c3 blob + 71405def3d174253f89a13042be66595725c22d3 --- got/got.1 +++ got/got.1 @@ -1621,7 +1621,7 @@ is a directory. .Tg ci .It Xo .Cm commit -.Op Fl NS +.Op Fl NnS .Op Fl A Ar author .Op Fl F Ar path .Op Fl m Ar message @@ -1732,6 +1732,11 @@ option and is intended for non-interactive use such as It has no effect unless it is used together with the .Fl F option and is intended for non-interactive use such as scripting. +.It Fl n +This option prevents +.Cm got commit +from generating a diff of the to-be-committed changes in a temporary file +which can be viewed while editing a commit message. .It Fl S Allow the addition of symbolic links which point outside of the path space that is under version control. blob - d21399f4a8f84abbee2384d2577953581215edd6 blob + cbfb69469133d43ff6d3185d620a4cfc74627a59 --- got/got.c +++ got/got.c @@ -8416,8 +8416,8 @@ collect_commit_logmsg(struct got_pathlist_head *commit } static const struct got_error * -collect_commit_logmsg(struct got_pathlist_head *commitable_paths, char **logmsg, - void *arg) +collect_commit_logmsg(struct got_pathlist_head *commitable_paths, + const char *diff_path, char **logmsg, void *arg) { char *initial_content = NULL; struct got_pathlist_entry *pe; @@ -8479,6 +8479,11 @@ collect_commit_logmsg(struct got_pathlist_head *commit got_commitable_get_path(ct)); } + if (diff_path) { + dprintf(fd, "# detailed changes can be viewed in %s\n", + diff_path); + } + err = edit_logmsg(logmsg, a->editor, a->logmsg_path, initial_content, initial_content_len, a->prepared_log ? 0 : 1); done: @@ -8513,13 +8518,14 @@ cmd_commit(int argc, char *argv[]) char *gitconfig_path = NULL, *editor = NULL, *committer = NULL; int ch, rebase_in_progress, histedit_in_progress, preserve_logmsg = 0; int allow_bad_symlinks = 0, non_interactive = 0, merge_in_progress = 0; + int show_diff = 1; struct got_pathlist_head paths; int *pack_fds = NULL; TAILQ_INIT(&paths); cl_arg.logmsg_path = NULL; - while ((ch = getopt(argc, argv, "A:F:m:NS")) != -1) { + while ((ch = getopt(argc, argv, "A:F:m:NnS")) != -1) { switch (ch) { case 'A': author = optarg; @@ -8543,6 +8549,9 @@ cmd_commit(int argc, char *argv[]) case 'N': non_interactive = 1; break; + case 'n': + show_diff = 0; + break; case 'S': allow_bad_symlinks = 1; break; @@ -8649,7 +8658,7 @@ cmd_commit(int argc, char *argv[]) } cl_arg.repo_path = got_repo_get_path(repo); error = got_worktree_commit(&id, worktree, &paths, author, committer, - allow_bad_symlinks, collect_commit_logmsg, &cl_arg, + allow_bad_symlinks, show_diff, collect_commit_logmsg, &cl_arg, print_status, NULL, repo); if (error) { if (error->code != GOT_ERR_COMMIT_MSG_EMPTY && blob - e4bb25a6d8b1e1b7189a15dca86d54453d48cf39 blob + 3d11d1ed4657d8fa1f4809c2b04fe5b891fea563 --- include/got_worktree.h +++ include/got_worktree.h @@ -220,12 +220,13 @@ const struct got_error *got_worktree_revert(struct got /* * A callback function which is invoked when a commit message is requested. * Passes a pathlist with a struct got_commitable * in the data pointer of - * each element, a pointer to the log message that must be set by the - * callback and will be freed after committing, and an argument passed - * through to the callback. + * each element, the path to a file which contains a diff of changes to be + * committed (may be NULL), and a pointer to the log message that must be + * set by the callback and will be freed after committing, and an argument + * passed through to the callback. */ typedef const struct got_error *(*got_worktree_commit_msg_cb)( - struct got_pathlist_head *, char **, void *); + struct got_pathlist_head *, const char *, char **, void *); /* * Create a new commit from changes in the work tree. @@ -241,7 +242,7 @@ const struct got_error *got_worktree_commit(struct got */ const struct got_error *got_worktree_commit(struct got_object_id **, struct got_worktree *, struct got_pathlist_head *, const char *, - const char *, int, got_worktree_commit_msg_cb, void *, + const char *, int, int, got_worktree_commit_msg_cb, void *, got_worktree_status_cb, void *, struct got_repository *); /* Get the path of a commitable worktree item. */ blob - 438973a5f89e071064f01cb39023043b4337946a blob + acbca91c5f5f7a0048daf2e2b2baffaf08c347d5 --- lib/worktree.c +++ lib/worktree.c @@ -4952,9 +4952,211 @@ struct collect_commitables_arg { struct got_fileindex *fileindex; int have_staged_files; int allow_bad_symlinks; + int diff_header_shown; + FILE *diff_outfile; + FILE *f1; + FILE *f2; }; +/* + * Create a file which contains the target path of a symlink so we can feed + * it as content to the diff engine. + */ static const struct got_error * +get_symlink_target_file(int *fd, int dirfd, const char *de_name, + const char *abspath) +{ + const struct got_error *err = NULL; + char target_path[PATH_MAX]; + ssize_t target_len, outlen; + + *fd = -1; + + if (dirfd != -1) { + target_len = readlinkat(dirfd, de_name, target_path, PATH_MAX); + if (target_len == -1) + return got_error_from_errno2("readlinkat", abspath); + } else { + target_len = readlink(abspath, target_path, PATH_MAX); + if (target_len == -1) + return got_error_from_errno2("readlink", abspath); + } + + *fd = got_opentempfd(); + if (*fd == -1) + return got_error_from_errno("got_opentempfd"); + + outlen = write(*fd, target_path, target_len); + if (outlen == -1) { + err = got_error_from_errno("got_opentempfd"); + goto done; + } + + if (lseek(*fd, 0, SEEK_SET) == -1) { + err = got_error_from_errno2("lseek", abspath); + goto done; + } +done: + if (err) { + close(*fd); + *fd = -1; + } + return err; +} + +static const struct got_error * +append_ct_diff(struct got_commitable *ct, int *diff_header_shown, + FILE *diff_outfile, FILE *f1, FILE *f2, int dirfd, const char *de_name, + struct got_repository *repo, struct got_worktree *worktree) +{ + const struct got_error *err = NULL; + struct got_blob_object *blob1 = NULL; + int fd = -1, fd1 = -1, fd2 = -1; + FILE *ondisk_file = NULL; + char *label1 = NULL; + struct stat sb; + off_t size1 = 0; + int f2_exists = 0; + int diff_staged = (ct->staged_status != GOT_STATUS_NO_CHANGE); + char *id_str = NULL; + + memset(&sb, 0, sizeof(sb)); + + err = got_opentemp_truncate(f1); + if (err) + return got_error_from_errno("got_opentemp_truncate"); + err = got_opentemp_truncate(f2); + if (err) + return got_error_from_errno("got_opentemp_truncate"); + + if (!*diff_header_shown) { + err = got_object_id_str(&id_str, worktree->base_commit_id); + if (err) + return err; + fprintf(diff_outfile, "diff %s%s\n", diff_staged ? "-s " : "", + got_worktree_get_root_path(worktree)); + fprintf(diff_outfile, "commit - %s\n", id_str); + fprintf(diff_outfile, "path + %s%s\n", + got_worktree_get_root_path(worktree), + diff_staged ? " (staged changes)" : ""); + *diff_header_shown = 1; + } + + if (diff_staged) { + const char *label1 = NULL, *label2 = NULL; + switch (ct->staged_status) { + case GOT_STATUS_MODIFY: + label1 = ct->path; + label2 = ct->path; + break; + case GOT_STATUS_ADD: + label2 = ct->path; + break; + case GOT_STATUS_DELETE: + label1 = ct->path; + break; + default: + return got_error(GOT_ERR_FILE_STATUS); + } + fd1 = got_opentempfd(); + if (fd1 == -1) { + err = got_error_from_errno("got_opentempfd"); + goto done; + } + fd2 = got_opentempfd(); + if (fd2 == -1) { + err = got_error_from_errno("got_opentempfd"); + goto done; + } + err = got_diff_objects_as_blobs(NULL, NULL, f1, f2, + fd1, fd2, ct->base_blob_id, ct->staged_blob_id, + label1, label2, GOT_DIFF_ALGORITHM_PATIENCE, 3, 0, 0, + repo, diff_outfile); + goto done; + } + + fd1 = got_opentempfd(); + if (fd1 == -1) { + err = got_error_from_errno("got_opentempfd"); + goto done; + } + + if (ct->status != GOT_STATUS_ADD) { + err = got_object_open_as_blob(&blob1, repo, ct->base_blob_id, + 8192, fd1); + if (err) + goto done; + } + + if (ct->status != GOT_STATUS_DELETE) { + if (dirfd != -1) { + fd = openat(dirfd, de_name, + O_RDONLY | O_NOFOLLOW | O_CLOEXEC); + if (fd == -1) { + if (!got_err_open_nofollow_on_symlink()) { + err = got_error_from_errno2("openat", + ct->ondisk_path); + goto done; + } + err = get_symlink_target_file(&fd, dirfd, + de_name, ct->ondisk_path); + if (err) + goto done; + } + } else { + fd = open(ct->ondisk_path, + O_RDONLY | O_NOFOLLOW | O_CLOEXEC); + if (fd == -1) { + if (!got_err_open_nofollow_on_symlink()) { + err = got_error_from_errno2("open", + ct->ondisk_path); + goto done; + } + err = get_symlink_target_file(&fd, dirfd, + de_name, ct->ondisk_path); + if (err) + goto done; + } + } + if (fstatat(fd, ct->ondisk_path, &sb, + AT_SYMLINK_NOFOLLOW) == -1) { + err = got_error_from_errno2("fstatat", ct->ondisk_path); + goto done; + } + ondisk_file = fdopen(fd, "r"); + if (ondisk_file == NULL) { + err = got_error_from_errno2("fdopen", ct->ondisk_path); + goto done; + } + fd = -1; + f2_exists = 1; + } + + if (blob1) { + err = got_object_blob_dump_to_file(&size1, NULL, NULL, + f1, blob1); + if (err) + goto done; + } + + err = got_diff_blob_file(blob1, f1, size1, label1, + ondisk_file ? ondisk_file : f2, f2_exists, &sb, ct->path, + GOT_DIFF_ALGORITHM_PATIENCE, 3, 0, 0, diff_outfile); +done: + if (fd1 != -1 && close(fd1) == -1 && err == NULL) + err = got_error_from_errno("close"); + if (fd2 != -1 && close(fd2) == -1 && err == NULL) + err = got_error_from_errno("close"); + if (blob1) + got_object_blob_close(blob1); + if (fd != -1 && close(fd) == -1 && err == NULL) + err = got_error_from_errno("close"); + if (ondisk_file && fclose(ondisk_file) == EOF && err == NULL) + err = got_error_from_errno("fclose"); + return err; +} + +static const struct got_error * collect_commitables(void *arg, unsigned char status, unsigned char staged_status, const char *relpath, struct got_object_id *blob_id, struct got_object_id *staged_blob_id, @@ -5103,6 +5305,16 @@ done: goto done; } err = got_pathlist_insert(&new, a->commitable_paths, ct->path, ct); + if (err) + goto done; + + if (a->diff_outfile && ct && new != NULL) { + err = append_ct_diff(ct, &a->diff_header_shown, + a->diff_outfile, a->f1, a->f2, dirfd, de_name, + a->repo, a->worktree); + if (err) + goto done; + } done: if (ct && (err || new == NULL)) free_commitable(ct); @@ -5681,7 +5893,7 @@ commit_worktree(struct got_object_id **new_commit_id, struct got_object_id *head_commit_id, struct got_object_id *parent_id2, struct got_worktree *worktree, - const char *author, const char *committer, + const char *author, const char *committer, char *diff_path, got_worktree_commit_msg_cb commit_msg_cb, void *commit_arg, got_worktree_status_cb status_cb, void *status_arg, struct got_repository *repo) @@ -5713,7 +5925,8 @@ commit_worktree(struct got_object_id **new_commit_id, goto done; if (commit_msg_cb != NULL) { - err = commit_msg_cb(commitable_paths, &logmsg, commit_arg); + err = commit_msg_cb(commitable_paths, diff_path, + &logmsg, commit_arg); if (err) goto done; } @@ -5886,7 +6099,7 @@ got_worktree_commit(struct got_object_id **new_commit_ got_worktree_commit(struct got_object_id **new_commit_id, struct got_worktree *worktree, struct got_pathlist_head *paths, const char *author, const char *committer, int allow_bad_symlinks, - got_worktree_commit_msg_cb commit_msg_cb, void *commit_arg, + int show_diff, got_worktree_commit_msg_cb commit_msg_cb, void *commit_arg, got_worktree_status_cb status_cb, void *status_arg, struct got_repository *repo) { @@ -5898,10 +6111,12 @@ got_worktree_commit(struct got_object_id **new_commit_ struct got_pathlist_entry *pe; struct got_reference *head_ref = NULL; struct got_object_id *head_commit_id = NULL; + char *diff_path = NULL; int have_staged_files = 0; *new_commit_id = NULL; + memset(&cc_arg, 0, sizeof(cc_arg)); TAILQ_INIT(&commitable_paths); err = lock_worktree(worktree, LOCK_EX); @@ -5936,6 +6151,24 @@ got_worktree_commit(struct got_object_id **new_commit_ cc_arg.repo = repo; cc_arg.have_staged_files = have_staged_files; cc_arg.allow_bad_symlinks = allow_bad_symlinks; + cc_arg.diff_header_shown = 0; + if (show_diff) { + err = got_opentemp_named(&diff_path, &cc_arg.diff_outfile, + GOT_TMPDIR_STR "/got-diff"); + if (err) + goto done; + cc_arg.f1 = got_opentemp(); + if (cc_arg.f1 == NULL) { + err = got_error_from_errno("got_opentemp"); + goto done; + } + cc_arg.f2 = got_opentemp(); + if (cc_arg.f2 == NULL) { + err = got_error_from_errno("got_opentemp"); + goto done; + } + } + TAILQ_FOREACH(pe, paths, entry) { err = worktree_status(worktree, pe->path, fileindex, repo, collect_commitables, &cc_arg, NULL, NULL, 0, 0); @@ -5943,6 +6176,13 @@ got_worktree_commit(struct got_object_id **new_commit_ goto done; } + if (show_diff) { + if (fflush(cc_arg.diff_outfile) == EOF) { + err = got_error_from_errno("fflush"); + goto done; + } + } + if (TAILQ_EMPTY(&commitable_paths)) { err = got_error(GOT_ERR_COMMIT_NO_CHANGES); goto done; @@ -5969,7 +6209,7 @@ got_worktree_commit(struct got_object_id **new_commit_ } err = commit_worktree(new_commit_id, &commitable_paths, - head_commit_id, NULL, worktree, author, committer, + head_commit_id, NULL, worktree, author, committer, diff_path, commit_msg_cb, commit_arg, status_cb, status_arg, repo); if (err) goto done; @@ -5991,6 +6231,12 @@ done: free_commitable(ct); } got_pathlist_free(&commitable_paths); + if (diff_path && unlink(diff_path) == -1 && err == NULL) + err = got_error_from_errno2("unlink", diff_path); + free(diff_path); + if (cc_arg.diff_outfile && fclose(cc_arg.diff_outfile) == EOF && + err == NULL) + err = got_error_from_errno("fclose"); return err; } @@ -6284,7 +6530,7 @@ collect_rebase_commit_msg(struct got_pathlist_head *co static const struct got_error * collect_rebase_commit_msg(struct got_pathlist_head *commitable_paths, - char **logmsg, void *arg) + const char *diff_path, char **logmsg, void *arg) { *logmsg = arg; return NULL; @@ -6481,6 +6727,7 @@ rebase_commit(struct got_object_id **new_commit_id, struct got_object_id *head_commit_id = NULL; char *logmsg = NULL; + memset(&cc_arg, 0, sizeof(cc_arg)); TAILQ_INIT(&commitable_paths); *new_commit_id = NULL; @@ -6554,7 +6801,7 @@ rebase_commit(struct got_object_id **new_commit_id, err = commit_worktree(new_commit_id, &commitable_paths, head_commit_id, NULL, worktree, got_object_commit_get_author(orig_commit), committer ? committer : - got_object_commit_get_committer(orig_commit), + got_object_commit_get_committer(orig_commit), NULL, collect_rebase_commit_msg, logmsg, rebase_status, NULL, repo); if (err) goto done; @@ -7563,8 +7810,8 @@ merge_commit_msg_cb(struct got_pathlist_head *commitab }; static const struct got_error * -merge_commit_msg_cb(struct got_pathlist_head *commitable_paths, char **logmsg, - void *arg) +merge_commit_msg_cb(struct got_pathlist_head *commitable_paths, + const char *diff_path, char **logmsg, void *arg) { struct merge_commit_msg_arg *a = arg; @@ -7623,6 +7870,7 @@ got_worktree_merge_commit(struct got_object_id **new_c struct merge_commit_msg_arg mcm_arg; char *fileindex_path = NULL; + memset(&cc_arg, 0, sizeof(cc_arg)); *new_commit_id = NULL; TAILQ_INIT(&commitable_paths); @@ -7682,7 +7930,7 @@ got_worktree_merge_commit(struct got_object_id **new_c mcm_arg.worktree = worktree; mcm_arg.branch_name = branch_name; err = commit_worktree(new_commit_id, &commitable_paths, - head_commit_id, branch_tip, worktree, author, committer, + head_commit_id, branch_tip, worktree, author, committer, NULL, merge_commit_msg_cb, &mcm_arg, status_cb, status_arg, repo); if (err) goto done;