"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Mark Jamsek <mark@jamsek.com>
Subject:
Re: got {diff,log,update} -c KEYWORD (cf. svn revision keywords)
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Thu, 13 Jul 2023 13:31:22 +1000

Download raw body.

Thread
Stefan Sperling <stsp@stsp.name> wrote:
> On Thu, Jul 13, 2023 at 01:42:42AM +1000, Mark Jamsek wrote:
> > As discussed not too long ago, the below diff implements the special
> > keywords BASE and HEAD, which can be used as the <commit> argument to
> > the -c option in the diff, log, and update commands. This is similar to
> > svn's revision keywords and fossil's special tags, but with a bit more
> > functionality.
> > 
> > An optional modifier can be appended in the form:
> > 
> >   KEYWORD(+|-)[N]
> > 
> > where '+' and '-' indicates [the Nth] descendant and antecedent
> > of KEYWORD, respectively; for example:
> > 
> >   got diff -cBASE -cHEAD	# work tree base and head commit
> >   got update -cHEAD-2		# 2nd generation ancestor of HEAD
> >   got log -cBASE+		# 1st generation descendant of BASE
> >   got update -cHEAD-		# 1st generation ancestor of HEAD
> >   got diff -cBASE
> > 
> > if the optional integer N is omitted, a "1" is implicitly appended
> > (i.e., BASE+ == BASE+1).
> > 
> > See the man page changes for the complete interface.
> > 
> > Inspiration for the syntax came from mblaze; I think +/-N is really
> > intuitive--and I've always found git's ^^ too abstruse!
> > 
> > The implementation is pretty simple, and I think this will be quite a
> > convenient feature. The diff includes coverage for the new keywords, and
> > regress is still happy.
> 
> I like the approach, but I reckon we will need to build on the
> basics you are putting in place here to cover some non-trivial
> cases people will inevitably come across.
> 
> What are use cases that you have in mind for this? It would be good to
> see how commands will look before and after this patch, for use cases
> where this change is supposed to help.

Yes, I plan to add to this in stages. For now, primary use cases for me
are to make histedit, bisect^, and diff operations simpler; for example:

got up -cHEAD- ; got he -m
got up -cHEAD-2 ; got he -d
got up -cHEAD-4 ; got he -f
got up -cBASE-16; check if bug exists; got up -cBASE+8; ...
got diff -cBASE -cHEAD

I'm sure there are other use cases this will make more convenient
(e.g., got log -cBASE path ...), but these are the motivation for me.

> Would the newly added (+|-)[N] syntax also work on regular commit hashes
> and revision arguments? The characters + and - are not forbidden in
> reference names. If we used forbidden characters it would be easier
> to apply the notation to references as well.

I did think about this and decided I don't want it to work with regular
commit hashes: if you're at the point to get a hash id, it would make
more sense to get the hash id of the actual commit rather than use
relative modifiers like this. I do, however, think such sugar makes
sense with other moving targets, so I plan to add support for use with
reference names.

> We store per-path base commits in the work tree, in addition to the
> work tree's global base commit which corresponds to the target commit
> of the most recent checkout/update operation.
> A user can change the global BASE by updating just one file:
> 
>   got update -c fedcba43210		# sets work tree's global BASE
>   got update -c abcdef01234 foo.c	# also sets work tree's global BASE
> 
> After this sequence, BASE is abcdef01234, yet all files on disk other
> than foo.c are still based on fedcba43210.
> When diffing a specific path with -cBASE the global work tree's base
> commit may differ from the path's base commit. Consequently the displayed
> diff might not correspond to unmodified files as seen on disk. Which may
> be expected since the diff being displayed is a repository-side diff.
> But I am afraid that users could be misled to believe that a strong
> relation to file contents in the work tree was implied by BASE.
> This becomes even more complicated if multiple paths are being diffed,
> all of which could in theory have distinct BASE commits...
> Should we require a single-commit work tree for keyword resolution to
> avoid this issue? Or should we accept the risk that users who don't
> understand such subtle nuances might be surprised?

I considered this, too, and played around with some examples. With the
current keywords, as you say, this is a repository side diff so I think
it is fine. Although, this should probably be noted in the docs.

However, I also plan for more keywords, which will add to this. I'll
elaborate below.

> I believe it would be good to have a syntax that resolves a timestamp/date
> string to a commit hash, e.g. got diff -c monday -c tuesday
> It seems this implementation could later be extended to allow for
> timestamps, with relatively little effort?

Yes, this is also on the agenda. We have something similar in fossil
too (e.g., fossil timeline before 2023-07-01), which is where I got the
idea. But I need to consider this more before settling on a syntax for
temporal keywords; for example:

  got diff -cTIME:monday -cTIME:tuesday
  got log -cTIME:july

> Git's ^^ syntax is complicated indeed. However, it is flexible enough
> to traverse the commit graph through a path that might involve the
> second, third, etc. parent of merge commits. I don't believe we will
> really need this capability in our CLI (I would rather spend effort
> on making interactive browsing of branched histories easier in tog).
> But we should probably mention the restriction to first-parent traversal
> in the documentation?

I agree. For such use cases it makes more sense to use hash ids.
But, yes, this should be documented too.

> Do we need any keywords in addition to BASE/HEAD? Could additional keywords
> perhaps be used to address some of the problems I outlined above?

I think what I have planned next will.

The next stage is to add a "WORKTREE" or "CKOUT" or "." keyword to refer
to the (possibly modified) files on disk in the work tree. Then we can
diff local changes against an arbitrary version; for example:
'got diff -cBASE-2 -c.'

After this, I want to add support for reference names. I think the
current syntax will work with refs. We backscan the keyword for the last
+ or - and parse it plus any optional trailing number as the modifier.
We can still do this: first, check if the whole string matches a branch,
if so, resolve its id and return the branch id string; if it isn't a
branch name, parse the modifier and see if the string minus the modifier
matches a branch name and resolve it accordingly. For example:

# "branch-ref-abc-1" resolves to branch: return branch id
got up -c branch-ref-abc-1

# "branch-ref-abc-1-2" does not resolve: parse -2 as the modifier
# "branch-ref-abc-1" resolves to branch: return 2nd gen ancestor's id
got up -c branch-ref-abc-1-2

# "branch-ref-def+" does not resolve to branch: parse + as the modifier
# branch-ref-def does not resolve:
# return "reference branch-ref-def+ not found"
got up -c branch-ref-def+

That said, while I find +/- to be very intuitive, we can always use
something else; I was flirting with ":" and "^" at first, but I started
using mblaze recently and it uses +/- for its mmsg(7) syntax to refer to
the next and previous message of the current message, which is ".".
And I thought it made perfect sense!

The third stage is to add the TIME syntax for temporal resolution.

^bisect: do we want a 'got bisect' command? This is something I often
use in fossil. This new syntax makes it simple to do manually and I will
likely write a bisect script with it, but I think a 'got bisect' command
is a good idea.

-- 
Mark Jamsek <https://bsdbox.org>
GPG: F2FF 13DE 6A06 C471 CA80  E6E2 2930 DC66 86EE CF68