From: Stefan Sperling <stsp@stsp.name>
Subject: Re: got-archive(1) (now with a patch)
To: Benjamin Stürz <benni+got@stuerz.xyz>
Cc: gameoftrees@openbsd.org
Date: Fri, 29 Dec 2023 01:40:55 +0100

On Thu, Dec 28, 2023 at 06:17:05PM +0100, Benjamin Stürz wrote:
> On 12/28/23 12:51, Benjamin Stürz wrote:
> > I'll take a look into the code of got, and see if I can do something,
> > if no one is already working on it.
> Here's a patch implementing a WIP archive command.
> There are still a few things to do,
> like adding options and a section in the man page.
> But I think it's ready for testing.
> Most of the code is copied from the checkout command,
> the rest is either written by me or stolen from a man page.

I am not sure about my overall end-goals for the design, and I do
not have a lot of time to think about it now, but here are some
quick thoughts:

The archive command could work with a packing list file that contains
a list of files to include in archives. The file could be versioned in
some fully automatic or semi-automatic way. It could be visible, under
some well-known name such as got-archive.conf or got-archive-list.txt,
or be hidden somewhere in meta-data.

If the command assumed an existing work tree as a starting point, rather
than checking out an entire repository, then it could be more flexible.
Consider multi-project repositories (fairly rare with Git, but they
do exist) which could be checked out with a path-prefix via
'got checkout -p' to obtain the subtree of the repository that needs
to be archived.

A mixed-commit work tree would be rejected with an error, similar to
how some other commands already do it. Local modifications to versioned
files that are not yet committed would be tolerated but perhaps shown
to the user for verification. E.g. it would be OK to locally tweak
the version string in a Makefile, but having some non-committed changes
in the code or in the packing list would probably be bad.
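
To make the mixed-commit check concrete, here is a rough sketch in
Python (for brevity; this is not got's actual code). It assumes the
base commit ID of every versioned file is already available as a
plain mapping from path to commit ID. The function name and the
input shape are made up for illustration.

```python
def check_single_base_commit(base_commits):
    """Return the one base commit shared by all files, or raise.

    base_commits is a hypothetical mapping of path -> base commit ID.
    An empty work tree yields None.
    """
    distinct = set(base_commits.values())
    if len(distinct) > 1:
        # Files are based on different commits: refuse to archive.
        raise ValueError("work tree contains files from multiple base commits")
    return next(iter(distinct), None)
```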

One advantage of using an existing work tree and a packing list is
that invoking the command could potentially become as simple as
running something like this in a work tree:

  got archive got-0.96.tar.gz

Version numbers are tricky. There are many conventions. Your script
already has a specific flag to deal with tag names that have a 'v'
prefix, but this seems overly specific. Leaving the version entirely
up to the user as a user-specified string might be the most general
solution, rather than trying to be clever about it. Perhaps just let
the version be a part of a user-specified archive name and leave it
at that? I understand that your wrapper tool does its work based on
tags in the repository, but is it strictly necessary to integrate
this new feature with tags so tightly? A wrapper script which checks
out each tag and runs 'got archive' on it would still be possible.

Instead of fts_open(3) this feature could use the work tree status
crawl to pick up the versioned and unversioned files to package.

I would prefer to ignore file types like fifos and device nodes.
Such files should not appear in a reasonable source code tarball.
Unless I am missing something, any file types other than regular
files, directories, or symlinks would be out of the ordinary,
wouldn't they? Besides, Git doesn't version such files either.
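
A minimal sketch of such a filter, written in Python for brevity
rather than in C. It walks a directory with os.walk() purely for
illustration (not the status crawl suggested above), and the
function names are hypothetical.

```python
import os
import stat

def is_archivable(mode):
    """Return True for regular files, directories, and symlinks only."""
    return stat.S_ISREG(mode) or stat.S_ISDIR(mode) or stat.S_ISLNK(mode)

def archivable_paths(root):
    """Yield paths below root, relative to root, whose type is archivable."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            # lstat() so that a symlink is judged by itself, not its target.
            if is_archivable(os.lstat(path).st_mode):
                yield os.path.relpath(path, root)
```

Fifos, sockets, and device nodes are silently skipped here. Whether
to skip them quietly or to warn about them is a separate decision.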

Writing the tar headers directly is clever and avoids having to
run an external 'tar' program. I hope though that this won't run
into subtle bugs that result in incompatibilities with some
implementations of tar. The header looks simple enough but I do
not have enough experience with the tar file format to judge this.
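
One cheap way to catch such bugs early is to read the generated
archive back with an independent tar implementation. The sketch
below is in Python rather than C and is not the code from the patch.
It builds a ustar header for a regular file by hand, using the field
offsets of the POSIX ustar format, and then parses the result with
Python's tarfile module. The names ustar_header, tar_bytes and
read_back are made up, and long names (the prefix field) are
deliberately not handled.

```python
import io
import tarfile

BLOCK = 512

def field(value, size):
    """Encode an integer as a zero-padded octal field ending in NUL."""
    return ("%0*o" % (size - 1, value)).encode("ascii") + b"\0"

def ustar_header(name, size, mode=0o644, mtime=0):
    """Build one 512-byte ustar header for a regular file."""
    raw = name.encode("utf-8")
    if len(raw) > 100:
        raise ValueError("name too long for this sketch (no prefix field)")
    hdr = bytearray(BLOCK)
    hdr[0:len(raw)] = raw               # name
    hdr[100:108] = field(mode, 8)       # mode
    hdr[108:116] = field(0, 8)          # uid
    hdr[116:124] = field(0, 8)          # gid
    hdr[124:136] = field(size, 12)      # size
    hdr[136:148] = field(mtime, 12)     # mtime
    hdr[148:156] = b" " * 8             # checksum field is summed as spaces
    hdr[156:157] = b"0"                 # typeflag: regular file
    hdr[257:263] = b"ustar\0"           # magic
    hdr[263:265] = b"00"                # version
    checksum = sum(hdr)
    # Traditional layout: six octal digits, NUL, space.
    hdr[148:156] = ("%06o" % checksum).encode("ascii") + b"\0 "
    return bytes(hdr)

def tar_bytes(name, data):
    """Return a one-member archive: header, padded data, end marker."""
    padding = (-len(data)) % BLOCK
    end = b"\0" * (2 * BLOCK)
    return ustar_header(name, len(data)) + data + b"\0" * padding + end

def read_back(archive):
    """Parse the archive with tarfile and return (name, content) pairs."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tf:
        return [(m.name, tf.extractfile(m).read()) for m in tf.getmembers()]
```

A round trip through one reader does not prove compatibility with
every tar out there, but running the same check against a few
implementations would at least catch checksum and padding mistakes.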

If a packing list doesn't exist yet then 'got archive' could create it
based on the contents of the work tree. Users could then edit the list
as needed. The list would contain both versioned and unversioned files
that are expected to be present in the archive. It could also contain
per-file annotations (like those used in PLIST files in the OpenBSD
ports tree) in case that helps the design somehow.
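
As a strawman for what a generated list could look like, here is a
small Python sketch. The format (one path per line, '#' starting a
comment) and both function names are assumptions for the sake of
discussion, not a proposal for the final syntax. Annotations are
left out entirely.

```python
def format_packing_list(paths):
    """Render a sorted, de-duplicated packing list as text."""
    lines = ["# hypothetical got-archive packing list; one path per line"]
    lines.extend(sorted(set(paths)))
    return "\n".join(lines) + "\n"

def parse_packing_list(text):
    """Return the paths in a packing list, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries
```

Keeping the output sorted means that regenerating the list produces
small, readable diffs when the file is versioned.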