"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: gotd: allow repository directives in global scope
To:
Martijn van Duren <openbsd+got@list.imperialat.at>
Cc:
gameoftrees@openbsd.org
Date:
Sat, 4 Nov 2023 13:07:09 +0100

Download raw body.

Thread
On Sat, Nov 04, 2023 at 12:42:58PM +0100, Martijn van Duren wrote:
> On Sat, 2023-11-04 at 11:33 +0100, Stefan Sperling wrote:
> > On Fri, Nov 03, 2023 at 09:19:23PM +0100, Martijn van Duren wrote:
> > > Hello,
> > > 
> > > I just started playing with got{,d} for the first time and after
> > > wrapping my head around the fact that the repository needs to live
> > > outside the working tree I'm starting to like it.
> > > 
> > > In setting up a gotd server, I quickly became bored with the copy-paste
> > > work of the different per repository settings.
> > > 
> > > Diff below allows to declare repository directives in the global scope
> > > and are stored in a repo_template. If a directive is not specified in
> > > the repository it will be copied over from the template. The path
> > > directive will be used as a root-directory and the name of the
> > > repository will be appended to complete the per repository path.
> > > 
> > > thoughts?
> > 
> > Hey Martijn,
> > 
> > I understand the need to avoid repetition. This requirement will probably
> > come up over and over again so we should definitely do something about this.
> > 
> > I have doubts about appending the repository name to an on-disk path because
> > the repository name is in a "virtual" path space. It is not an on-disk path.
> > Consider this example from gotd.conf:
> > 
> >            # This repository can be accessed via
> >            # ssh://user@example.com/openbsd/ports
> >            repository "openbsd/ports" {
> >                    path "/var/git/ports.git"
> 
> I thought about this, but it's new functionality, so we won't break
> anything here. If people want to use this new feature on their existing
> gotd.conf they can either keep the path directive and remove the rest,
> or move the above example to /var/git/openbsd/ports.git.

It seems we would then be using the repository name as on-disk path in
some cases but not in others? That's too subtle for my taste. I would
prefer to keep a very clear distinction between what is an on-disk path
and what is a URL component which will be exposed to clients.

> > The repository name is used to construct the repository URL which clients can
> > fetch from. This URL has no relation to where the repository appears on disk.
> > I would prefer to keep the URL and on-disk path spaces completely separate.
> > 
> > While at it, we should support multiple repository directories because
> > people might want to add more partitions when existing ones start running
> > out of space.
> > 
> > So how about implementing something like this instead:
> > 
> >  directory "/git"  # some repositories are here
> >  directory "/var/git"  # some more repositories are here
> 
> Well, my original thinking was to make it possible to completely
> skip the repository keyword and let gotd scan the entire template
> path. I'm not sure how far your suggestion is feasible with the
> current pledge/unveil structure, but my original idea was more
> trouble then it's worth for me.

If a repository isn't explicitly listed in gotd.conf we must not use it.
This is also required to keep gitwrapper working as expected.

It's generally safer to avoid automatic discovery of repositories
and rely on explicit configuration instead. Especially if the server
allows cloning by anyone on the internet.

> Then comes the second problem: what if by some oversight a
> repository-specific path is specified twice, under different repo
> names?

It's generally not a problem to expose the same repository via
different URLs. But indeed it's most likely by accident, and may imply
other issues such as access permissions being wrong on one of the instances.

It seems at present we only reject duplicate repository nanes.
To be safe we can reject duplicate repository paths when parsing the
config, after realpath() gets applied to the value given in the config
file to ensure we don't misdetect cases where people play with symlinks.

> Another option I was thinking about, which builds upon my diff is
> that we could add a template keyword, so we would have something
> like this:
> repository template "martijn" {
> 	path "/var/got/martijn"
> 	permit rw martijn
> 	...
> }
> repository template "stsp" {
> 	path "/var/got/stsp"
> 	permit rw stsp
> 	...
> }
> repository got {
> 	template stsp
> }
> repository snmpd {
> 	template martijn
> }
> 
> I didn't implement this in the diff, because it's big enough as it
> is and I don't see an immediate need for it on top of this
> functionality, but the code was written with this option in mind.

This results in a much more syntactic changes than my suggestion.
If possible we should try to keep syntax changes at a minimum.

> > Regarding the new code, could the new protect_set flag be replaced with
> > TAILQ_EMPTY checks, for example via a helper function like this:
> > 
> > static int
> > repo_has_protected_refs(struct gotd_repo *repo)
> > {
> >   return (TAILQ_EMPTY(&repo->protected_tag_namespaces) &&
> >       TAILQ_EMPTY(&repo->protected_branch_namespaces) &&
> >       TAILQ_EMPTY(&repo->protected_branches));
> > }
> > 
> > I suspect this might reduce the size of the diff and would make the code
> > easier to follow because there would be less state flags to keep in mind.
> 
> This doesn't work. Consider the following example:
> protect {
> 	branch release
> }
> repository garbage {
> 	protect {}
> }
> garbage would pass repo_has_protected_refs(), but the intent here is to
> remove all protections from the global scope.

Right, fair enough then.