
From: Stefan Sperling <stsp@stsp.name>
Subject: Re: WIP: read-only https support
To: Omar Polo <op@omarpolo.com>
Cc: gameoftrees@openbsd.org
Date: Sun, 20 Nov 2022 12:11:58 +0100

On Sun, Nov 20, 2022 at 10:45:14AM +0100, Omar Polo wrote:
> Just scratching an itch: there are plenty of web forges online and I
> don't have an account on every one of them (not that I want to).  I
> can use git to clone, but then I'd miss all the niceties :)
> 
> The diff below adds initial read-only HTTP/S support for got fetch and
> clone.  The code is incomplete, WIP, etc.; use it at your own risk.
> I'm sharing it in case somebody wants to play along.
> 
> This is done with a new libexec helper "got-http" (an alternative name
> could be "got-dial-http"?)  To minimize the changes needed to the dial

I would prefer got-http-fetch. We're probably never going to implement
send via HTTP. But even if we did, we could add a got-http-send program
and share some code between the two.

> and fetch_pack API, I decided to write a helper that behaves like
> ssh(1) as far as got is concerned.  Under the hood, it transforms what
> got asks for into HTTP requests.  Only the "smart" HTTP protocol is
> supported, not the "dumb" one (at least for now).
> 
> The "smart" HTTP protocol behaves almost like git over ssh, but needs
> two HTTP requests:
> 
>  - a first one for "discovery" (checking whether the remote server is
>    "smart") and fetching the refs
>  - a POST where we send our have/want lines and fetch the packfile
> 
> The dumb one is just a bare git repo served by a web server (e.g.
> httpd(8)) and requires us to fetch all the objects manually and resolve
> them ourselves.  To be fair, I'm not thrilled at the idea of
> implementing it.
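The two smart-HTTP round trips can be sketched in C roughly as follows. This is a minimal sketch, not got's code: it assumes the request targets described in Git's http-protocol documentation (GET $GIT_URL/info/refs?service=git-upload-pack for discovery, then a POST to $GIT_URL/git-upload-pack with Content-Type application/x-git-upload-pack-request), and the helper names build_target(), pkt_line_write() and format_post_head() are hypothetical. pkt_line_write() shows the pkt-line framing shared by all of Git's transports: four lowercase hex digits giving the total length, including the four length bytes themselves, followed by the payload.

```c
#include <stdio.h>
#include <string.h>

/*
 * Request targets and content type for smart HTTP, as given in Git's
 * http-protocol documentation.  The helper names are hypothetical.
 */
#define DISCOVERY_SUFFIX	"/info/refs?service=git-upload-pack"
#define UPLOAD_PACK_SUFFIX	"/git-upload-pack"
#define UPLOAD_PACK_REQ_TYPE	"application/x-git-upload-pack-request"

/* Write "<repo_path><suffix>" into buf; return 0, or -1 if it won't fit. */
static int
build_target(char *buf, size_t bufsize, const char *repo_path,
    const char *suffix)
{
	int n = snprintf(buf, bufsize, "%s%s", repo_path, suffix);

	return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}

/*
 * Frame payload as a pkt-line: 4 lowercase hex digits holding the total
 * length (payload plus the 4 length bytes), then the payload itself.
 * Returns the number of bytes written (excluding the NUL), or -1.
 */
static int
pkt_line_write(char *buf, size_t bufsize, const char *payload)
{
	size_t len = strlen(payload) + 4;
	int n;

	if (len > 65520)	/* maximum pkt-line size in Git */
		return -1;
	n = snprintf(buf, bufsize, "%04zx%s", len, payload);
	return (n < 0 || (size_t)n >= bufsize) ? -1 : n;
}

/*
 * Format the header block of the second request, the POST carrying the
 * want/have pkt-lines.  The body (body_len bytes) is sent afterwards.
 */
static int
format_post_head(char *buf, size_t bufsize, const char *host,
    const char *target, size_t body_len)
{
	int n = snprintf(buf, bufsize,
	    "POST %s HTTP/1.1\r\n"
	    "Host: %s\r\n"
	    "Content-Type: " UPLOAD_PACK_REQ_TYPE "\r\n"
	    "Content-Length: %zu\r\n"
	    "\r\n", target, host, body_len);

	return (n < 0 || (size_t)n >= bufsize) ? -1 : n;
}
```

For example, framing "done\n" yields "0009done\n", and a "want <40-hex-id>\n" line always gets the prefix 0032 (4 + 5 + 40 + 1 = 50 bytes).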

While big forges tend to implement the smart protocol for efficiency,
many self-hosted HTTP server setups use the dumb protocol because
it is much easier to set up on the server side.

This includes git.gameoftrees.org, which uses dumb HTTP only ;)
Feel free to use this server for manual testing.
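A dumb-protocol client has no upload-pack endpoint to talk to; it downloads plain files from the repository directory. Per Git's http-protocol documentation, refs come from $GIT_URL/info/refs, loose objects live under objects/ at the first two hex digits of the object ID, a slash, and the remaining 38 digits, and the available packs are listed in objects/info/packs as "P pack-<hash>.pack" lines. The sketch below only computes those paths; the helper names are hypothetical, SHA-1 object IDs are assumed, and inflating, parsing and resolving the downloaded objects is left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers for the dumb HTTP protocol: they only compute
 * URL paths relative to the server root.  SHA-1 object IDs are assumed.
 */

/* "<repo>/objects/ab/cdef..." for the 40-digit hex ID "abcdef..." */
static int
dumb_loose_object_path(char *buf, size_t bufsize, const char *repo,
    const char *hex_id)
{
	size_t i;
	int n;

	if (strlen(hex_id) != 40)
		return -1;
	for (i = 0; i < 40; i++) {
		if (!isxdigit((unsigned char)hex_id[i]))
			return -1;
	}
	n = snprintf(buf, bufsize, "%s/objects/%.2s/%s", repo, hex_id,
	    hex_id + 2);
	return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}

/*
 * Turn one line of objects/info/packs, e.g. "P pack-<hash>.pack\n",
 * into "<repo>/objects/pack/pack-<hash>.pack".  The matching .idx file
 * sits next to it; a client would typically fetch the index first to
 * learn which objects the pack contains.
 */
static int
dumb_pack_path(char *buf, size_t bufsize, const char *repo,
    const char *line)
{
	size_t len;
	int n;

	if (strncmp(line, "P ", 2) != 0)
		return -1;
	line += 2;
	len = strcspn(line, "\n");
	if (len <= 5 || strncmp(line + len - 5, ".pack", 5) != 0)
		return -1;
	n = snprintf(buf, bufsize, "%s/objects/pack/%.*s", repo, (int)len,
	    line);
	return (n < 0 || (size_t)n >= bufsize) ? -1 : 0;
}
```

Every object a dumb fetch needs costs at least one such request, which is one reason big forges prefer the smart protocol.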
 
> got-http is pledged "stdio inet dns" and, unlike the other libexec
> helpers, is not unveiled by default.  It also can't be sandboxed with
> capsicum(4) on FreeBSD, and I don't want to go through the pain of trying
> to sandbox it with landlock on Linux (it needs to access certs.pem and
> probably more stuff there?)

Don't worry too much about this, just do what is feasible.
We want to push people towards SSH anyway. I hope that big forges will
eventually make anonymous SSH possible, though they probably don't care.

> At the moment it "works."  I managed to clone repos from GitHub
> (including ports.git) and from sr.ht.  Incremental fetches also seem
> to work, at least partially.  There are still some parts of the server's
> replies that I don't follow.  For example, here's an excerpt of a
> partial fetch:
> 
> 00000000  30 30 30 38 4e 41 4b 0a  30 30 32 39 02 45 6e 75  |0008NAK.0029.Enu|
> 00000010  6d 65 72 61 74 69 6e 67  20 6f 62 6a 65 63 74 73  |merating objects|
> 00000020  3a 20 31 37 37 33 39 32  34 2c 20 64 6f 6e 65 2e  |: 1773924, done.|
> 00000030  0a 30 30 32 36 02 43 6f  75 6e 74 69 6e 67 20 6f  |.0026.Counting o|
> ...
> 000021a0  30 25 20 28 37 39 34 30  2f 37 39 34 30 29 2c 20  |0% (7940/7940), |
> 000021b0  64 6f 6e 65 2e 0a 30 30  31 30 01 50 41 43 4b 00  |done..0010.PACK.|
> 000021c0  00 00 02 00 1b 11 32 30  30 35 01 64 93 16 78 9c  |......2005.d..x.|
> 
> We get a NAK followed by side-band info, which seems to confuse
> got-fetch-pack, since it expects at least one ACK (see the XXX below).
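The excerpt above decodes as a sequence of pkt-lines. "0008" frames the 4-byte payload "NAK\n". "0029" frames 37 bytes starting with 0x02 (side-band channel 2, progress text). "0010" frames 12 bytes starting with 0x01 (channel 1, the start of the packfile: "PACK" followed by version 2). So the reply is a NAK followed directly by multiplexed pack data, with no ACK at all. Below is a minimal C sketch of a reader for this framing; pkt_read() and sideband_channel() are hypothetical names, not got's API, and the special packets with lengths 1 to 3 used by protocol v2 are simply rejected.

```c
#include <stddef.h>
#include <string.h>

/* Side-band channel numbers (side-band / side-band-64k capabilities). */
enum { SB_PACK = 1, SB_PROGRESS = 2, SB_ERROR = 3 };

/* Decode 4 hex digits; return -1 if any of them is not a hex digit. */
static int
pkt_hex4(const char *p)
{
	int i, v = 0;

	for (i = 0; i < 4; i++) {
		char c = p[i];

		v <<= 4;
		if (c >= '0' && c <= '9')
			v |= c - '0';
		else if (c >= 'a' && c <= 'f')
			v |= c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			v |= c - 'A' + 10;
		else
			return -1;
	}
	return v;
}

/*
 * Parse the pkt-line at the start of buf.  On success return the number
 * of bytes consumed and point *payload / *payload_len at the data after
 * the 4-byte header; a flush-pkt ("0000") consumes 4 bytes and has an
 * empty payload.  Return -1 if the header is malformed, is a special
 * packet (lengths 1-3), or the packet is not yet complete in buf.
 */
static int
pkt_read(const char *buf, size_t buflen, const char **payload,
    size_t *payload_len)
{
	int len;

	if (buflen < 4 || (len = pkt_hex4(buf)) == -1)
		return -1;
	if (len == 0) {
		*payload = buf + 4;
		*payload_len = 0;
		return 4;
	}
	if (len < 4 || (size_t)len > buflen)
		return -1;
	*payload = buf + 4;
	*payload_len = (size_t)len - 4;
	return len;
}

/* Return the side-band channel of a payload, or -1 if it has none. */
static int
sideband_channel(const char *payload, size_t payload_len)
{
	if (payload_len == 0)
		return -1;
	switch ((unsigned char)payload[0]) {
	case SB_PACK:
	case SB_PROGRESS:
	case SB_ERROR:
		return (unsigned char)payload[0];
	default:
		return -1;
	}
}
```

In a fetch without multi_ack, a loop built on this would read the negotiation result first (NAK, or "ACK <id>"), then pass channel 1 payloads, minus the channel byte, to the pack writer, show channel 2 to the user, and abort on channel 3.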

ACK means the server has found a common ancestor commit, usually based on
"have" lines the client has sent. But a server may also send an ACK just
to let the client know it can stop sending "have" lines,
e.g. if the server is using the multi_ack capability (which got-fetch-pack
does not support). You should ensure that Git-protocol capability
announcements are exchanged properly between the client and the server.
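Capability exchange in Git's pack protocol works roughly like this: the server lists what it supports after a NUL byte on the first ref advertisement line, and the client names the subset it wants on its first "want" line. The server is expected to use only what the client requested, so a client that never asks for multi_ack should not get multi_ack-style ACKs. The sketch below shows both steps; the helper names and the client capability list are hypothetical (not necessarily what got-fetch-pack requests). has_capability() matches whole tokens, so "multi_ack" does not accidentally match "multi_ack_detailed".

```c
#include <stdio.h>
#include <string.h>

/* Capabilities this hypothetical client knows how to handle. */
static const char *const client_caps[] = { "side-band-64k", "ofs-delta" };

/*
 * Locate the capability list on the first ref advertisement line,
 * "<id> <refname>\0<cap> <cap> ...\n".  Returns NULL if there is none.
 */
static const char *
advertised_caps(const char *line, size_t len, size_t *capslen)
{
	const char *nul = memchr(line, '\0', len);

	if (nul == NULL)
		return NULL;
	*capslen = len - (size_t)(nul + 1 - line);
	return nul + 1;
}

/* Is cap present as a whole token (optionally "cap=value") in caps? */
static int
has_capability(const char *caps, size_t capslen, const char *cap)
{
	size_t caplen = strlen(cap), i = 0, start, toklen;

	if (caplen == 0)
		return 0;
	while (i < capslen) {
		while (i < capslen && (caps[i] == ' ' || caps[i] == '\n'))
			i++;
		start = i;
		while (i < capslen && caps[i] != ' ' && caps[i] != '\n')
			i++;
		toklen = i - start;
		if (toklen >= caplen && memcmp(caps + start, cap, caplen) == 0 &&
		    (toklen == caplen || caps[start + caplen] == '='))
			return 1;
	}
	return 0;
}

/*
 * Build the first want line, requesting only capabilities that both
 * sides support.  Returns the line length, or -1 if buf is too small.
 */
static int
format_first_want(char *buf, size_t bufsize, const char *want_hex,
    const char *caps, size_t capslen)
{
	size_t i;
	int m, n = snprintf(buf, bufsize, "want %s", want_hex);

	if (n < 0 || (size_t)n >= bufsize)
		return -1;
	for (i = 0; i < sizeof(client_caps) / sizeof(client_caps[0]); i++) {
		if (!has_capability(caps, capslen, client_caps[i]))
			continue;
		m = snprintf(buf + n, bufsize - n, " %s", client_caps[i]);
		if (m < 0 || (size_t)m >= bufsize - n)
			return -1;
		n += m;
	}
	if ((size_t)n + 1 >= bufsize)
		return -1;
	buf[n++] = '\n';
	buf[n] = '\0';
	return n;
}
```

The resulting line would then be framed as a pkt-line like any other want line.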

Git's HTTP protocol is a beast compared to the simpler Git TCP protocol.
I am not sure if your approach of implementing this as an ssh-style helper
will work. I suspect you'll want more control over HTTP protocol specifics,
given that we should support the dumb protocol, and that some http-specific
Git-protocol capabilities exist. The got-http-fetch helper will likely need
to replace got-fetch-pack entirely during http fetches, rather than wrap it.
Otherwise, it will be too difficult to debug and fix issues we will encounter.
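One HTTP-level decision a got-http-fetch helper would have to own is which protocol the server speaks. Per Git's http-protocol documentation, the client first requests $GIT_URL/info/refs?service=git-upload-pack. A smart server answers with Content-Type application/x-git-upload-pack-advertisement and a body starting with the pkt-line "001e# service=git-upload-pack\n". A dumb server just returns the static info/refs file, and the client falls back to the dumb protocol. The minimal sketch below classifies a discovery response under those assumptions; the function and enum names are hypothetical, and a real client would also have to deal with HTTP status codes and redirects.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum http_proto { HTTP_PROTO_DUMB, HTTP_PROTO_SMART, HTTP_PROTO_INVALID };

/* ASCII case-insensitive match of the first n bytes of s and prefix. */
static int
prefix_caseeq(const char *s, const char *prefix, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i] == '\0' || tolower((unsigned char)s[i]) !=
		    tolower((unsigned char)prefix[i]))
			return 0;
	}
	return 1;
}

/*
 * Classify the response to GET $GIT_URL/info/refs?service=git-upload-pack.
 * content_type may be NULL if the server sent no Content-Type header.
 */
static enum http_proto
classify_discovery(const char *content_type, const char *body,
    size_t bodylen)
{
	static const char smart_type[] =
	    "application/x-git-upload-pack-advertisement";
	/* First pkt-line of a smart reply: length 0x1e = 4 + 26 bytes. */
	static const char service_pkt[] = "001e# service=git-upload-pack\n";
	size_t typelen = sizeof(smart_type) - 1;
	size_t pktlen = sizeof(service_pkt) - 1;

	/* Media types are case-insensitive; allow parameters after ';'. */
	if (content_type == NULL ||
	    !prefix_caseeq(content_type, smart_type, typelen) ||
	    (content_type[typelen] != '\0' && content_type[typelen] != ';'))
		return HTTP_PROTO_DUMB;

	/* Smart content type but unexpected body: do not continue. */
	if (bodylen < pktlen || memcmp(body, service_pkt, pktlen) != 0)
		return HTTP_PROTO_INVALID;
	return HTTP_PROTO_SMART;
}
```

On HTTP_PROTO_SMART such a helper would go on to POST to git-upload-pack; on HTTP_PROTO_DUMB it would walk info/refs and the objects/ directory itself. The dumb path has no upload-pack stream for got-fetch-pack to consume at all.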