From: Stefan Sperling
Subject: Re: WIP: read-only https support
To: Omar Polo
Cc: gameoftrees@openbsd.org
Date: Sun, 20 Nov 2022 12:11:58 +0100

On Sun, Nov 20, 2022 at 10:45:14AM +0100, Omar Polo wrote:
> just scratching an itch; there are plenty of web forges online and I
> don't have an account for every one of them (not that I want to). I
> can use git to clone, but then I'd miss all the niceties :)
>
> Diff below adds an initial read-only HTTP/S support for got fetch and
> clone. The code is incomplete, wip, etc... use it at your own risk.
> Sharing just in case somebody wants to play along.
>
> This is done with a new libexec helper "got-http" (an alternative name
> could be "got-dial-http"?) To minimize the changes needed to the dial

I would prefer got-http-fetch. We're probably never going to implement
send via HTTP. But even if we did, we could add a got-http-send program
and share some code between the two.

> and fetch_pack API I decided to write a helper that behaves like
> ssh(1) as far as got is concerned. Under the hood, it transforms what
> got asks into HTTP requests. Only the "smart" HTTP protocol is
> supported, the "dumb" one not. (as of now at least)
>
> The "smart" HTTP protocol behaves almost as git over ssh, but needs
> two HTTP requests:
>
> - a first one to do the "discovery" (see if the remote server is
>   "smart") and fetch the refs
> - a POST where we send our have/want line and fetch the packfile
>
> The dumb one is just a bare git repo served via a web server (could be
> httpd(8)) and needs us to fetch all the objects manually and do the
> resolving by ourselves. To be fair I'm not thrilled at the idea of
> implementing it.

While big forges tend to implement the smart protocol for efficiency,
many self-hosted HTTP server setups use the dumb protocol because it is
much easier to set up on the server side. This includes
git.gameoftrees.org, which uses dumb HTTP only ;)
Feel free to use this server for manual testing.
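
For reference, the two smart-HTTP requests described in the quoted text,
and the smart/dumb check, can be sketched as below. This is only an
illustration and not code from the diff: the function names and the
example.org URL are made up, while the paths and media types are the
ones I understand Git's http-protocol document to specify for
git-upload-pack.

```python
SERVICE = "git-upload-pack"  # service used for fetch and clone


def discovery_url(repo_url: str) -> str:
    """First request: a GET that asks the server to advertise its refs."""
    return repo_url.rstrip("/") + "/info/refs?service=" + SERVICE


def upload_pack_url(repo_url: str) -> str:
    """Second request: a POST carrying our want/have lines."""
    return repo_url.rstrip("/") + "/" + SERVICE


def post_headers() -> dict:
    """Headers sent with the POST (hypothetical helper)."""
    return {
        "Content-Type": "application/x-" + SERVICE + "-request",
        "Accept": "application/x-" + SERVICE + "-result",
    }


def is_smart_reply(content_type: str) -> bool:
    """A smart server labels the discovery reply with a dedicated media
    type; a dumb server just serves the info/refs file as-is."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/x-" + SERVICE + "-advertisement"


if __name__ == "__main__":
    base = "https://example.org/repo.git"  # placeholder repository URL
    print("GET ", discovery_url(base))
    print("POST", upload_pack_url(base), post_headers())
```

If is_smart_reply() is false for the discovery reply, a client has to
fall back to walking the repository files itself, which is the dumb
protocol case discussed above.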

> got-http is pledged "stdio inet dns" and is not unveiled by default
> unlike the other libexec helpers. It also can't be sandboxed with
> capsicum(4) on FreeBSD and I don't want to go thru the pain of trying
> to sandbox it with landlock on linux (needs to access certs.pem and
> probably more stuff there?)

Don't worry too much about this, just do what is feasible.
We want to push people towards SSH anyway. I hope that big forges will
eventually make anonymous SSH possible, though they probably don't care.

> At the moment it "works." I managed to clone repos from github
> (including ports.git) and from sr.ht. Incremental fetches also seem
> to work, in part at least. There's still some bits of how the server
> replies that I'm not following. For example, here's an excerpt of a
> partial fetch:
>
> 00000000  30 30 30 38 4e 41 4b 0a 30 30 32 39 02 45 6e 75  |0008NAK.0029.Enu|
> 00000010  6d 65 72 61 74 69 6e 67 20 6f 62 6a 65 63 74 73  |merating objects|
> 00000020  3a 20 31 37 37 33 39 32 34 2c 20 64 6f 6e 65 2e  |: 1773924, done.|
> 00000030  0a 30 30 32 36 02 43 6f 75 6e 74 69 6e 67 20 6f  |.0026.Counting o|
> ...
> 000021a0  30 25 20 28 37 39 34 30 2f 37 39 34 30 29 2c 20  |0% (7940/7940), |
> 000021b0  64 6f 6e 65 2e 0a 30 30 31 30 01 50 41 43 4b 00  |done..0010.PACK.|
> 000021c0  00 00 02 00 1b 11 32 30 30 35 01 64 93 16 78 9c  |......2005.d..x.|
>
> We get a NAK and then side-band info, which seems to confuse
> got-fetch-pack that expects at least one ACK. (see the XXX below.)

ACK means the server has found a common ancestor commit, usually based
on "have" lines the client has sent. But a server might also generate
an ACK just to let the client know it can stop sending "have" lines,
e.g. if the server is using the multi_ack capability (which
got-fetch-pack does not support). You should ensure that Git-protocol
capability announcements are exchanged properly between the client and
the server.
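
To make the framing in the quoted dump concrete, here is a small sketch
of a pkt-line reader with side-band demultiplexing. It is not
got-fetch-pack's code, and read_pkt_lines() and demux() are names
invented for this example. SAMPLE joins three packets copied from the
dump that are not adjacent in the real stream, and the trailing "0000"
flush-pkt is an assumption added so the sample terminates cleanly.

```python
def read_pkt_lines(buf: bytes):
    """Yield each pkt-line payload; None stands for a flush-pkt ("0000")."""
    pos = 0
    while pos + 4 <= len(buf):
        # Four hex digits give the packet length, header included.
        length = int(buf[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
            continue
        if length < 4 or pos + length > len(buf):
            raise ValueError("bad pkt-line length at offset %d" % pos)
        yield buf[pos + 4:pos + length]
        pos += length


def demux(buf: bytes):
    """Split a fetch reply into (ack_lines, pack_bytes, progress_text)."""
    acks, pack, progress = [], bytearray(), bytearray()
    for payload in read_pkt_lines(buf):
        if payload is None:
            break                      # flush-pkt ends the reply
        band = payload[:1]
        if payload.startswith((b"ACK ", b"NAK")):
            acks.append(payload.rstrip(b"\n").decode())
        elif band == b"\x01":          # band 1: packfile data
            pack += payload[1:]
        elif band == b"\x02":          # band 2: progress messages
            progress += payload[1:]
        elif band == b"\x03":          # band 3: fatal error from the server
            raise RuntimeError(payload[1:].decode(errors="replace"))
    return acks, bytes(pack), progress.decode()


SAMPLE = (
    b"0008NAK\n"
    b"0029\x02Enumerating objects: 1773924, done.\n"
    b"0010\x01PACK\x00\x00\x00\x02\x00\x1b\x11"
    b"0000"
)

if __name__ == "__main__":
    print(demux(SAMPLE))
```

Read this way, the reply in the dump is a lone NAK followed directly by
progress text on band 2 and pack data on band 1. As far as I read the
pack-protocol document, that is a legitimate answer when the server
found no common commit, so the parser should not insist on seeing an
ACK first.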

Git's HTTP protocol is a beast compared to the simpler Git TCP protocol.
I am not sure if your approach of implementing this as an ssh-style
helper will work. I suspect you'll want more control over HTTP protocol
specifics, given that we should support the dumb protocol, and that
some http-specific Git-protocol capabilities exist.

The got-http-fetch helper will likely need to replace got-fetch-pack
entirely during http fetches, rather than wrap it. Otherwise, it will be
too difficult to debug and fix issues we will encounter.