From: Omar Polo <op@omarpolo.com>
Subject: Re: WIP: read-only https support
To: Stefan Sperling <stsp@stsp.name>
Cc: gameoftrees@openbsd.org
Date: Sun, 20 Nov 2022 18:59:05 +0100

On 2022/11/20 12:11:58 +0100, Stefan Sperling <stsp@stsp.name> wrote:
> On Sun, Nov 20, 2022 at 10:45:14AM +0100, Omar Polo wrote:
> > [...]
> > This is done with a new libexec helper "got-http" (an alternative name
> > could be "got-dial-http"?)  To minimize the changes needed to the dial
> 
> I would prefer got-http-fetch.

haven't thought about that name, i like it, thanks!

> We're probably never going to implement
> send via HTTP. But even if we did, we could add a got-http-send program
> and share some code between the two.

with the "smart" protocol, sending via http doesn't seem much
different from fetching: it "just" needs some extra headers for
authentication (with the usual complications of managing the
authentication data, a nightmare.)

I don't plan to do that anyway.  Fetching via http can be useful, and
to some degree it's better than the plain Git TCP protocol (at least
you have tls); for pushing, ssh is better.  (ssh is better in all
cases, but anonssh is not -yet- popular ;)

> > [...]
> > The dumb one is just a bare git repo served via a web server (could be
> > httpd(8)) and needs us to fetch all the objects manually and do the
> > resolving by ourselves.  To be fair I'm not thrilled at the idea of
> > implementing it.
> 
> While big forges tend to implement the smart protocol for efficiency,
> many self-hosted HTTP server setups use the dumb protocol because
> this is much easier to set up on the server side.
> 
> This includes git.gameoftrees.org, which uses dumb HTTP only ;)

yep; I'm actually doing the same too :)

(but i'm really looking forward to running gotd and doing anonssh...)

it's nice to have, and would allow using got to track itself.
However it's way more complex than the smart protocol, so it'll take
me some more time.
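
For reference, a rough Python sketch (not got code) of how a client
could tell the two protocols apart from the reply to the info/refs
request.  It assumes the behaviour described in Git's http-protocol
documentation: a smart server answers with the content type
application/x-git-upload-pack-advertisement and a first pkt-line of
"# service=git-upload-pack", while a dumb server just serves the
tab-separated info/refs file.  All function names and the sample
object id are made up for illustration.

```python
SMART_TYPE = "application/x-git-upload-pack-advertisement"


def info_refs_url(repo_url):
    """URL a client requests first, for both the smart and dumb protocol."""
    return repo_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_smart_reply(content_type, body):
    """True if the reply looks like a smart ref advertisement."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Skip the 4-byte pkt-line length, then expect the service banner.
    return (media_type == SMART_TYPE
            and body[4:].startswith(b"# service=git-upload-pack"))


def parse_dumb_refs(body):
    """Parse a dumb info/refs file: one "<object id>\\t<ref name>" per line."""
    refs = {}
    for line in body.decode("ascii").splitlines():
        obj_id, name = line.split("\t", 1)
        refs[name] = obj_id
    return refs
```

With a dumb server the client then has to walk the refs returned by
parse_dumb_refs() and fetch the objects or packs itself, which is
where the extra complexity comes from.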

> > [...]
> > At the moment it "works."  I managed to clone repos from github
> > (including ports.git) and from sr.ht.  Incremental fetches also seem
> > to work, in part at least.  There's still some bits of how the server
> > replies that I'm not following.  For example, here's an excerpt of a
> > partial fetch:
> > 
> > 00000000  30 30 30 38 4e 41 4b 0a  30 30 32 39 02 45 6e 75  |0008NAK.0029.Enu|
> > 00000010  6d 65 72 61 74 69 6e 67  20 6f 62 6a 65 63 74 73  |merating objects|
> > 00000020  3a 20 31 37 37 33 39 32  34 2c 20 64 6f 6e 65 2e  |: 1773924, done.|
> > 00000030  0a 30 30 32 36 02 43 6f  75 6e 74 69 6e 67 20 6f  |.0026.Counting o|
> > ...
> > 000021a0  30 25 20 28 37 39 34 30  2f 37 39 34 30 29 2c 20  |0% (7940/7940), |
> > 000021b0  64 6f 6e 65 2e 0a 30 30  31 30 01 50 41 43 4b 00  |done..0010.PACK.|
> > 000021c0  00 00 02 00 1b 11 32 30  30 35 01 64 93 16 78 9c  |......2005.d..x.|
> > 
> > We get a NAK and then side-band info, which seems to confuse
> > got-fetch-pack that expects at least one ACK.  (see the XXX below.)
> 
> ACK means the server has found a common ancestor commit, usually based on
> "have" lines the client has sent. But it could be possible that a server
> generates an ACK just to let the client know it can stop sending "have" lines,
> e.g. if the server is using the multi_ack capability (which got-fetch-pack
> does not support). You should ensure that Git-protocol capability
> announcements are exchanged properly between the client and the server.
> 
> Git's HTTP protocol is a beast compared to the simpler Git TCP protocol.
> I am not sure if your approach of implementing this as an ssh-style helper
> will work. I suspect you'll want more control over HTTP protocol specifics,
> given that we should support the dumb protocol, and that some http-specific
> Git-protocol capabilities exist. The got-http-fetch helper will likely need
> to replace got-fetch-pack entirely during http fetches, rather than wrap it.
> Otherwise, it will be too difficult to debug and fix issues we will encounter.
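
To make the excerpt quoted above easier to follow, here is a small
Python sketch (not got code) that splits a buffer into pkt-lines and
demultiplexes the side-band.  It assumes the usual layout: four hex
digits of length that include the length field itself, and, once
side-band is in use, a first payload byte of 1 for pack data, 2 for
progress and 3 for errors.  Telling ACK/NAK lines apart by their
prefix is a shortcut for the example, not how a real client should
track protocol state, and the function names are made up.  SAMPLE
stitches together three packets visible in the dump, skipping the
elided part.

```python
BANDS = {1: "pack", 2: "progress", 3: "error"}

# Three packets copied from the hexdump; the elided bytes are left out.
SAMPLE = (b"0008NAK\n"
          b"0029\x02Enumerating objects: 1773924, done.\n"
          b"0010\x01PACK\x00\x00\x00\x02\x00\x1b\x11")


def read_pkt_lines(buf):
    """Yield pkt-line payloads; None stands for a flush-pkt ("0000")."""
    pos = 0
    while pos < len(buf):
        length = int(buf[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
        elif length <= 4 or pos + length > len(buf):
            # Protocol v2 special packets (0001, 0002) are not handled.
            raise ValueError("bad or truncated pkt-line")
        else:
            yield buf[pos + 4:pos + length]
            pos += length


def classify(buf):
    """Yield (kind, data) for each packet in buf."""
    for payload in read_pkt_lines(buf):
        if payload is None:
            yield ("flush", b"")
        elif payload.startswith((b"ACK ", b"NAK")):
            yield ("status", payload)
        else:
            yield (BANDS.get(payload[0], "unknown"), payload[1:])
```

Running list(classify(SAMPLE)) gives a status packet with the NAK, a
progress packet and then the first bytes of the pack header, so the
NAK arrives outside the side-band and before any pack data.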

My initial plan was to somehow hide all the complexity of the dumb
protocol via this libexec helper, to avoid making fetch_pack more
complex.  I settled on this design before knowing about http-specific
capabilities!  Not too bad, however: the code is still quite short and
something like it would be needed anyway, so it's not a loss.

I rethought it a bit and I think I can make got-http-fetch more "dumb"
and run the logic in the main got process.  This is more in line with
how the rest of got works, and also allows making the libexec helper
as strict as the others (pledge stdio recvfd only).

So, while I'm happy that a quick hack was actually working and I can
finally fetch via http with got, I'll rework all of this.  But first I
need to study the pack protocol in more detail.


Thanks for the comments and suggestions!


P.S.:

> On Sun, Nov 20, 2022 at 10:45:14AM +0100, Omar Polo wrote:
> > +#define      GOT_USERAGENT   "got/" GOT_VERSION_STR
> 
> Is this still a problem? https://github.com/jelmer/dulwich/issues/562

I cloned a few repos and did various fetches from both github and
sourcehut (sr.ht) and had no issues due to the useragent.
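
As an illustration only, this is roughly what a smart-protocol fetch
request carrying that user agent looks like, written as a Python
sketch rather than the C from the diff.  The version string is a
placeholder (the real GOT_VERSION_STR comes from got's build), the
function name is made up, and the content types are the ones given in
Git's http-protocol documentation.  Nothing is sent over the network;
the request object is only built.

```python
import urllib.request

GOT_VERSION_STR = "0.0"  # placeholder, not the real got version
GOT_USERAGENT = "got/" + GOT_VERSION_STR


def upload_pack_request(repo_url, body):
    """Build (but do not send) the POST that asks the server for a pack."""
    return urllib.request.Request(
        repo_url.rstrip("/") + "/git-upload-pack",
        data=body,
        method="POST",
        headers={
            "User-Agent": GOT_USERAGENT,
            "Content-Type": "application/x-git-upload-pack-request",
            "Accept": "application/x-git-upload-pack-result",
        },
    )
```

Trying a different user agent against a given forge then only means
changing GOT_USERAGENT.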