"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: make 'got send' detect closed connections
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Wed, 13 Sep 2023 00:18:12 +0200

Download raw body.

Thread
On 2023/09/12 21:17:38 +0200, Stefan Sperling <stsp@stsp.name> wrote:
> I am seeing an issue while trying to send changes to gothub, where
> the gotsh process on the server runs into an error (cause unknown).
> 
> This in turn triggers a problem where 'got send' does not realize
> that unnecessary work is being done:
> 
> $ got send -b qwx gothub
> Connecting to "gothub" ssh://stsp@gothub.org/src.git
> Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA
> 16419 commits coloredgotsh: operation timed out
> 228961 commits colored; 48 objects found; 7581 trees scanned
> packing 1 reference; 48 objects; deltify: 100%; writing pack:  109K 100%
> got: privsep peer process closed pipe
> $ 
> 
> There are two errors in the above trace:
> 
>  1) gotsh: operation timed out
>  2) got: privsep peer process closed pipe
> 
> The first error happens while we are coloring commits. We do not notice the
> problem at that point, however. We keep building the pack file, upload it
> (not sure where uploaded data goes -- perhaps ssh discards it?) and then
> we finally see a problem.
> 
> An ssh debug trace shows that ssh detects the connection being closed
> when gotsh reports error 1:
> 
> [[[
> got-send-pack: updating refs/heads/qwx c351eb807cdb2a9571bc3dde12654f23949e7c45 -> 903db0c03af248b610c1338fc1340e55bfa9074a
> got-send-pack: writepkt: 0000
> 16785 commits coloreddebug2: channel 0: rcvd ext data 27
> gotsh: operation timed out
> debug2: channel 0: written 27 to efd 6
> debug2: channel 0: rcvd eof
> debug2: channel 0: output open -> drain
> debug2: channel 0: obuf empty
> ]]]
> 
> The patch below makes 'got send' detect the dead connection while
> generating the pack file.
> Is there a better way than regularly polling the file descriptor to
> detect a POLLHUP? I don't have a better idea. It is effective though:
> 
> $ got send -b qwx gothub
> Connecting to "gothub" ssh://stsp@gothub.org/src.git
> Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA
> 15062 commits coloredgotsh: operation timed out
> 18982 commits colored
> got: server unexpectedly closed the connection
> $
> 
> Tests keep passing with this diff. I don't think it makes sense to add
> test coverage for this corner case. It is largely a cosmetic problem.
> 
> ok?

No idea why gotd fails like that.

Diff reads fine but I'm not sure it's worth to add polling in
send_progress() for this fringe case to be honest.

Alternatively, if ssh dies (not sure it does from the trace), maybe we
can catch the SIGCHLD, set a flag, and treat it like an interrupt to
abort the operation?