From: Omar Polo Subject: Re: make 'got send' detect closed connections To: Stefan Sperling Cc: gameoftrees@openbsd.org Date: Wed, 13 Sep 2023 00:18:12 +0200 On 2023/09/12 21:17:38 +0200, Stefan Sperling wrote: > I am seeing an issue while trying to send changes to gothub, where > the gotsh process on the server runs into an error (cause unknown). > > This in turn triggers a problem where 'got send' does not realize > that unnecessary work is being done: > > $ got send -b qwx gothub > Connecting to "gothub" ssh://stsp@gothub.org/src.git > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA > 16419 commits coloredgotsh: operation timed out > 228961 commits colored; 48 objects found; 7581 trees scanned > packing 1 reference; 48 objects; deltify: 100%; writing pack: 109K 100% > got: privsep peer process closed pipe > $ > > There are two errors in the above trace: > > 1) gotsh: operation timed out > 2) got: privsep peer process closed pipe > > The first error happens while we are coloring commits. We do not notice the > problem at that point, however. We keep building the pack file, upload it > (not sure where uploaded data goes -- perhaps ssh discards it?) and then > we finally see a problem. > > An ssh debug trace shows that ssh detects the connection being closed > when gotsh reports error 1: > > [[[ > got-send-pack: updating refs/heads/qwx c351eb807cdb2a9571bc3dde12654f23949e7c45 -> 903db0c03af248b610c1338fc1340e55bfa9074a > got-send-pack: writepkt: 0000 > 16785 commits coloreddebug2: channel 0: rcvd ext data 27 > gotsh: operation timed out > debug2: channel 0: written 27 to efd 6 > debug2: channel 0: rcvd eof > debug2: channel 0: output open -> drain > debug2: channel 0: obuf empty > ]]] > > The patch below makes 'got send' detect the dead connection while > generating the pack file. > Is there a better way than regularly polling the file descriptor to > detect a POLLHUP? I don't have a better idea. It is effective though: > > $ got send -b qwx gothub > Connecting to "gothub" ssh://stsp@gothub.org/src.git > Host key fingerprint is SHA256:HT7ohAIc4knlmUm3vFUyyK9/DW/sFccZuPG0oNwBWYA > 15062 commits coloredgotsh: operation timed out > 18982 commits colored > got: server unexpectedly closed the connection > $ > > Tests keep passing with this diff. I don't think it makes sense to add > test coverage for this corner case. It is largely a cosmetic problem. > > ok? No idea why gotd fails like that. Diff reads fine but I'm not sure it's worth to add polling in send_progress() for this fringe case to be honest. Alternatively, if ssh dies (not sure it does from the trace), maybe we can catch the SIGCHLD, set a flag, and treat it like an interrupt to abort the operation?