tog and unprintable characters
On Mon, Dec 14, 2020 at 07:04:06PM +0100, Christian Weisgerber wrote:
> I have looked again into how tog(1) needs to deal with unprintable
> characters, and I think in the end it is actually quite simple:
>
> tog MUST NOT pass unprintable characters to the curses output
> routines. Period.
>
> Otherwise, curses will replace them with some printable representation
> whose width we can't predict.
>
> The wcwidth() check in format_line() already tells us which characters
> are printable. Those with width == -1 are not.
>
> The diff below replaces all unprintable characters (other than '\t')
> with '.'. I'm not proposing this for commit, but as a demonstration.
> With this, "tog log" on my evil repository (latest version attached)
> shows no misformatting. It works just fine across the "C",
> "en_US.ISO8859-1" and "C.UTF-8" locales on FreeBSD, as well as "C"
> and "C.UTF-8" on OpenBSD.
>
> It copes with these examples:
> * Byte sequences 0xc3 0xb6 and 0xc3 0xa4. (Unicode U+00F6 and U+00E4
>   in UTF-8 encoding.)
> * Byte sequence 0xc2 0x9b. (Control character U+009B in UTF-8.)
> * Control character 0x0c (^L, FF).
> * Control character 0x1b (^[, ESC).
>
> In UTF-8 locales, each UTF-8 sequence is handled as a single character
> and the printability of the Unicode character is determined.
>
> In the C and ISO8859-1 locales, each byte is handled as a single
> character and the printability of this character in ASCII or
> ISO8859-1, respectively, is determined.
>
> So, I think this shows what needs to be done and we can now argue
> over which printable replacement representations we want to choose...

This patch is fine with me. Your reasoning makes sense, and L'.' is as
good as any other replacement character we could choose. Thank you for
spending so much effort on figuring out a solution for this!
> diff f5a09613ce18eb49de0d07d7f7a1dbd5dcac25c8 /home/naddy/got
> blob - ca7f33b17ed856f59a4e0fc0cd5d1339a9ed6d93
> file + tog/tog.c
> --- tog/tog.c
> +++ tog/tog.c
> @@ -1190,10 +1190,13 @@ format_line(wchar_t **wlinep, int *widthp, const char
>  			if (wline[i] == L'\t') {
>  				width = TABSIZE -
>  				    ((cols + col_tab_align) % TABSIZE);
> -				if (cols + width > wlimit)
> -					break;
> -				cols += width;
> +			} else {
> +				width = 1;
> +				wline[i] = L'.';
>  			}
> +			if (cols + width > wlimit)
> +				break;
> +			cols += width;
>  			i++;
>  		} else {
>  			err = got_error_from_errno("wcwidth");
>
> --
> Christian "naddy" Weisgerber                          naddy@mips.inka.de
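For illustration, the rule from the message (anything wcwidth() reports as width -1, other than '\t', becomes '.') can be sketched as a standalone C helper. This is a minimal sketch, not tog's actual format_line(): the name sanitize_line(), its signature, and the fixed tab width of 8 (standing in for curses' TABSIZE variable) are hypothetical, and it assumes the line has already been converted to wide characters under the program's locale.

```c
#define _XOPEN_SOURCE 700 /* wcwidth() is an XSI extension */
#include <stddef.h>
#include <wchar.h>

/* Hypothetical fixed tab width; tog itself uses curses' TABSIZE. */
#define SKETCH_TABSIZE 8

/*
 * Hypothetical helper: replace every unprintable wide character other
 * than '\t' with L'.', then count how many characters fit into wlimit
 * display columns. Returns that character count and stores the number
 * of columns used in *widthp. Zero-width characters are kept but
 * consume no columns.
 */
size_t
sanitize_line(wchar_t *wline, size_t wlen, int wlimit, int col_tab_align,
    int *widthp)
{
	size_t i = 0;
	int cols = 0;

	while (i < wlen) {
		int width = wcwidth(wline[i]);

		if (width == -1) {
			if (wline[i] == L'\t') {
				/* Advance to the next tab stop. */
				width = SKETCH_TABSIZE -
				    ((cols + col_tab_align) % SKETCH_TABSIZE);
			} else {
				/* Unprintable: never hand it to curses. */
				width = 1;
				wline[i] = L'.';
			}
		}
		if (cols + width > wlimit)
			break;
		cols += width;
		i++;
	}
	*widthp = cols;
	return i;
}
```

Because wcwidth() consults LC_CTYPE, the same input bytes are classified differently under "C" and "C.UTF-8", which is the locale dependence described above; a real program would call setlocale(LC_CTYPE, "") before converting its input. Moving the wlimit check out of the tab branch, as the patch does, means replaced characters are counted against the column limit too.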