tog and unprintable characters

I have looked again into how tog(1) needs to deal with unprintable
characters, and I think in the end it is actually quite simple:

tog MUST NOT pass unprintable characters to the curses output routines.

Period.  Otherwise, curses will replace them with some printable
representation whose width we can't predict.

The wcwidth() check in format_line() already tells us which characters
are printable.  Those with width == -1 are not.

The diff below replaces all unprintable characters (other than '\t')
with '.'.  I'm not proposing this for commit, but as a demonstration.
With this, "tog log" on my evil repository (latest version attached)
shows no misformatting.  It works just fine across the "C",
"en_US.ISO8859-1" and "C.UTF-8" locales on FreeBSD, as well as "C" and
"C.UTF-8" on OpenBSD.

It copes with these examples:

* Byte sequences 0xc3 0xb6 and 0xc3 0xa4.
  (Unicode U+00F6 and U+00E4 in UTF-8 encoding.)
* Byte sequence 0xc2 0x9b.
  (Control character U+009B in UTF-8.)
* Control character 0x0c (^L, FF).
* Control character 0x1b (^[, ESC).

In UTF-8 locales, each UTF-8 sequence is handled as a single character
and the printability of the Unicode character is determined.

In the C and ISO8859-1 locales, each byte is handled as a single
character and the printability of this character in ASCII or
ISO8859-1, respectively, is determined.

So, I think this shows what needs to be done and we can now argue over
which printable replacement representations we want to choose...

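For readers who want to try the idea outside of tog, here is a minimal
standalone sketch of the same rule: any wide character for which
wcwidth() returns -1, other than '\t', is overwritten with '.'.  The
function name sanitize_wline() is hypothetical and does not exist in
tog; the sketch also leaves out the column accounting and tab expansion
that format_line() performs.

```c
/* Request the XSI interfaces so that <wchar.h> declares wcwidth(). */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of tog): replace every character that
 * wcwidth() reports as unprintable (-1) with L'.', except for tabs,
 * which the caller is assumed to expand separately.
 * Returns the number of characters that were replaced.
 */
size_t
sanitize_wline(wchar_t *wline)
{
	size_t i, replaced = 0;

	assert(wline != NULL);

	for (i = 0; wline[i] != L'\0'; i++) {
		if (wline[i] == L'\t')
			continue;	/* keep tabs for later expansion */
		if (wcwidth(wline[i]) == -1) {
			wline[i] = L'.';	/* fixed width of 1 column */
			replaced++;
		}
	}
	return replaced;
}
```

Note that wcwidth() consults the LC_CTYPE category of the current
locale, so a real program would call setlocale(LC_CTYPE, "") first;
without that call the sketch runs in the "C" locale, where FF and ESC
are both reported as unprintable.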
diff f5a09613ce18eb49de0d07d7f7a1dbd5dcac25c8 /home/naddy/got
blob - ca7f33b17ed856f59a4e0fc0cd5d1339a9ed6d93
file + tog/tog.c
--- tog/tog.c
+++ tog/tog.c
@@ -1190,10 +1190,13 @@ format_line(wchar_t **wlinep, int *widthp, const char
 			if (wline[i] == L'\t') {
 				width = TABSIZE -
 				    ((cols + col_tab_align) % TABSIZE);
-				if (cols + width > wlimit)
-					break;
-				cols += width;
+			} else {
+				width = 1;
+				wline[i] = L'.';
 			}
+			if (cols + width > wlimit)
+				break;
+			cols += width;
 			i++;
 		} else {
 			err = got_error_from_errno("wcwidth");
-- 
Christian "naddy" Weisgerber                          naddy@mips.inka.de

2020-12-14 18:04 Christian Weisgerber:
tog and unprintable characters
- 2020-12-14 19:41 Stefan Sperling:
  tog and unprintable characters