
From: Christian Weisgerber <naddy@mips.inka.de>
Subject: tog and unprintable characters
To: gameoftrees@openbsd.org
Date: Mon, 14 Dec 2020 19:04:06 +0100

I have looked again into how tog(1) needs to deal with unprintable
characters, and I think in the end it is actually quite simple:

  tog MUST NOT pass unprintable characters to the curses output
  routines.  Period.

Otherwise, curses will replace them with some printable representation
whose width we can't predict.

The wcwidth() check in format_line() already tells us which characters
are printable.  Those with width == -1 are not.

The diff below replaces all unprintable characters (other than '\t')
with '.'.  I'm not proposing this for commit, but as a demonstration.
With this, "tog log" on my evil repository (latest version attached)
shows no misformatting.  It works just fine across the "C",
"en_US.ISO8859-1" and "C.UTF-8" locales on FreeBSD, as well as "C"
and "C.UTF-8" on OpenBSD.

It copes with these examples:
* Byte sequences 0xc3 0xb6 and 0xc3 0xa4. (Unicode U+00F6 and U+00E4
  in UTF-8 encoding.)
* Byte sequence 0xc2 0x9b.  (Control character U+009B in UTF-8.)
* Control character 0x0c (^L, FF).
* Control character 0x1b (^[, ESC).

In UTF-8 locales, each UTF-8 sequence is handled as a single character
and the printability of the Unicode character is determined.

In the C and ISO8859-1 locales, each byte is handled as a single
character and the printability of this character in ASCII or
ISO8859-1, respectively, is determined.
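
The locale dependence can be observed with a small sketch like the one
below.  The helper count_chars_in_locale() is hypothetical and only
meant for experimenting; it relies on the POSIX behaviour of mbstowcs()
with a NULL destination, and it resets LC_CTYPE to "C" before
returning.  Locale names such as "C.UTF-8" may not be available on
every system, and some C libraries reject bytes >= 0x80 in the "C"
locale instead of treating them as single characters; in both cases the
helper returns -1.

```c
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper: count how many characters the byte string
 * decodes to under the named locale.  Returns -1 if the locale is not
 * available or the bytes are not valid in its encoding.
 * Side effect: LC_CTYPE is left set to "C".
 */
static long
count_chars_in_locale(const char *locale, const char *bytes)
{
	size_t n;

	if (setlocale(LC_CTYPE, locale) == NULL)
		return -1;
	/* With a NULL destination, only the character count is computed. */
	n = mbstowcs(NULL, bytes, 0);
	setlocale(LC_CTYPE, "C");
	return n == (size_t)-1 ? -1 : (long)n;
}
```

Where a UTF-8 locale is installed, the two bytes 0xc3 0xb6 count as one
character, and it is that single character whose printability is then
checked.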

So, I think this shows what needs to be done and we can now argue
over which printable replacement representations we want to choose...


diff f5a09613ce18eb49de0d07d7f7a1dbd5dcac25c8 /home/naddy/got
blob - ca7f33b17ed856f59a4e0fc0cd5d1339a9ed6d93
file + tog/tog.c
--- tog/tog.c
+++ tog/tog.c
@@ -1190,10 +1190,13 @@ format_line(wchar_t **wlinep, int *widthp, const char 
 			if (wline[i] == L'\t') {
 				width = TABSIZE -
 				    ((cols + col_tab_align) % TABSIZE);
-				if (cols + width > wlimit)
-					break;
-				cols += width;
+			} else {
+				width = 1;
+				wline[i] = L'.';
 			}
+			if (cols + width > wlimit)
+				break;
+			cols += width;
 			i++;
 		} else {
 			err = got_error_from_errno("wcwidth");
-- 
Christian "naddy" Weisgerber                          naddy@mips.inka.de