tog: fix display of lines ending in \r\n

From: Stefan Sperling <stsp@stsp.name>
Subject: Re: tog: fix display of lines ending in \r\n
To: Christian Weisgerber <naddy@mips.inka.de>
Cc: gameoftrees@openbsd.org
Date: Sun, 13 Dec 2020 12:07:17 +0100

Thread

- 2020-12-12 18:58 Stefan Sperling:
  tog: fix display of lines ending in \r\n
- 2020-12-13 00:35 Christian Weisgerber:
  tog: fix display of lines ending in \r\n
- 2020-12-13 11:07 Stefan Sperling:
  tog: fix display of lines ending in \r\n

On Sun, Dec 13, 2020 at 01:35:03AM +0100, Christian Weisgerber wrote:
> Stefan Sperling:
> 
> > I'd say we can go back to using a custom single-char replacement hack.
> > I.e. replace the control char with '?' or something like that, and
> > whitelist the ones known to work (e.g. Tab). That is still better
> > than what we have right now.
> 
> It's part of a bigger problem that printability and width also vary
> by locale.  E.g., let's say I have the byte sequence 0xc3 0xb6 in
> a commit message (U+00F6 in UTF-8).  On FreeBSD, I get different
> display results if I run tog in LC_CTYPE=C.UTF-8 versus LC_CTYPE=C.
> The latter is misformatted, even in a non-UTF8 xterm (!?).

Not sure what you are seeing exactly, but it doesn't seem surprising
that xterm does weird things in the C locale. Recall what Ingo uncovered
here: https://undeadly.org/cgi?action=article&sid=20160308204011
"Printing sanitized UTF-8 to a US-ASCII terminal is *NOT* safe."
(Needless to say, Ingo's article contains _all_ the details ;)

I guess tog could run everything through iswprint() and hope that any
characters flagged as printable in locale data will also have correct
width information. Locale definitions can be buggy, though I see that
in 2018 (r340491) FreeBSD switched to using Unicode tables as source data
for the UTF-8 locale, replacing the old buggy UTF-8 data, as OpenBSD did
in 2015.
And FreeBSD's DefaultRuneLocale table defines 0x7e (~) as the highest
printable character for the C locale, as far as I can tell.
So results from iswprint() should be correct in either case.
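
A minimal sketch of the approach discussed above, assuming the line has
already been converted to wide characters (for example with mbstowcs()).
The helper name sanitize_wline() is hypothetical and is not taken from
tog's source. It replaces anything iswprint() rejects with '?' and passes
Tab through unchanged, on the assumption that the caller expands tabs
itself:

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not tog's actual code: replace every wide
 * character that iswprint() rejects with '?', except for those on
 * a small allowlist (only Tab here). Returns the number of
 * characters that were replaced.
 */
static size_t
sanitize_wline(wchar_t *wline)
{
	size_t nreplaced = 0;
	wchar_t *p;

	for (p = wline; *p != L'\0'; p++) {
		if (*p == L'\t')
			continue;	/* allowlisted; left for the caller */
		if (!iswprint((wint_t)*p)) {
			*p = L'?';	/* single-column placeholder */
			nreplaced++;
		}
	}
	return nreplaced;
}
```

Since iswprint() consults the current LC_CTYPE locale, the outcome follows
whatever the caller set up with setlocale(). A trailing \r is a control
character and ends up as '?'.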
