got-notify-http: fix unicode handling
On Thu, Mar 28, 2024 at 08:21:17AM +0100, Omar Polo wrote:
> JSON strings are made of UNICODE codepoints, of which only the control
> characters, \ and " need to be escaped.  Furthermore, per RFC8259:
>
> : JSON text exchanged between systems that are not part of a closed
> : ecosystem MUST be encoded using UTF-8.
>
> so when POSTing the notifications the JSON text has to be encoded in
> UTF-8.
>
> The current code is wrong because it escapes with \uXXXX *byte* over
> 0x7F, and this causes mis-decodings issues.
>
> isu8cont() as far as I can see will happily accept surrogate pairs and
> overlong sequences (since it doesn't parse), which will cause an error
> on the receiving side while decoding the JSON.

Right, such sequences should be filtered and/or replaced.
Eventually we should do this for SMTP notifications, too.

> I don't think I can reasonably use mbtowc() either since it will use the
> current locale which is problematic in -portable.
>
> So, I'm bundling my favourite utf8 decoder (DFAs are lovely) and using
> that to read the text.  Upon decoding error the replacement character
> U+FFFD is emitted in the JSON string, all the bytes considered so far
> discarded and the decoder restarted with the next byte.  (Not the only
> technique, just the simpler to implement.)

I didn't know about this decoder, it is interesting!

Does the use of U+FFFD do something specific in JSON?
What about using '?' like we do in openbsd base tools?

In any case, ok by me. We can keep tweaking in-tree.
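
For illustration, here is a minimal sketch of the approach discussed above: decode the input as UTF-8, pass valid text through unchanged, escape only control characters, \ and ", and emit U+FFFD for anything that fails to decode. This is not the code from the patch. It uses a plain hand-written validator instead of the DFA-based decoder mentioned in the quoted mail, the names utf8_decode() and json_escape() are hypothetical, and on error it skips a single byte, which is a simplification of the recovery strategy described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence from s[0..len).
 * Stores the code point in *cp and returns the number of bytes used,
 * or 0 if the sequence is invalid (bad lead byte, truncated, missing
 * continuation byte, overlong form, surrogate, or above U+10FFFF).
 */
static size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* smallest code point that legitimately needs n bytes */
	static const uint32_t minval[] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t i, n;
	uint32_t c;

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		n = 2; c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		n = 3; c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		n = 4; c = s[0] & 0x07;
	} else
		return 0;	/* stray continuation or invalid lead byte */

	if (n > len)
		return 0;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3f);
	}
	if (c < minval[n] || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
		return 0;
	*cp = c;
	return n;
}

/*
 * Hypothetical helper: write the JSON string body for `in' into `out'.
 * Returns 0 on success, -1 if `out' is too small.
 */
static int
json_escape(const char *in, char *out, size_t outsz)
{
	const unsigned char *s = (const unsigned char *)in;
	const char *src;
	char tmp[8];
	size_t len, n, slen, o = 0;
	uint32_t cp;

	assert(in != NULL && out != NULL);
	if (outsz == 0)
		return -1;

	len = strlen(in);
	while (len > 0) {
		n = utf8_decode(s, len, &cp);
		if (n == 0) {
			/* invalid input: emit U+FFFD, skip one byte */
			src = "\xef\xbf\xbd";
			slen = 3;
			n = 1;
		} else if (cp == '"' || cp == '\\') {
			tmp[0] = '\\';
			tmp[1] = (char)cp;
			src = tmp;
			slen = 2;
		} else if (cp < 0x20) {
			snprintf(tmp, sizeof(tmp), "\\u%04x", (unsigned int)cp);
			src = tmp;
			slen = 6;
		} else {
			/* valid UTF-8: copy the original bytes unchanged */
			src = (const char *)s;
			slen = n;
		}
		if (o + slen >= outsz)
			return -1;
		memcpy(out + o, src, slen);
		o += slen;
		s += n;
		len -= n;
	}
	out[o] = '\0';
	return 0;
}
```

Because an invalid sequence costs one U+FFFD per rejected byte here, an overlong two-byte form produces two replacement characters. Replacing the "\xef\xbf\xbd" literal with "?" would give the alternative behaviour asked about in the reply.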