[comp.unix.questions] An Essay on Handling Multi-Byte Keystrokes.

tony@oha.UUCP (Tony Olekshy) (08/28/90)

This topic was discussed at length here a couple of years ago, and the result
was that thou shalt not use time-outs to handle multi-byte sequences.  The
technique that does work is as follows.

    - Build a list of the multi-byte sequences that you want to decode.
      Use the termcap or terminfo functions to get these for function keys,
      arrow keys, &c.

    - Access the keyboard via a single function that looks like this:

	if accumulated bytes remain return next accumulated byte
	else reset accumulated bytes buffer

	while read one byte in raw mode

	    append byte to accumulated bytes
	    if accumulated bytes match any multi-byte sequence
		clear accumulated bytes
		return corresponding extended keycode (you make these up)
	    if accumulated bytes match an anchored substring (prefix) of any sequence
		loop
	    if byte[0] equals byte[1]
		clear accumulated bytes and return byte[0]

	    return first accumulated byte (the rest are returned by later calls)
		-or-
	    if single byte accumulated return it
	    else error and clear accumulated bytes

	    end of processing a byte
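
For anyone who wants something to run, here is a minimal sketch of the
function above in Python, using the "return first accumulated byte" variant.
The names KeyDecoder, get_key and read_byte are made up for illustration, and
the extended keycodes are arbitrary values above 255.  One assumption beyond
the pseudocode: a partial match is kept across calls, so read_byte may return
None (nothing ready) without losing bytes.

```python
class KeyDecoder:
    """Decode raw bytes into single bytes or extended keycodes, no time-outs."""

    def __init__(self, sequences, read_byte):
        # sequences: dict mapping multi-byte sequences (bytes) to keycodes.
        # read_byte: callable returning one byte as an int, or None if
        #            nothing is ready (hypothetical interface).
        self.sequences = dict(sequences)
        self.read_byte = read_byte
        self.acc = bytearray()      # partial match in progress
        self.pending = bytearray()  # leftover bytes to hand back one by one

    def _is_prefix(self, data):
        # True if data is the start of some longer known sequence.
        return any(seq.startswith(data) for seq in self.sequences)

    def get_key(self):
        # Drain leftovers from an earlier failed match first.
        if self.pending:
            return self.pending.pop(0)
        while True:
            byte = self.read_byte()
            if byte is None:
                return None             # no keystroke ready; keep self.acc
            self.acc.append(byte)
            data = bytes(self.acc)
            if data in self.sequences:  # complete multi-byte sequence
                self.acc.clear()
                return self.sequences[data]
            if self._is_prefix(data):   # could still become a sequence
                continue
            self.acc.clear()
            if len(data) >= 2 and data[0] == data[1]:
                return data[0]          # lead byte struck twice
            self.pending.extend(data[1:])
            return data[0]              # first byte now, the rest later


if __name__ == "__main__":
    table = {b"\x1b[A": 256, b"\x1b[B": 257}   # made-up keycodes
    feed = iter(b"a\x1b[A\x1b\x1bq")
    decoder = KeyDecoder(table, lambda: next(feed, None))
    while (key := decoder.get_key()) is not None:
        print(key)
```

Fed the bytes a, ESC [ A, ESC ESC, q, the demo prints 97, 256, 27 and 113.
In a real program read_byte would presumably wrap a one-byte read on a
descriptor already placed in raw mode.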

This means that to enter any keystroke that is the first character of any
multi-byte sequence, you have to strike the key twice.  Also, no multi-byte
sequence can be an anchored substring of any other such sequence, and no
multi-byte sequence can start with two identical bytes.  A timeout on the
read can be added, but it should return a no-keystroke-ready indication,
rather than any bytes which partially match a multi-byte sequence.
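
The two restrictions on the sequence list can be checked mechanically before
the decoder is ever run.  A small sketch in Python, where check_sequences is
a hypothetical helper name and the sequences are given as byte strings:

```python
def check_sequences(sequences):
    """Return a list of problems; an empty list means the table is usable."""
    problems = []
    seqs = sorted(set(sequences))
    for seq in seqs:
        # Rule: no sequence may start with two identical bytes, because a
        # doubled lead byte is how the user enters that byte on its own.
        if len(seq) >= 2 and seq[0] == seq[1]:
            problems.append(f"{seq!r} starts with two identical bytes")
    for short in seqs:
        for longer in seqs:
            # Rule: no sequence may be an anchored substring (prefix) of
            # another, or the decoder cannot tell when the short one ends.
            if short != longer and longer.startswith(short):
                problems.append(f"{short!r} is a prefix of {longer!r}")
    return problems
```

An empty result means the table satisfies both rules; anything else names
the offending sequences.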

Vi can't do this because it wants escape to be a single keystroke, even though
it is the first byte of (most) multi-byte escape sequences, which is why arrow
keys cause trouble for vi.  There is simply no way to account for
system-imposed delays during the "read one byte in raw mode" operation in the
middle of a multi-byte sequence.  It might be coming in by smoke signal on a
gusty day.

The above technique should be used by any program that may ever have to be
ported to any system with any old or broken curses (and curses that use a
timeout are broken).  I just checked the definition of all function keys in
our termcap, and most start with escape or control-A.  There are a few
control-B/D/E/N/R/U/W/Zs, but you should be fairly safe if you don't define
single-keystroke uses for escape or control-A.
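
The same survey is easy to repeat for your own sequence list.  A rough sketch
in Python; lead_bytes and caret are hypothetical helper names, and the sample
table in the usage note is invented for illustration, not taken from our
termcap:

```python
from collections import Counter


def lead_bytes(sequences):
    """Count how many multi-byte sequences start with each byte value."""
    return Counter(seq[0] for seq in sequences if len(seq) > 1)


def caret(byte):
    """Render a byte in caret notation, e.g. 1 as ^A and 27 as ^[."""
    if byte < 32:
        return "^" + chr(byte + 64)
    if byte == 127:
        return "^?"
    return chr(byte)
```

For example, lead_bytes([b"\x1b[A", b"\x1bOP", b"\x01@\r"]) counts two
sequences starting with escape and one with control-A; every byte that shows
up in the result is one to avoid as a single keystroke.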

We actually scrapped curses some time ago because it can't handle 16
foreground and 16 background colours.  We still use terminfo or termcap
as the underlying database and cursor movement parser.
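
Pulling the key sequences out of terminfo does not require the screen-handling
part of curses.  A sketch in Python, whose curses module exposes the
terminfo-level calls setupterm and tigetstr; the capability names are standard
terminfo ones, while CAPS, build_table, load_from_terminal and the keycode
values are assumptions made up for this example:

```python
import curses

# terminfo capability name -> extended keycode (arbitrary values above 255)
CAPS = {"kcuu1": 256, "kcud1": 257, "kcub1": 258, "kcuf1": 259, "kf1": 260}


def build_table(caps=CAPS, lookup=curses.tigetstr):
    """Map each multi-byte key sequence to its extended keycode."""
    table = {}
    for capname, keycode in caps.items():
        seq = lookup(capname)   # None if the terminal lacks the capability
        if seq and len(seq) > 1:
            table[seq] = keycode  # single-byte keys need no decoding
    return table


def load_from_terminal():
    """Read the table for the terminal named by $TERM."""
    curses.setupterm()          # loads the terminfo entry, no screen setup
    return build_table()
```

The lookup parameter exists only so the table builder can be exercised with a
fake database when no terminal is attached.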

--
Yours etc., Tony Olekshy.               Internet: tony%oha@CS.UAlberta.CA
					  BITNET: tony%oha.uucp@UALTAMTS.BITNET
					    uucp: alberta!oha!tony