[mod.protocols.tcp-ip] Telnet 8th bit: a good use for that bit...

gnu@cgl.ucsf.edu@hoptoad.UUCP (02/14/87)

JBVB%MX.LCS.MIT.EDU@MC.LCS.MIT.EDU ("James B. VanBokkelen") writes:
> A couple of weeks ago, I posted a query about Telnet implementations
> that sent Ascii with bit 8 set (parity, or whatever).  The question
> was brought on when I ran across code that religiously masked off
> the high bit before passing it to a PC's display.
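
As an illustration of what such masking does to 8-bit text, here is a minimal sketch. The function name strip_high_bit is hypothetical, and the use of ISO Latin 1 as the sample encoding is an assumption made only for the example:

```python
def strip_high_bit(data):
    """Clear bit 8 of every byte, as the display code described above does."""
    return bytes(b & 0x7F for b in data)

# Assumed sample: a single accented letter encoded in ISO Latin 1.
latin1 = "é".encode("iso-8859-1")   # one byte, 0xE9
masked = strip_high_bit(latin1)     # 0xE9 & 0x7F == 0x69, the letter "i"
```

The masked byte is still valid ASCII, so the damage is silent: the reader sees a wrong letter rather than an error.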

I think that it would be good to specify that 8-bit values passed
on Telnet connections are in ISO Latin I (essentially, extend NETASCII
to 8 bits using the ISO character set that contains all the graphics
for all the Latin languages).

If the number of programs that actually pass 8-bit data is small enough
(James's list showed only about 5 programs), then this change should be feasible.

Note that since SMTP uses telnet encoding, this would allow international
characters in mail, entered in a local character set, translated to ISO 
Latin I while flying over the wire, then translated to the recipient's local
character set.
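
The translation path described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names to_wire and from_wire are hypothetical, and the two local character sets (an IBM PC code page and Mac Roman, named by their Python codec names) are picked arbitrarily for the example.

```python
def to_wire(local_bytes, local_charset):
    """Sender side: local character set -> ISO Latin 1 for transmission."""
    return local_bytes.decode(local_charset).encode("iso-8859-1")

def from_wire(wire_bytes, local_charset):
    """Recipient side: ISO Latin 1 -> local character set.
    Characters the local set lacks are replaced rather than raising."""
    return wire_bytes.decode("iso-8859-1").encode(local_charset, errors="replace")

sender_charset = "cp437"         # assumed sender-side local character set
recipient_charset = "mac_roman"  # assumed recipient-side local character set

typed = "café".encode(sender_charset)            # as stored on the sender
wire = to_wire(typed, sender_charset)            # as carried over the wire
delivered = from_wire(wire, recipient_charset)   # as stored by the recipient
```

The accented letter has a different byte value at each of the three stages, which is why both ends need to agree on the wire character set. A character outside ISO Latin 1 would fail in to_wire; the sketch does not try to handle that case.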

This would have implications for FTP, too, if it were adopted.

mark@cbosgd.mis.oh.att.com.UUCP (02/16/87)

>>I think that it would be good to specify that 8-bit values passed
>>on Telnet connections are in ISO Latin I (essentially, extend NETASCII
>>to 8 bits using the ISO character set that contains all the graphics
>>for all the Latin languages).
>
>That would leave all the non-Latin languages, like Japanese, Chinese,
>Korean, etc., out in the cold.  It would be a mistake to require that
>8-bit values (i.e., GR characters, with the 8th bit set) passed over
>TELNET connections be in one particular character set.  If need be,
>there could be TELNET options to indicate which character set is
>being sent over the wire.

Good point.

The Japanese standard (or at least one of them) is in some sense upward
compatible with ASCII and European character sets.  Two byte sequences
with both high order bits set are Kanji, single bytes with the high
bit set are European.  Anything that might be a control character is
always a control char, no matter what else surrounds it.

I don't have the details, and I don't know if this extends to Korean.
I know it won't handle Chinese, because there are more characters in
the Chinese language.

However, TELNET option negotiation is very good at this sort of thing;
all we'd have to do is standardize the character sets (or provide an
open-ended option that can be grown as needed).

I suspect that if we just say that TELNET has to be 8 bit transparent
(except for a couple of things like octal 377 (IAC) and CR) then most of the rest
of this won't matter - we could apply a default character set (which
might be ASCII, or European) unless options are negotiated otherwise.
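
To make the two exceptions concrete, here is a minimal sketch of the escaping a sender would apply to an otherwise transparent 8-bit data stream, and the matching step on the receiver. It assumes the usual Telnet conventions that a data byte of 377 octal (IAC) is doubled and that a CR not followed by LF is sent as CR NUL. The function names are hypothetical, and the receiver shown here does not interpret real Telnet commands or option negotiation.

```python
IAC = 0xFF                       # 377 octal, "Interpret As Command"
CR, LF, NUL = 0x0D, 0x0A, 0x00

def telnet_escape(data):
    """Sender: double every IAC, and pad a bare CR with NUL."""
    out = bytearray()
    for i, b in enumerate(data):
        out.append(b)
        if b == IAC:
            out.append(IAC)
        elif b == CR:
            following = data[i + 1] if i + 1 < len(data) else None
            if following != LF:
                out.append(NUL)
    return bytes(out)

def telnet_unescape(wire):
    """Receiver: undo telnet_escape. Commands (IAC + other byte) are not handled."""
    out = bytearray()
    i = 0
    while i < len(wire):
        b = wire[i]
        out.append(b)
        nxt = wire[i + 1] if i + 1 < len(wire) else None
        if (b == IAC and nxt == IAC) or (b == CR and nxt == NUL):
            i += 1               # skip the padding byte
        i += 1
    return bytes(out)

sample = b"a\xffb\rc\r\n"
escaped = telnet_escape(sample)
```

Every other byte value, including all of those with the 8th bit set, passes through unchanged, which is the sense in which the connection is 8-bit transparent.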

	Mark

guy@SUN.COM.UUCP (02/16/87)

> The Japanese standard (or at least one of them) is in some sense upward
> compatible with ASCII and European character sets.  Two byte sequences
> with both high order bits set are Kanji, single bytes with the high
> bit set are European.  Anything that might be a control character is
> always a control char, no matter what else surrounds it.

One of them - the UJIS code, as proposed by AT&T - is definitely
*not* compatible in this fashion; any byte with the 8th bit on,
unless preceded by an SS2, is either the first or second byte of a
two-byte Kanji (or Gaiji, but making *that* work would require TELNET
options to send fonts!) character.  Does the other one - Shift-JIS -
avoid all of the code points used by ISO Latin 1?  The UJIS code is
specified by the Sigma Project, and is being adopted by a number of
UNIX systems, at least, in Japan.
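
The byte layout described above can be sketched as a small classifier. This is an illustration only: the function name classify_ujis is hypothetical, it assumes SS2 is the byte 0x8E followed by one 8-bit character, it ignores SS3 and Gaiji entirely, and it relies on Python's "euc_jp", "shift_jis" and "iso-8859-1" codecs to produce sample bytes.

```python
SS2 = 0x8E  # single-shift 2, assumed to introduce one single-byte character

def classify_ujis(data):
    """Label each character in a UJIS byte string by how it is encoded."""
    labels = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            labels.append("ascii")       # 7-bit byte stands alone
            i += 1
        elif b == SS2:
            labels.append("ss2+single")  # SS2 plus one following byte
            i += 2
        else:
            labels.append("two-byte")    # both bytes have the 8th bit on
            i += 2
    return labels

# "a", a half-width katakana letter, and a Kanji, encoded as UJIS.
ujis = "a\uff71\u65e5".encode("euc_jp")
labels = classify_ujis(ujis)

# The same Kanji bytes read as ISO Latin 1 come out as accented letters.
kanji_as_latin1 = "\u65e5".encode("euc_jp").decode("iso-8859-1")

# Under Shift-JIS the half-width katakana letter is a single high-bit byte.
sjis_kana = "\uff71".encode("shift_jis")
```

Since nothing in the byte stream itself says which reading is intended, a receiver that assumed ISO Latin 1 would display the Kanji as two unrelated accented letters. On the Shift-JIS question, the last line suggests the answer is no, at least as Python's codec implements it: the single-byte katakana range falls inside the code points that ISO Latin 1 uses for graphics.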