[comp.protocols.iso] CCITT T.51

Markku.Savela@tel.vtt.fi (Markku Savela) (01/28/91)

  Have I understood the standard correctly? Assume the following in an
8-bit environment:

   1) I have designated a single-byte code set (like normal "ASCII")
      into G0 and invoked it into GL with LS0.
   2) I have designated a multi-byte code set (like JISC 6226 (kanji))
      into G1 and invoked it into GR with LS1R.
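For concreteness, here is a minimal sketch of the byte sequence that steps 1) and 2) would produce. It assumes the usual ISO 2022 encodings: ESC ( B designates ASCII into G0, ESC $ ) F designates a two-byte 94-set with final byte F into G1 (final byte B for the 1983 edition of the JIS kanji set, @ for the 1978 JIS C 6226 edition), LS0 is the single control SI (0x0F) and LS1R is ESC ~. The helper name setup_sequence is hypothetical.

```python
# Sketch only: bytes for "designate ASCII to G0, invoke into GL (LS0);
# designate a two-byte kanji set to G1, invoke into GR (LS1R)".
# Assumes standard ISO 2022 final bytes; setup_sequence is a hypothetical name.
ESC = b"\x1b"
LS0 = b"\x0f"           # LOCKING-SHIFT ZERO (SI): invoke G0 into GL
LS1R = ESC + b"~"       # LOCKING-SHIFT ONE RIGHT (ESC 07/14): invoke G1 into GR


def setup_sequence(kanji_final: bytes = b"B") -> bytes:
    """Return the designation/invocation bytes for steps 1) and 2).

    kanji_final is the final byte F of the two-byte set:
    b"B" for the 1983 edition, b"@" for the 1978 edition.
    """
    designate_g0_ascii = ESC + b"(B"                # ESC 02/08 04/02
    designate_g1_kanji = ESC + b"$)" + kanji_final  # ESC 02/04 02/09 F
    return designate_g0_ascii + LS0 + designate_g1_kanji + LS1R


print(setup_sequence().hex(" "))
# → 1b 28 42 0f 1b 24 29 42 1b 7e
```

After these ten bytes, no further escape sequences are needed as long as only these two sets are used.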

  With this setup, can ASCII and Kanji be mixed without any additional
escape sequences? Every byte that does not have the 8th bit set will be
normal ASCII, and each consecutive pair of bytes that have the 8th bit
set will be a single kanji character.
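The interpretation above can be sketched as a small splitter: bytes with the 8th bit clear are single G0 characters, and bytes with the 8th bit set are consumed in pairs as G1 characters. This is a sketch under simplifying assumptions: control characters (C0 and C1) are not treated specially, malformed input simply raises an error, and the name split_units is hypothetical. Code positions are reported with the high bit stripped, i.e. as positions within the designated set.

```python
from typing import List, Tuple, Union

# ("G0", b) for one GL byte, ("G1", (b1, b2)) for one GR pair.
Unit = Tuple[str, Union[int, Tuple[int, int]]]


def split_units(data: bytes) -> List[Unit]:
    """Split 8-bit data with ASCII in GL and a two-byte set in GR.

    Simplification (assumption): no C0/C1 control handling, and a lone
    high-bit byte is rejected rather than recovered from.
    """
    units: List[Unit] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                       # 8th bit clear: G0 via GL
            units.append(("G0", b))
            i += 1
        else:                              # 8th bit set: first byte of a G1 pair
            if i + 1 >= len(data) or data[i + 1] < 0x80:
                raise ValueError(f"incomplete two-byte character at offset {i}")
            # Strip the high bit to get the 7-bit position within the set.
            units.append(("G1", (b & 0x7F, data[i + 1] & 0x7F)))
            i += 2
    return units


print(split_units(b"A\xb4\xc1B"))
# → [('G0', 65), ('G1', (52, 65)), ('G0', 66)]
```

No shift state changes between units, which is why ASCII and kanji can be interleaved freely in this setup.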

  What is the correct procedure to handle error cases, like when
there is only one byte with 8th bit set? Do I blindly take the
next byte regardless of the 8th bit setting or ignore this single
byte?
--
Markku Savela (savela@tel.vtt.fi), Technical Research Centre of Finland
Telecommunications Laboratory, Otakaari 7 B, SF-02150 ESPOO, Finland

lasko@regent.dec.com (Tim Lasko, Digital Equipment Corp., Westford, MA) (01/29/91)

In article <5356@hemuli.tik.vtt.fi>, Markku.Savela@tel.vtt.fi (Markku Savela) writes...
>  With this setup ASCII and Kanji can be mixed without any additional
>escape sequences? Every single code that does not have the 8th bit
>set will be normal ascii, and each consecutive pair of bytes that
>have 8th bit set will be single kanji character.

Yes. You have interpreted the standard correctly.

>  What is the correct procedure to handle error cases, like when
>there is only one byte with 8th bit set? Do I blindly take the
>next byte regardless of the 8th bit setting or ignore this single
>byte?

Neither ISO 2022 nor T.51 (when I last looked) covers error cases.  Usually
there's text such as "this is outside the scope of the standard".

All of the implementations that I have seen take the next byte, treat the
1xxxxxxx 0xxxxxxx unit as a character, and do *something* with it. Some
vendors, including one with which I'm familiar, actually hide a private
character set in this virtual coded space; for them, the 1xxxxxxx 0xxxxxxx
combination is a valid graphic character.
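The two recovery choices raised in the question can be contrasted in a short sketch. Since ISO 2022 leaves error recovery outside its scope, the function decode_units, the policy names and the "BAD" tag are all hypothetical: "take_next" mirrors the behaviour described above (consume the following byte and treat the 1xxxxxxx 0xxxxxxx pair as one unit), while "drop" discards the stray byte and resumes at the next one.

```python
from typing import List, Tuple


def decode_units(data: bytes, policy: str = "take_next") -> List[Tuple[str, bytes]]:
    """Split GL/GR data into (tag, raw bytes) units, recovering from a
    lone high-bit byte according to a hypothetical policy:

    "take_next": swallow the next byte anyway and emit a "BAD" unit,
                 which a caller may map to a private set or a substitute.
    "drop":      discard the lone byte and continue decoding.
    """
    if policy not in ("take_next", "drop"):
        raise ValueError(f"unknown policy: {policy}")
    units: List[Tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                                   # ordinary G0 byte
            units.append(("G0", data[i:i + 1]))
            i += 1
        elif i + 1 < len(data) and data[i + 1] >= 0x80:
            units.append(("G1", data[i:i + 2]))        # well-formed GR pair
            i += 2
        elif policy == "take_next":
            # 1xxxxxxx 0xxxxxxx (or a lone byte at end of input, giving
            # a 1-byte slice): keep it as one unit.
            units.append(("BAD", data[i:i + 2]))
            i += 2
        else:                                          # policy == "drop"
            i += 1
    return units


sample = b"\xb4AB"   # stray high-bit byte followed by ASCII "AB"
print(decode_units(sample, "take_next"))
# → [('BAD', b'\xb4A'), ('G0', b'B')]
print(decode_units(sample, "drop"))
# → [('G0', b'A'), ('G0', b'B')]
```

The trade-off is visible in the output: with take_next the stray byte swallows the "A" that follows it, but the pair stays available as a unit (which is what lets a vendor assign it to a private set); with drop the "A" survives but the error disappears silently.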

For general information: There have been proposals to extend ISO 2022 to make
the 1xxxxxxx 0xxxxxxx space a new graphic set into which two-byte character
sets may be designated, validating certain existing implementations, but this
hasn't been approved yet.  ISO JTC1/SC2 will be considering extensions to ISO
2022 at their Plenary this October.  ECMA TC1 is already considering various
proposals to recommend to the ISO subcommittee. 

Tim Lasko, Digital Equipment Corporation, Westford MA  (lasko@regent.dec.com)
Disclaimer: My opinions are my own; the facts can speak for themselves.

inaba@snoopy.src.ricoh.co.jp (Kiyo Inaba) (01/29/91)

In article <5356@hemuli.tik.vtt.fi> savela@tel.vtt.fi (Markku Savela) writes:
#
#  Have I understood the standard right? Assuming the following in
#8-bit environment:
#
#   1) I have designated single byte code set (like normal "ASCII")
#      into G0 and invoked it into GL with LS0.
#   2) I have designated multi-byte code set (like JISC 6226 (kanji))
#      into G1 and invoked it into GR with LS1R. 
#
#  With this setup ASCII and Kanji can be mixed without any additional
#escape sequences? Every single code that does not have the 8th bit
#set will be normal ascii, and each consecutive pair of bytes that
#have 8th bit set will be single kanji character.

Unfortunately, I don't have ISO 2022 at hand, but according to JIS X0202
(the Japanese translation of ISO 2022), this interpretation is correct.

#  What is the correct procedure to handle error cases, like when
#there is only one byte with 8th bit set? Do I blindly take the
#next byte regardless of the 8th bit setting or ignore this single
#byte?

According to JIS X0202, 'Error recovery for transmitting incorrect
multi byte character set is not defined in this standard'.
In practice, though, I usually just take two bytes even if the second
byte's MSB is 0.

BTW, JISC 6226 became JIS X0208 several years ago.

Kiyo Inaba