[net.internat] On code extensions and ANSI.

kjelle@erialfa.UUCP (Kjell Eriksson) (11/13/85)

Here is a very short summary of what the ANSI standards say on code extension.
I hope it's correct. I'm sure you'll let me know if it isn't.

ANSI X3.41-1974 (Code extension techniques for use with the 7-bit .... ASCII)
and
ANSI X3.64-1979 (Additional controls for .... ), on which the VT100 is based,
provide for the following ways of extending the graphic character set.

1) Locking shifts between G0 and G1 with SI and SO.
   Appendix A also mentions ESI and ESO, Extended Shift In and Out, for
   locking shifts to the G2 and G3 sets, but no encoding is given either
   there or in X3.64.
   
2) Nonlocking shifts to G2 and G3 with SS2 and SS3. Encoded in X3.64 as
   <esc> 'N' and <esc> 'O'
   
3) Changing character sets with <esc> '(' F for G0
   and <esc> ')' F for G1. F is a byte between hex(40) and hex(7E).
   In appendix A, <esc> '*' F and <esc> '+' F are mentioned for selecting
   G2 and G3. They are not part of the standard.

Question: Is there any standard for the meaning of F above?
    In the VT100, A = USASCII and B = British.

4) Most interesting here is <esc> '$' F, which selects a multibyte
   character set to be used as the G0 set. All characters in the set
   must have the same number of bytes, and every byte must be in the range
   hex(20) to hex(7E). That range holds 95 values, so a two-byte set
   would give 95 * 95 = 9025 graphic characters per set.
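
As a rough illustration of items 1 to 4 (a sketch only; the constant and
function names are made up for this example and do not come from X3.41 or
X3.64, and the range check on F is borrowed from item 3), the controls can be
written as byte strings and the 9025 figure checked by counting the byte range
hex(20) to hex(7E):

```python
# Sketch of the controls summarised above. All names are illustrative
# assumptions, not identifiers taken from X3.41 or X3.64.
ESC = b"\x1b"
SO = b"\x0e"          # Shift Out: locking shift to G1
SI = b"\x0f"          # Shift In: locking shift back to G0
SS2 = ESC + b"N"      # nonlocking shift to G2, as encoded in X3.64
SS3 = ESC + b"O"      # nonlocking shift to G3, as encoded in X3.64


def select_multibyte_g0(final: bytes) -> bytes:
    """Build <esc> '$' F, selecting a multibyte set as G0."""
    # Assumption: F has the same hex(40)..hex(7E) range as in item 3.
    if len(final) != 1 or not 0x40 <= final[0] <= 0x7E:
        raise ValueError("F must be one byte in hex(40)..hex(7E)")
    return ESC + b"$" + final


def graphic_count(bytes_per_char: int, low: int = 0x20, high: int = 0x7E) -> int:
    """Number of characters when every byte lies in low..high inclusive."""
    return (high - low + 1) ** bytes_per_char


print(graphic_count(1))   # 95
print(graphic_count(2))   # 9025
```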

-- 
--------------------------------+---------------------------------------
Kjell Eriksson			I Phone: + 46 8 7520000
Eriksson Information Systems AB	I Telex: 15968 ericki
S-163 98  Stockholm		I UUCP:  mcvax!enea!erix!erialfa!kjelle
SWEDEN				I

dik@zuring.UUCP (11/14/85)

In article <110@erialfa.UUCP> kjelle@erialfa.UUCP (Kjell Eriksson) writes:
>3) Changing character sets with <esc> '(' F for G0
>   and <esc> ')' F for G1. F is a byte between hex(40) and hex(7E).
>   In appendix A, <esc> '*' F and <esc> '+' F are mentioned for selecting
>   G2 and G3. They are not part of the standard.
>
>Question: Is there any standard for the meaning of F above?
>    In the VT100, A = USASCII and B = British.
>
>
The standardisation of character sets is a combined effort of ISO
(the international standards organisation), ECMA (the European
computer manufacturers) and CCITT (the international telephone and
telegraph association).  National standards institutes such as ANSI
participate through ISO.

Either ANSI X3.41 or the associated ISO standard defines the
sequences to designate alternate character sets (I do not know
which from memory; but probably both):

DG0	ESC ( s		Designate G0 to G3
DG1	ESC ) s		s is from the table below.
DG2	ESC * s
DG3	ESC + s

XDG0	ESC , s		Designate G0 to G3
XDG1	ESC - s		s is not yet standardised.
XDG2	ESC . s
XDG3	ESC ? s

MBC	ESC $ s		Designate multibyte character set.
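
The DG0 to DG3 rows above can be sketched in code as follows. This is only an
illustration: the function name and its arguments are hypothetical, the
intermediate bytes are copied from the table, and the range check on s uses
the hex(40) to hex(7E) range quoted earlier in the thread.

```python
# Sketch only: intermediate bytes copied from the DG0..DG3 rows above.
# The names below are assumptions made for illustration.
ESC = b"\x1b"
DESIGNATE_INTERMEDIATE = {0: b"(", 1: b")", 2: b"*", 3: b"+"}


def designate(g: int, symbol: str) -> bytes:
    """Return ESC <intermediate> <symbol>, designating a set as G0..G3."""
    if g not in DESIGNATE_INTERMEDIATE:
        raise ValueError("g must be 0, 1, 2 or 3")
    if len(symbol) != 1 or not 0x40 <= ord(symbol) <= 0x7E:
        raise ValueError("symbol must be one character in hex(40)..hex(7E)")
    return ESC + DESIGNATE_INTERMEDIATE[g] + symbol.encode("ascii")


print(designate(0, "B"))  # b'\x1b(B'
print(designate(1, "A"))  # b'\x1b)A'
```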

Character sets are entered in a registry kept by AFNOR (the French
standardisation institute) on behalf of ISO.

As of 1977 the following character sets were registered:

Symbol	Organisation	Year	Use

@	ISO		1975	Reference version
A	BSI		1975	British use
B	ANSI		1975	ASCII
C	NATS		1975	Swedish/Finnish for newspaper use
D	NATS		1975	Additional symbols to C
E	NATS		1975	Danish/Norwegian for newspaper use
F	NATS		1975	Additional symbols to E
G	SIS		1975	Swedish use
H	SIS		1975	Swedish use for names
I	JISC		1975	Katakana
J	JISC		1975	Latin set for Japanese use
K	DIN		1975	German use
L	ECMA		1977	Portuguese use
R	AFNOR		1975	French use
U	ECMA		1976	Mixed Latin Greek (also LC Latin)
V	ISO		????	Additional for bibliographic use (draft)
W	ISO		????	Cyrillic for bibliographic use (draft)
X	ISO		1976	Greek for bibliographic use
Y	ECMA		1976	Italian use
Z	ECMA		1976	Spanish use
[	ECMA		1976	Greek use
\	ECMA		1976	Mixed Latin Greek (UC only)

Organisations:
AFNOR	French standards institute
ANSI	American National Standards Institute
BSI	British Standards Institution
DIN	German standards institute
ECMA	European Computer Manufacturers Association
ISO	International Standards Organisation
JISC	Japanese Industrial Standards Committee
NATS	Scandinavian Newspaper Technical Cooperation Council
SIS	Swedish standards institute
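
For anyone wanting to use the list mechanically, here is a minimal sketch that
reads a designation sequence and looks the symbol up. Only a handful of rows
from the 1977 list are copied in, and the table and function names are
hypothetical.

```python
# Sketch: a few rows of the 1977 list above as a lookup table.
# The dict and function names are hypothetical.
REGISTERED_1977 = {
    "@": ("ISO", "Reference version"),
    "A": ("BSI", "British use"),
    "B": ("ANSI", "ASCII"),
    "G": ("SIS", "Swedish use"),
    "K": ("DIN", "German use"),
    "R": ("AFNOR", "French use"),
}
G_SET_BY_INTERMEDIATE = {"(": "G0", ")": "G1", "*": "G2", "+": "G3"}


def describe(seq: bytes) -> str:
    """Describe a three-byte ESC <intermediate> <symbol> designation."""
    if len(seq) != 3 or seq[0] != 0x1B:
        raise ValueError("expected ESC, one intermediate, one symbol")
    inter, symbol = chr(seq[1]), chr(seq[2])
    if inter not in G_SET_BY_INTERMEDIATE:
        raise ValueError("unknown intermediate byte")
    # Symbols not copied into the excerpt are reported as unknown.
    org, use = REGISTERED_1977.get(symbol, ("?", "not in this excerpt"))
    return f"{G_SET_BY_INTERMEDIATE[inter]} <- {symbol} ({org}: {use})"


print(describe(b"\x1b(B"))  # G0 <- B (ANSI: ASCII)
```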

This list is of course not complete (it is 8 years old); missing, for
example, is the multibyte Japanese kanji character set.
Any additions to this list are welcome.

Also missing is RUSCII (eh, GOST 13052), the Russian Cyrillic character
set.
If you are interested in its encoding, send me mail and I will send it
back.
-- 
dik t. winter, cwi, amsterdam, nederland
UUCP: {seismo|decvax|philabs}!mcvax!dik