kjelle@erialfa.UUCP (Kjell Eriksson) (11/13/85)
Here is a very short summary of what the ANSI standards say on code
extension. I hope it's correct. I'm sure you'll let me know if it isn't.

ANSI X3.41-1974 (Code extension techniques for use with the 7-bit ....
ASCII) and ANSI X3.64-1979 (Additional controls for .... ), on which the
VT100 is based, provide the following ways of extending the graphic
character set.

1) Locking shifts between G0 and G1 with SI and SO.
   Appendix A also mentions ESI and ESO, Extended Shift In and Out, for
   locking shifts to the G2 and G3 sets, but no encoding is given either
   there or in X3.64.

2) Nonlocking shifts to G2 and G3 with SS2 and SS3.
   Encoded in X3.64 as <esc> 'N' and <esc> 'O'.

3) Changing character sets with <esc> '(' F for G0
   and <esc> ')' F for G1. F is a byte between hex(40) and hex(7E).
   In appendix A, <esc> '*' F and <esc> '+' F are mentioned for selecting
   G2 and G3. They are not part of the standard.

   Question: Is there any standard for the meaning of F above?
   In the VT100, A = USASCII and B = British.

4) Most interesting here is <esc> '$' F, which selects a multi-byte
   character set to be used as the G0 set. All characters in the set
   must have the same number of bytes, and every byte must be in the
   range hex(20) to hex(7E). A two-byte set would give 95 x 95 = 9025
   graphic characters per set.

-- 
--------------------------------+---------------------------------------
Kjell Eiksson                   I  Phone: + 46 8 7520000
Eriksson Information Systems AB I  Telex: 15968 ericki
S-163 98 Stockholm              I  UUCP:  mcvax!enea!erix!erialfa!kjelle
SWEDEN                          I
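The four mechanisms listed in the post can be sketched in a few lines of Python. This is only an illustration of the byte sequences as the post describes them, not text from either standard. The names `designate` and `multibyte_set_size` are made up for this sketch. The code values for SO (hex 0E) and SI (hex 0F) are the usual ASCII ones.

```python
ESC = "\x1b"

# 1) Locking shifts between G0 and G1.
SO = "\x0e"  # Shift Out: G1 stays in use until SI
SI = "\x0f"  # Shift In: back to G0

# 2) Nonlocking (single) shifts, as encoded in X3.64 according to the post.
SS2 = ESC + "N"  # next character comes from G2
SS3 = ESC + "O"  # next character comes from G3


def designate(g, final):
    """3) Build the sequence that designates set `final` as G0 or G1.

    Hypothetical helper: `g` is 0 or 1, `final` is the byte F from the
    post, which must lie between hex 40 and hex 7E.
    """
    intermediates = {0: "(", 1: ")"}
    if g not in intermediates:
        raise ValueError("only G0 and G1 are covered by the standard")
    if len(final) != 1 or not 0x40 <= ord(final) <= 0x7E:
        raise ValueError("F must be a single byte in hex 40..7E")
    return ESC + intermediates[g] + final


def multibyte_set_size(n_bytes, low=0x20, high=0x7E):
    """4) Number of characters in a set where every byte is in low..high."""
    return (high - low + 1) ** n_bytes


if __name__ == "__main__":
    print(repr(designate(0, "B")))   # sequence selecting set 'B' as G0
    print(multibyte_set_size(2))     # size of a two-byte set
```

With the byte range given in the post, each position can take 95 values, which is where the figure of 9025 for a two-byte set comes from.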
dik@zuring.UUCP (11/14/85)
In article <110@erialfa.UUCP> kjelle@erialfa.UUCP (Kjell Eriksson) writes:
>3) Changing character sets with <esc> '(' F for G0
>   and <esc> ')' F for G1. F is a byte between hex(40) and hex(7E).
>   In appendix A, <esc> '*' F and <esc> '+' F are mentioned for selecting
>   G2 and G3. They are not part of the standard.
>
>Question: Is there any standard for the meaning of F above?
>          In the VT100, A = USASCII and B = British.
>
The standardisation of character sets is a combined effort of ISO (the
International Standards Organisation), ECMA (the European Computer
Manufacturers Association) and CCITT (the international telephone and
telegraph association). National standards institutes such as ANSI
participate through ISO.

Either ANSI X3.41 or the associated ISO standard defines the sequences
to designate alternate character sets (I cannot say which from memory,
but probably both):

DG0     ESC ( s         Designate G0 to G3
DG1     ESC ) s         s is from the table below.
DG2     ESC * s
DG3     ESC + s
XDG0    ESC , s         Designate G0 to G3
XDG1    ESC - s         s is not yet standardised.
XDG2    ESC . s
XDG3    ESC ? s
MBC     ESC $ s         Designate multibyte character set.

Character sets are put in a registry by AFNOR (the French standardisation
institute) on behalf of ISO. As of 1977 the following character sets
were registered:

Symbol  Organisation    Year    Use
@       ISO             1975    Reference version
A       BSI             1975    British use
B       ANSI            1975    ASCII
C       NATS            1975    Swedish/Finnish for newspaper use
D       NATS            1975    Additional symbols to C
E       NATS            1975    Danish/Norwegian for newspaper use
F       NATS            1975    Additional symbols to E
G       SIS             1975    Swedish use
H       SIS             1975    Swedish use for names
I       JISC            1975    Katakana
J       JISC            1975    Latin set for Japanese use
K       DIN             1975    German use
L       ECMA            1977    Portuguese use
R       AFNOR           1975    French use
U       ECMA            1976    Mixed Latin Greek (also LC Latin)
V       ISO             ????    Additional for bibliographic use (draft)
W       ISO             ????    Cyrillic for bibliographic use (draft)
X       ISO             1976    Greek for bibliographic use
Y       ECMA            1976    Italian use
Z       ECMA            1976    Spanish use
[       ECMA            1976    Greek use
\       ECMA            1976    Mixed Latin Greek (UC only)

Organisations:
AFNOR   French standards institute
ANSI    American National Standards Institute
BSI     British Standards Institute
DIN     German standards institute
ECMA    European Computer Manufacturers Association
ISO     International Standards Organisation
JISC    Japanese Industrial Standards Committee
NATS    Scandinavian Newspaper Technical Cooperation Council
SIS     Swedish standards institute

This list is of course not complete (it is 8 years old); missing, for
example, is the multibyte Japanese kanji character set. Any additional
information for this list is welcome.

Also missing is RUSCII (eh GOST 13052), the Russian Cyrillic character
set. If you are interested in its encoding, send me mail and I will
send it to you.
-- 
dik t. winter, cwi, amsterdam, nederland
UUCP: {seismo|decvax|philabs}!mcvax!dik
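For anyone who wants to play with the list, here is a rough Python sketch that decodes a three-byte designation sequence using the 1977 registry exactly as posted above. The names `REGISTRY_1977`, `DESIGNATORS` and `describe` are invented for this example, and only the DG0 to DG3 forms are handled, since the post says the final byte of the XDG forms is not yet standardised.

```python
ESC = "\x1b"

# Intermediate byte -> graphic set being designated (DG0..DG3 above).
DESIGNATORS = {"(": "G0", ")": "G1", "*": "G2", "+": "G3"}

# Final byte -> use, copied from the 1977 list in the post.
REGISTRY_1977 = {
    "@": "Reference version (ISO)",
    "A": "British use (BSI)",
    "B": "ASCII (ANSI)",
    "C": "Swedish/Finnish for newspaper use (NATS)",
    "D": "Additional symbols to C (NATS)",
    "E": "Danish/Norwegian for newspaper use (NATS)",
    "F": "Additional symbols to E (NATS)",
    "G": "Swedish use (SIS)",
    "H": "Swedish use for names (SIS)",
    "I": "Katakana (JISC)",
    "J": "Latin set for Japanese use (JISC)",
    "K": "German use (DIN)",
    "L": "Portuguese use (ECMA)",
    "R": "French use (AFNOR)",
    "U": "Mixed Latin Greek, also LC Latin (ECMA)",
    "V": "Additional for bibliographic use, draft (ISO)",
    "W": "Cyrillic for bibliographic use, draft (ISO)",
    "X": "Greek for bibliographic use (ISO)",
    "Y": "Italian use (ECMA)",
    "Z": "Spanish use (ECMA)",
    "[": "Greek use (ECMA)",
    "\\": "Mixed Latin Greek, UC only (ECMA)",
}


def describe(sequence):
    """Return (graphic set, registered use) for a sequence like ESC ( B."""
    if (len(sequence) != 3 or sequence[0] != ESC
            or sequence[1] not in DESIGNATORS):
        raise ValueError("not a DG0..DG3 designation sequence")
    use = REGISTRY_1977.get(sequence[2], "not in the 1977 list")
    return DESIGNATORS[sequence[1]], use


if __name__ == "__main__":
    print(describe(ESC + "(B"))
    print(describe(ESC + ")G"))
```

A final byte that is missing from the table is reported as unregistered rather than rejected, because the list is known to be incomplete.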