[comp.unix] New Extended ASCII on UNIX

gs732@uxe.cso.uiuc.edu (Ghie-Hugh Song ) (05/12/88)

Hello, everyone,  

Have you ever wished that TeX were more WYSIWYG, or that you could type
Greek characters directly in text mode?  If we had an extended 8-bit
ASCII character set of 256 characters, like the IBM PC's (see the Appendix
of the PC DOS Manual), things would be much easier.

  Then why not use WordPerfect or MS Word?  First, they do not support all
the Greek characters and math symbols unless we buy extra software and
hardware.  In IBM's extended ASCII, there is no 'Greek tau', 'Greek nu',
or inverted Greek capital delta symbol for partial differential equations.
Even the registered trademark sign 'R in a circle' doesn't exist.  Then you
might ask, why not ChiWriter or T-cube?  Simply because they are not
portable!  They are graphics programs, they are not in the public domain,
and one of them is really expensive.
So TeX has been considered a better choice for technical writers.  But
without a laser printer, or VorTeX and a graphics workstation, a TeX
document is just a text file, kept that way for the sake of portability.
If the file has to be plain ASCII text to stay portable, then why not put
the Greek characters and math symbols in the ASCII character set itself?

   I've got an idea for all of us, and I wish to write a letter to
the ANSI people about a new 8-bit extended ASCII standard of 256 characters.
But I don't know ANSI's address.  So if you agree with my idea,
please forward this message to ANSI with your opinion.

   Let's have an Ext key on our keyboards at the same location as the
'Alt' key on IBM's Enhanced keyboard.  I am using the term 'Ext' to
distinguish it from GNU-Emacs' Meta editing keys.  However, the proper name
for the upper half of this extended ASCII set should be 'meta', since that
is what the termcap files call it.
Ext-p (F0-hexdec) will give us the printable character Greek-pi,
and Ext-shift-p (D0-hexdec) will give us the printable character
Greek-Pi (capital pi) directly.
IBM's Greek-pi is at E3-hexdec, whose lower 7 bits match 'c' (63-hexdec)
among the 128 7-bit ASCII codes.  So every word processor has its own way
of producing pi, which lowers the portability of word-processed texts.

   My draft is at the end of this posting.  Please look it over and
examine it.

   One may oppose this draft because existing printers could no longer be
used.  But we can still use them with just a printer driver and translator
software, as long as the original text does not contain any character
the printer does not support.
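Such a translator could be as simple as a lookup table from draft codes to the codes a given printer understands, rejecting any character the printer cannot produce. The sketch below assumes, purely for illustration, a printer that uses IBM code page 437; the few table entries pair draft codes from the table with the CP437 positions of the same glyphs, and `translate_for_printer` is a hypothetical name.

```python
# Hypothetical printer-driver translator: draft 8-bit codes -> the codes
# of a printer assumed (for illustration only) to use IBM code page 437.
# Only a few entries are shown.
DRAFT_TO_CP437 = {
    0xE1: 0xE0,  # Greek alpha
    0xF0: 0xE3,  # Greek pi
    0xD3: 0xE4,  # Greek capital Sigma
    0xF3: 0xE5,  # Greek sigma
    0xB8: 0xEC,  # infinity
}

def translate_for_printer(text: bytes) -> bytes:
    """Translate draft-encoded text, refusing glyphs the printer lacks."""
    out = bytearray()
    for pos, b in enumerate(text):
        if b < 0x80:
            out.append(b)                   # 7-bit ASCII passes through
        elif b in DRAFT_TO_CP437:
            out.append(DRAFT_TO_CP437[b])   # same glyph, printer's code
        else:
            raise ValueError(
                f"code 0x{b:02X} at offset {pos} is not supported by this printer")
    return bytes(out)

print(translate_for_printer(b"area = \xf0r^2"))  # b'area = \xe3r^2'
```

Feeding it a character outside the table, such as the draft's Greek nu (0xEE), raises `ValueError`, which corresponds to the condition that the text must not contain characters the printer cannot print.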

   I understand that standardizing an 8-bit extended ASCII is very late
in coming.  However, I know that once it is implemented in the new version
of UNIX or POSIX, everyone will slowly follow.
Now people are gathering to standardize UNIX, POSIX, SVID,
or whatever.  Now is the time to express our opinion to the ANSI people.
If we lose this chance, we will never have a standard 8-bit ASCII.
If you agree with my idea, write a letter to ANSI, the POSIX committee
(IEEE 1003), and the acting System V.4 committee members of
AT&T-Sun-Unisys immediately, asking for prompt action.
Unfortunately, I do not know any of those addresses.

   Thank you for your attention.

                               G. Hugh Song

                               Coordinated Science Lab.
                               Univ. of Illinois at Urbana-Champaign
                               1101 W. Springfield Av.
                               Urbana, IL 61801
                               song@uispg.csl.uiuc.edu

============================================================

   Here is my draft of a new 256-character 8-bit ASCII set.  I place the
second half of the 8-bit characters (128-255) next to the first half.
 
   I have not decided what to assign to the following Ext-control keys
(80-hexdec to 9f-hexdec).  There are some options:

1) We can assign new control keys that have become necessary
   as computer science evolves.  Some examples are shown below.
   I hope that someone in the field will rearrange the assignments and
   complete them, since I do not have enough knowledge of the current
   implementation status of i/o.
2) Or we may leave some freedom to the manufacturers of keyboards
   and terminals.

Even though these codes (00-hexdec to 1f-hexdec and 80-hexdec to 9f-hexdec)
are not legitimately printable while editing a text file, I wish there were
corresponding printable characters, rather than the current '^' notation,
whose caret cannot be distinguished from 5e-hexdec.
That would make debugging communication problems easier.
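The ambiguity is easy to demonstrate: in the usual caret notation, the control code 0x01 and the two literal characters '^' and 'A' print identically. The Python sketch below contrasts caret notation with a rendering that uses the mnemonics from the table below in angle brackets. The angle-bracket style, the doubling of a literal '<', and the function names are assumptions made for illustration, and the input is assumed to be 7-bit ASCII.

```python
# Mnemonics for 0x00-0x1F, as listed in the table.
MNEMONICS = ["nul", "soh", "stx", "etx", "eot", "enq", "ack", "bel",
             "bs", "ht", "nl", "vt", "np", "cr", "so", "si",
             "dle", "dc1", "dc2", "dc3", "dc4", "nak", "syn", "etb",
             "can", "em", "sub", "esc", "fs", "gs", "rs", "us"]

def caret(data: bytes) -> str:
    """Conventional caret notation: 0x01 -> '^A', 0x7F -> '^?'."""
    out = []
    for b in data:
        if b < 0x20:
            out.append("^" + chr(b + 0x40))
        elif b == 0x7F:
            out.append("^?")
        else:
            out.append(chr(b))
    return "".join(out)

def mnemonic(data: bytes) -> str:
    """Control codes become <name>; a literal '<' is doubled so the two
    can always be told apart when reading left to right."""
    out = []
    for b in data:
        if b < 0x20:
            out.append("<" + MNEMONICS[b] + ">")
        elif b == 0x7F:
            out.append("<del>")
        elif b == ord("<"):
            out.append("<<")
        else:
            out.append(chr(b))
    return "".join(out)

print(caret(b"\x01"), caret(b"^A"))        # ^A ^A
print(mnemonic(b"\x01"), mnemonic(b"^A"))  # <soh> ^A
```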
 
| 00 ^@ nul   80 sml  decreases character size, or restores it (toggle)
  01 ^a soh   81
  02 ^b stx   82 bld  boldifies and unboldifies (toggle)
  03 ^c etx   83
  04 ^d eot   84 dwn  steps down one half line spacing 
  05 ^e enq   85     
  06 ^f ack   86 
  07 ^g bel   87 grp  enters and exits graphics mode
| 08 ^h bs    88 hlp  invokes help universally. 
  09 ^i ht    89 itl  italicizes or deitalicizes from here on (toggle)
  0a ^j nl    8a 
  0b ^k vt    8b mlm  mouse left movement    \
  0c ^l np    8c mlb  mouse left button       |
  0d ^m cr    8d mmb  mouse middle button     |  
  0e ^n so    8e mdm  mouse downward movement |  Important!
  0f ^o si    8f                              |
| 10 ^p dle   90 mum  mouse upward movement   |
  11 ^q dc1   91 mrm  mouse right movement    | 
  12 ^r dc2   92 mrb  mouse right button     /
  13 ^s dc3   93 scr  switches script characters on or off (toggle)
  14 ^t dc4   94 
  15 ^u nak   95 up   steps up one half line spacing 
  16 ^v syn   96 rev  reverses the characters' black and white, or reverses them back
  17 ^w etb   97 
| 18 ^x can   98    
  19 ^y em    99    
  1a ^z sub   9a   
  1b ^[ esc   9b atn  escapes during communication, calling the attention of
                      the local control
  1c ^\ fs    9c  
  1d ^] gs    9d    
  1e ^^ rs    9e   
  1f ^_ us    9f

   The following are all printable characters, except 'DEL' at the end
of the lower 7-bit codes.  The Alt key may be used to send the 8-bit codes
to the host computer by making it act as the Ext key with Kermit's
'set key' command, as in MS-Kermit version 2.30.

   For a 7-bit terminal environment such as the VT100, in which the
terminal neither generates nor receives 8-bit codes, it is desirable for
the C-shell or the editor to have a key that tells the host computer that
the next key is one of the upper 8-bit codes (128-255).  This key should
not conflict with a control key of the existing editors.  The 'esc' key
might be thought the best choice.  However, most editors use this key
heavily for other purposes.  To avoid conflict, the 'cr (Cntrl-m)' key,
which is redundant both in vi and in GNU-Emacs (you might have noticed
that 'C-m' is changed to 'nl (C-j)' automatically by both editors), may
be used.
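On a 7-bit line, this amounts to a one-byte prefix: the host treats cr (0x0D, Cntrl-m) as "set the eighth bit of the next byte". A minimal decoder for that convention might look like the following; the function name and the choice to reject a dangling prefix at the end of input are assumptions, not part of the draft.

```python
PREFIX = 0x0D  # cr (Cntrl-m), proposed as the "next key is 8-bit" prefix

def decode_7bit(stream: bytes) -> bytes:
    """Fold each PREFIX + byte pair into one byte with the eighth bit set."""
    out = bytearray()
    pending = False
    for b in stream:
        if pending:
            out.append(b | 0x80)   # e.g. 'p' (0x70) becomes 0xF0, Greek pi
            pending = False
        elif b == PREFIX:
            pending = True         # remember to shift the next byte
        else:
            out.append(b)          # ordinary 7-bit byte
    if pending:
        raise ValueError("input ends with a dangling prefix")
    return bytes(out)

print(decode_7bit(b"\rp"))     # b'\xf0'
print(decode_7bit(b"\r\x08"))  # b'\x88'
```

Under this convention, 'Cntrl-m Cntrl-h' (the 7-bit form of the help key suggested below) decodes to 0x88, the same code as Ext-Cntrl-h on an 8-bit keyboard.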

   This will limit the use of the Meta key in our (or Stallman's) GNU-Emacs.
In practice it means no revision to GNU-Emacs: we just use the ESC key to
invoke the Meta editing keys, even though the keyboard has a Meta key.
This is the price we pay for the Greek characters and the math symbols.
If we use 'Cntrl-h' for a real backspace, we have to choose another key
for invoking 'help' in GNU-Emacs.  How about 'Ext-Cntrl-h' (88-hexdec)
(or 'C-m C-h' on a 7-bit terminal) as the key for invoking help in the
future version (Ver. 19) of GNU-Emacs?  This is the only change that is
not compatible with the present version (Ver. 18).

   I'd like to suggest that 'Ext-Cntrl-h' (88-hexdec), or 'Cntrl-m Cntrl-h'
on a 7-bit terminal, become the new standard key for invoking help in every
software package in the future.  Isn't it a good idea?

| 20 sp    a0 a horizontal bar longer than just '-'.
  21 !     a1 a black square
  22 "     a2 the starting double quotation mark
  23 #     a3 not-equal sign '/=' in one character site
  24 $     a4 the Pound symbol (U.K. money unit)
  25 %     a5 the division symbol, ':-' in one character site
  26 &     a6 the intersection symbol in set theory, an inverted 'U'
  27 '     a7 the starting single quotation mark
| 28 (     a8 the top portion of the left parenthesis
  29 )     a9 the top portion of the right parenthesis
  2a *     aa a small circle that usually represents degree
  2b +     ab '+_' in one character site.
  2c ,     ac the cedilla symbol without c, s, or C.
  2d -     ad '-+' in one character site with - up and + down.
  2e .     ae a dot at the center
  2f /     af a dot at the top
| 30 0     b0 the bottom portion of the right parenthesis
  31 1     b1 the proportionality symbol, 'oc' in one character site
  32 2     b2 a vertical line whose bottom is bowed to the right.
  33 3     b3 a set symbol (obtained from U by rotating it 90 deg CCW)
  34 4     b4 a vertical line with a wart in the middle as in '{'
  35 5     b5 a vertical line with a wart in the middle as in '}'
  36 6     b6 the mirror image of '6'
  37 7     b7 the union symbol in set theory, which looks like 'U'
| 38 8     b8 the infinity symbol, 'oo' in one character site
  39 9     b9 the bottom portion of the left parenthesis
  3a :     ba the umlaut, two dots overhead.
  3b ;     bb the double-prime
  3c <     bc '_<' in one character site
  3d =     bd '=_' in one character site for the defining equality
  3e >     be '_>' in one character site
  3f ?     bf a wiggle positioned at the underline (_) level
| 40 @     c0 the registered trademark sign, a small capital R in a circle
  41 A     c1 angstrom, a small circle on top of 'A'
  42 B     c2 an arrow heading east
  43 C     c3 the copyright symbol, a small capital 'C' in a circle
  44 D     c4 Greek capital Delta
  45 E     c5 'an element of' symbol in set theory
  46 F     c6 Greek capital Phi
  47 G     c7 Greek capital Gamma
| 48 H     c8 barred italic h (h-bar) for the Planck constant in quantum mechanics
  49 I     c9 the top portion of the integral symbol
  4a J     ca the bottom portion of the integral symbol
  4b K     cb a set symbol (obtained from U by rotating it 90 deg CW)
  4c L     cc Greek capital Lambda
  4d M     cd 'x' without serif, math symbol for a multiplication
  4e N     ce nabla, inverted Greek capital Delta
  4f O     cf Greek capital Omega
| 50 P     d0 Greek capital Pi
  51 Q     d1 Greek capital Theta
  52 R     d2 surd, usually used for a checking sign
  53 S     d3 Greek capital Sigma
  54 T     d4 the trade mark sign, the superscripted 'TM'
  55 U     d5 Greek capital Upsilon
  56 V     d6 an arrow heading west
  57 W     d7 the double dagger symbol used for a footnote.
| 58 X     d8 Greek capital Xi
  59 Y     d9 Greek capital Psi
  5a Z     da an arrow heading south.
  5b [     db a vertical line whose top is clamped to the right
  5c \     dc the negated 'an element of' symbol in set theory
  5d ]     dd a vertical line whose top is clamped to the left
  5e ^     de an accent symbol inverted from '^'
  5f _     df an overbar, a bar on top.
| 60 `     e0 a prime (60-hexdec is a back-prime)
  61 a     e1 Greek alpha
  62 b     e2 Greek beta
  63 c     e3 Greek chi
  64 d     e4 Greek delta
  65 e     e5 Greek epsilon
  66 f     e6 Greek phi
  67 g     e7 Greek gamma
| 68 h     e8 Greek eta
  69 i     e9 Greek iota
  6a j     ea the integral symbol, an elongated s
  6b k     eb Greek kappa
  6c l     ec Greek lambda
  6d m     ed Greek mu
  6e n     ee Greek nu
  6f o     ef Greek omega
| 70 p     f0 Greek pi
  71 q     f1 Greek theta
  72 r     f2 Greek rho
  73 s     f3 Greek sigma
  74 t     f4 Greek tau
  75 u     f5 a triple prime
  76 v     f6 the arrow symbol that represents a vector. a step-up arrow
  77 w     f7 the dagger symbol often used for a footnote.
| 78 x     f8 Greek xi
  79 y     f9 Greek psi
  7a z     fa Greek zeta
  7b {     fb a vertical line whose bottom is clamped to the right  
  7c |     fc two vertical lines in one character site
  7d }     fd a vertical line whose bottom is clamped to the left  
  7e ~     fe double wiggle for an approximate equation
- - - - - - - - - - - - - - - - - - -
  7f del   ff erh  erase the character at the current cursor position 
-------------------------------------------------------------

    All of these can reside in the text-mode character set in 8-bit mode,
so that any text-mode terminal can display them directly on its screen.
The possible benefits of this extension are:

1. If every typesetting program is revised according to the new standard,
   they will become more WYSIWYG.  We will not need to type '\alpha'
   while writing a TeX file.
2. Word processors and typesetting programs will be cheaper, since they
   will not need to include soft-font files or font ROMs.
3. Word processor files can easily be exported from one word processor
   and imported into another without losing special characters, as long
   as those characters are in the 256-character set.

   In addition to this new extended ASCII, I think that some of the
present ASCII characters should be redesigned as follows.

  " 22    should be designed to look more like the closing double
          quotation mark as in typeset books.
  ' 27    the closing single quotation mark or apostrophe
          same comment as above (" 22-hexdec)
  * 2a    position this a little higher than the present height
          so that it looks like a footnoting symbol, not like a multiplication
          symbol.
  / 2f    stretch this so that two of these can be connected without breaking
          to make a long slanted line.
  \ 5c    the same comment as above (/ 2f-hexdec)
  | 7c    make this a single long vertical line rather than the present
          one broken at the middle.

   The current ANSI standard code for erasing the previous character is DEL,
not backspace!  Let us encourage everyone to observe this standard.
I know that the troublemaker IBM does not follow it.
Let them go their way.  We do not care about IBM.  We are talking about UNIX
and GNU-Emacs and TeX.  Backspace would then do the following job in
GNU-Emacs and vi.
 
 ^h 08  bs    a backspace that does not erase the previously typed
              character, producing an overprinted image when printed.  This
              key is actually in the present ANSI standard.  You might have
              noticed that UNIX 'man'ual pages contain it in their
              text files for underlining.
              It now seems to be fully supported by most ANSI terminals
              (but not on IBM's).  Nevertheless, it is not supported by vi
              or GNU-Emacs.  Let's encourage Mr. Stallman to support it in
              his new version of GNU-Emacs.  It would display every accented
              vowel of foreign alphabets, the cent sign, some foreign money
              units, the C-cedilla ('Ext-,' backspace 'c'), and the null set
              symbol ('0/' in one character site).
 ^m 0d  met   In view of this, the mnemonic should be changed from
              'cr' to 'met' (for meta).
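Formatted manual pages use the overstrike convention mentioned above: an underlined character is written as '_', backspace, character, and a bold one as character, backspace, same character. The following is a rough Python sketch of how an editor or pager might interpret those triples. The function name and the (character, attribute) output format are illustrative assumptions, and the sketch ignores longer overstrike chains.

```python
from typing import List, Tuple

BS = "\b"  # backspace, 0x08

def parse_overstrike(text: str) -> List[Tuple[str, str]]:
    """Turn nroff-style overstrike triples into (char, attribute) pairs."""
    result = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BS:
            first, second = text[i], text[i + 2]
            if first == "_":
                result.append((second, "underline"))          # _ BS x
            elif first == second:
                result.append((second, "bold"))               # x BS x
            else:
                # A genuine overprint, e.g. the draft's cedilla (0xAC) BS c
                result.append((first + second, "overprint"))
            i += 3
        else:
            result.append((text[i], "plain"))
            i += 1
    return result

print(parse_overstrike("_\bn_\ba_\bm_\be"))
```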


==========================================
    KEYBOARD
-----------------
   This section is not part of my proposal.  I just wish that the new ANSI
ASCII keyboard had the following keys.  One may assign some function keys
to these purposes, but it goes without saying that separate keys at the
space-bar level are more desirable.

For text/graphics terminals

Italic key : italicizes normal characters.  This key should be active
             only on alphabetic characters and Greek capital characters,
             not on numerals or symbols like '%', '+', '"', etc.
             On a black-and-white text-mode-only terminal that has no
             ROM support for various fonts (such as the VT100),
             it would be desirable for this key to reverse white and black
             for the characters between two presses of the italic key.
             Black becomes white, white becomes black. (Toggle)
Bold key :   boldens or highlights a character. (Toggle)

For graphics terminals

Step-up key : moves the position 1/2 line higher; the step-down key
             then returns to the original line height.
Step-down key : moves the position 1/2 line lower; the step-up key
             then returns to the original line height.
Script key : displays script characters. (Toggle)
Small character key : displays small characters from here on, and restores
             the normal size when pressed again. (Toggle)

   As to the keyboard layout, we do not need to have the editing keypad
on the right.  Why don't we move it to the left?

=====================================End of draft=======

  I really do not know whether I am the first to make this effort.
People in the Department of Mathematical Science at New Mexico
State University made a draft of Alt-key bindings similar to mine
(see the T-cube Manual, p. 99).  I do not know whether they tried to set
a new 8-bit extended ASCII standard or just defined keyboard bindings for
using the Alt keys with 'T-cube' internally.

P.S. At first, I did not intend to make this a project.  However,
it turned out to be a big one.  Now I want to drop this project and
release it to the public by posting it to the news system here.  I hope
everybody will express their opinions and that there will be fruitful
discussion here.  And finally, I hope to see the ANSI or POSIX committee
act.

Please start this project and act, ANSI.

ct@dde.UUCP (Claus Tondering) (06/03/88)

In article <7806@mcdchg.UUCP> gs732@uxe.cso.uiuc.edu (Ghie-Hugh Song ) writes:
>   I've got an idea for all of us.  And I wish to write a letter to
>the ANSI people about a new 256 8-bit extended ASCII character standard.  
>
>   I understand that the standardization of 8-bit extended ASCII
>is too late.  However I know that once this is implemented on 
>the new version of UNIX or POSIX, everyone will follow
>this slowly.  Now people are gathering to standardize UNIX, POSIX, SVID,
>or whatever.  Now is the time to express our opinion to ANSI people.  
>If we lose this chance we will never have a standard 8-bit ASCII.

Your idea of an extended ASCII is not new. In fact the International
Organization for Standardization (ISO), of which ANSI is a member, has already
adopted an extended ASCII character set. It is known as ISO 8859. The
idea has been to extend ASCII with various characters used in languages
other than English. Almost all non-English languages have special letters
(think, for example, of the French accented letters, the Spanish n with
tilde, etc.). ISO 8859 is actually not one standard but several:

	ISO 8859/1 is ASCII extended with the letters used in the
		   western European languages.
	ISO 8859/2 is ASCII extended with the letters used in the
		   eastern European languages.
	ISO 8859/3 is ASCII extended with the letters used in the
		   languages spoken around the Mediterranean.
	ISO 8859/4 is ASCII extended with the letters used in the
		   northern European languages.
	ISO 8859/5 is ASCII extended with the letters used in the
		   Cyrillic alphabet (i.e. Russian, Bulgarian, etc.).
	ISO 8859/6 is ASCII extended with the Arabic letters.
	ISO 8859/7 is ASCII extended with the Greek letters.
	ISO 8859/8 is ASCII extended with the Hebrew letters.

The ISO 8859 standard is nice and useful; it is, however, unfortunate to
have 8 standards instead of just one.
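The overlap between the draft and ISO 8859/7 can be checked directly, since Python ships an `iso8859_7` codec. The sketch below transcribes a few of the draft's lowercase Greek assignments from its table (the selection is arbitrary) and compares each with the letter ISO 8859/7 puts at the same byte; the dictionary and function names are illustrative.

```python
import unicodedata

# A few of the draft's Greek assignments, copied from its table.
DRAFT_GREEK = {
    0xE1: "\u03b1",  # alpha
    0xE2: "\u03b2",  # beta
    0xE3: "\u03c7",  # chi
    0xE4: "\u03b4",  # delta
    0xF0: "\u03c0",  # pi
    0xF2: "\u03c1",  # rho
    0xF3: "\u03c3",  # sigma
    0xF4: "\u03c4",  # tau
}

def compare_with_8859_7(draft):
    """Return {code: (draft letter, ISO 8859/7 letter at the same code)}."""
    return {code: (letter, bytes([code]).decode("iso8859_7"))
            for code, letter in draft.items()}

for code, (mine, iso) in sorted(compare_with_8859_7(DRAFT_GREEK).items()):
    status = "same" if mine == iso else "differs"
    print(f"0x{code:02X}: draft {unicodedata.name(mine)}; "
          f"ISO 8859/7 {unicodedata.name(iso)} ({status})")
```

For these eight entries, six land on the same byte in both; chi (0xE3, gamma in ISO 8859/7) and rho (0xF2, final sigma in ISO 8859/7) do not.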

Adopting your proposal would be unfortunate for two reasons:

1) It would alienate the USA from the rest of the world. If the USA used
   an extended ASCII without the national characters used in French,
   German, Spanish, etc., those countries would have to follow a separate
   path in the computer industry.

2) It would further confuse the already abundant set of ISO 8859 standards.


-- 
Claus Tondering
Dansk Data Elektronik A/S, Herlev, Denmark
E-mail: ct@dde.dk    or   ...!uunet!mcvax!diku!dde!ct

--------------------------------------------
[Also, in the same vein...]

 From: rja@edison.GE.COM (rja)
 Organization:  GE-Fanuc North America

  There is an 8-bit character set standard already.  ISO 8859 is it.
Most of the X/OPEN member companies are implementing support for the
western European variant (ISO 8859/1) already.  See the exchanges on this
in comp.std.internat and comp.std.unix for more details.
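The western European variant also covers several of the non-Greek symbols the original post asked for, including the registered trademark sign that the post notes is missing from IBM's set. The sketch below pairs a few draft codes with the Unicode character each description is taken to mean (an interpretation of the draft's prose, labeled in the comments) and looks up the byte each symbol has in ISO 8859/1 using Python's `latin-1` codec; the names are illustrative.

```python
import unicodedata

# Draft code -> symbol its description is taken to mean (interpretation).
DRAFT_SYMBOLS = {
    0xC0: "\u00ae",  # registered trademark sign
    0xC3: "\u00a9",  # copyright sign
    0xA4: "\u00a3",  # pound sign
    0xAA: "\u00b0",  # degree ("a small circle")
    0xAB: "\u00b1",  # plus-minus ("'+_' in one character site")
    0xCD: "\u00d7",  # multiplication sign
    0xA5: "\u00f7",  # division sign
    0xBA: "\u00a8",  # umlaut (diaeresis)
    0xAC: "\u00b8",  # cedilla
}

def latin1_positions(symbols):
    """Map each draft code to the byte the same symbol has in ISO 8859/1."""
    return {code: ch.encode("latin-1")[0] for code, ch in symbols.items()}

for draft_code, iso_code in sorted(latin1_positions(DRAFT_SYMBOLS).items()):
    name = unicodedata.name(DRAFT_SYMBOLS[draft_code])
    print(f"{name:<25} draft 0x{draft_code:02X}   ISO 8859/1 0x{iso_code:02X}")
```

Every one of these symbols already has a place in ISO 8859/1, though none sits at the code the draft assigns.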

--------------------------------------------
[And ...]

 From: glennw@Sun.COM (Glenn P. Wright)

I think all of this has already been done in the ISO 8859 standard, an
extension to ISO 2022.  Have you read it?  In particular, ISO 8859/7 handles Greek.

Glenn Wright {..}glennw@sun, or, {..}sun!glennw
============
Sun Microsystems Inc, Mountain View, California, USA.
Tel: (415) 960 1300