[comp.std.internat] 7-bit ASCII vs. 8-bit ASCII

nukim@ndsuvax.UUCP (kyongsok kim) (04/12/89)

    When 7-bit ASCII code is used on 8-bit machines, I guess that the msb
(most significant bit) is set to zero.  For example, "A" is 100 0001 in 7-bit
ASCII code and it will be represented as 0100 0001 on 8-bit machines.

    In some book, I found that there is an 8-bit ASCII-8 code,
which is different from the 7-bit code w/ a leading zero prefixed.
The book says that, for example, "A" is 1010 0001 and "1" is 0101 0001
in ASCII-8 code.
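The two 8-bit forms in question can be compared mechanically. Below is a minimal Python sketch. The `ascii8` rule (copy the high-order bit b7 of the 7-bit code into a new bit between b6 and b5) is an assumption reverse-engineered from the book's two examples, not taken from IBM documentation, and all function names are hypothetical.

```python
def zero_extended(ch: str) -> int:
    """7-bit ASCII stored in an 8-bit byte: the msb is simply zero."""
    code = ord(ch)
    if code >= 0x80:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return code  # 0xxx xxxx


def ascii8(ch: str) -> int:
    """Hypothetical ASCII-8 mapping inferred from the book's two examples.

    ASSUMPTION: bits b7 b6 b5 b4 b3 b2 b1 of the 7-bit code become
    b7 b6 b7 b5 b4 b3 b2 b1, i.e. b7 is copied into a new bit between b6 and b5.
    """
    code = zero_extended(ch)
    b7 = (code >> 6) & 1        # high-order bit of the 7-bit code
    top = (code >> 5) & 0b11    # b7 b6
    low = code & 0b11111        # b5..b1, unchanged
    return (top << 6) | (b7 << 5) | low


def bits(n: int) -> str:
    """Format a byte as two groups of four bits, e.g. '0100 0001'."""
    s = format(n, "08b")
    return s[:4] + " " + s[4:]


for ch in "A1":
    print(ch, bits(zero_extended(ch)), bits(ascii8(ch)))
```

This prints `A 0100 0001 1010 0001` and `1 0011 0001 0101 0001`, so under this assumed rule the two codes agree only in their low five bits.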

    My questions are:

    1) what is ASCII-8 code?  a good reference or table?
    2) is ASCII-8 different from the 7-bit ASCII code w/ a leading zero
       prefixed?
    3) where is this code used?

Thanks in advance.  Please send e-mail.

Kyongsok Kim
Dept. of Comp. Sci., North Dakota State University

e-mail address:
    nukim@plains.nodak.edu
    nukim@ndsuvax.bitnet
    uunet!ndsuvax!nukim

nukim@ndsuvax.UUCP (kyongsok kim) (04/18/89)

In article <2542@ndsuvax.UUCP> nukim@ndsuvax.UUCP (kyongsok kim) writes:
:
:    In some book, I found that there is an 8-bit ASCII-8 code,
:which is different from the 7-bit code w/ a leading zero prefixed.
:The book says that, for example, "A" is 1010 0001 and "1" is 0101 0001
:in ASCII-8 code.

Thanks to all who responded to my question.  Here goes the summary:

> The original IBM System 360 had a special ASCII-8 mode ...
> It was never implemented...
>
> ... a form that IBM introduced with the 360 back in the 1960s.  It
> was not a superset of standard ASCII and died a quiet death.  That may
> be what your book was referring to.  If so, ignore it except for
> computer archeology purposes.
>
> I know of no systems where any such 8-bit ASCII code is used.
>

k kim


billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) (04/18/89)

From article <2568@ndsuvax.UUCP>, by nukim@ndsuvax.UUCP (kyongsok kim):
> In article <2542@ndsuvax.UUCP> nukim@ndsuvax.UUCP (kyongsok kim) writes:
> :    In some book, I found that there is an 8-bit ASCII-8 code,
> :which is different from the 7-bit code w/ a leading zero prefixed.
> :The book says that, for example, "A" is 1010 0001 and "1" is 0101 0001
> :in ASCII-8 code.
> 
> Thanks to all who responded to my question.  Here goes the summary:
> 
>> The original IBM System 360 had a special ASCII-8 mode ...
>> It was never implemented...   (etc.)

     8-bit ASCII is simply the American Standard corresponding to
     ISO Latin 1, ISO 8859/1-9.  The statement of equivalence, and 
     a table displaying the character set, appeared in Byte several 
     years ago (circa 1985-1987); unfortunately, I don't remember the
     exact issue, nor have I ever gotten around to looking it up.

     (One of those things I've always meant to do, but never gotten done)

     At any rate, check Byte over roughly that time span, and post the
     *exact* reference for the rest of us, if you would...

     (BTW, since 8-bit ASCII contains the accented characters of the
      Western European languages, it is quite unfortunate that there
      is so much inertia in industry...)
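Assuming, as the post says, that "8-bit ASCII" means ISO 8859-1 (Latin 1), a short Python check using the standard `latin-1` codec shows that it keeps 7-bit ASCII as its lower half with a zero msb and puts accented letters in the upper half. This is unlike the IBM ASCII-8 discussed earlier: the byte that ASCII-8 uses for "A" (hex A1) is a different character in Latin 1. The helper name `latin1_extends_ascii` is hypothetical.

```python
import unicodedata


def latin1_extends_ascii() -> bool:
    """Check that bytes 0-127 decode identically as ASCII and as ISO 8859-1."""
    for code in range(128):
        byte = bytes([code])
        if byte.decode("latin-1") != byte.decode("ascii"):
            return False
    return True


print(latin1_extends_ascii())        # lower half is plain ASCII
print("é".encode("latin-1").hex())   # accented letter lives in the upper half
# The byte ASCII-8 uses for "A" means something else in Latin 1:
print(unicodedata.name(bytes([0xA1]).decode("latin-1")))
```

This prints `True`, `e9` and `INVERTED EXCLAMATION MARK`.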


     Bill Wolfe, wtwolfe@hubcap.clemson.edu

wtwolfe@hubcap.clemson.edu (Bill Wolfe) (04/21/89)

   [This followup was sent to me by Barry Siegfried, who
    requested that I post it to comp.std.internat...]

From: bs7086@wucs2.wustl.edu (Barry Siegfried)
Subject: Re: 7-bit ASCII vs. 8-bit ASCII
Summary: Byte article on 8-bit ASCII draft standard

In article <5153@hubcap.clemson.edu>, billwolf%hazel.cs.clemson.edu@hubcap.clemson.edu (William Thomas Wolfe,2847,) writes:
>
>      8-bit ASCII is simply the American Standard corresponding to
>      ISO Latin 1, ISO 8859/1-9.  The statement of equivalence, and 
>      a table displaying the character set, appeared in Byte several 
>      years ago (circa 1985-1987); unfortunately, I don't remember the
>      exact issue, nor have I ever gotten around to looking it up.  [...]
>
>      At any rate, check Byte over roughly that time span, and post the
>      *exact* reference for the rest of us, if you would...

The Byte article (August 1985, pp 24-25) was written by Thomas N. Hastings 
of Maynard, MA, and was titled "8-bit ASCII Draft Standard."  It was a 
letter to the editor.

Please post this to comp.std.internat.  I can read that group but can't 
post to it.

Thanks,
Barry Siegfried
bs7086@wucs2.wustl.edu

greger@ism780b (Greger Leijonhufvud) (04/25/89)

In article <Apr.19.10.41.28.1989.7554@paul.rutgers.edu> halldors@paul.rutgers.edu (Magnus M Halldorsson) writes:
>The ISO 8859 character sets specify sets for specific languages. Now
>what if one wants to use a combination of those? Is there any standard
>for storing, representing, and switching between various (ISO)
>character sets? What if one wants to allow for Japanese or Chinese as
>well?
>
>Magnus

There are several standardized (and several not yet blessed) techniques for
"mixing codesets". The /usr/group Subcommittee on Internationalization
has been studying several techniques for a while, and may even propose
something to POSIX (or whoever the appropriate forum is).

The AT&T "EUC" (Extended UNIX Codes) method is the only one so far
implemented within UNIX for "internal use". This was done in Japan, 
because the Japanese language typically is written with 3 different 
script systems (Kanji, Katakana and Hiragana). 
The EUC scheme is based on the ISO 2022 single-shift coding:

	7-bit ASCII is always present as code set 0.
	All other code sets must have the high-order bit set
	in all bytes.
	Code set 1 is distinguished by the high-order bit alone.
	Code set 2 has the high-order bit set, and each character
	is prefixed by the ISO 2022 SS2 character (hex 8E).
	Code set 3 has the high-order bit set, and each character
	is prefixed by the ISO 2022 SS3 character (hex 8F).
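The rules above are enough to split an EUC byte stream into characters tagged by code set. Below is a minimal Python sketch. The per-code-set widths are an assumption borrowed from EUC-JP (code set 1: 2 bytes; code set 2: SS2 plus 1 byte; code set 3: SS3 plus 2 bytes). Other EUC locales define their own widths, and `split_euc` and `WIDTH` are hypothetical names.

```python
from typing import List, Tuple

SS2, SS3 = 0x8E, 0x8F  # ISO 2022 single-shift bytes used by EUC

# ASSUMPTION: EUC-JP widths; other EUC locales define their own.
WIDTH = {1: 2, 2: 1, 3: 2}  # bytes per character, excluding the SS2/SS3 prefix


def split_euc(data: bytes) -> List[Tuple[int, bytes]]:
    """Split an EUC byte string into (code set, character bytes) pairs."""
    out: List[Tuple[int, bytes]] = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:          # code set 0: plain 7-bit ASCII
            cs, n = 0, 1
        elif lead == SS2:        # code set 2: SS2 prefix + WIDTH[2] bytes
            cs, n = 2, 1 + WIDTH[2]
        elif lead == SS3:        # code set 3: SS3 prefix + WIDTH[3] bytes
            cs, n = 3, 1 + WIDTH[3]
        else:                    # code set 1: high bit set, no prefix
            cs, n = 1, WIDTH[1]
        char = data[i:i + n]
        # Every byte after the first must also have the high-order bit set.
        if len(char) != n or any(b < 0x80 for b in char[1:]):
            raise ValueError(f"malformed EUC sequence at offset {i}")
        out.append((cs, char))
        i += n
    return out


sample = "Aｱ表".encode("euc_jp")  # ASCII letter, half-width katakana, kanji
for cs, char in split_euc(sample):
    print(cs, char.hex())
```

With Python's standard `euc_jp` codec this prints `0 41`, `2 8eb1` and `1 c9bd`: the half-width katakana travels in code set 2 behind SS2, and the kanji sits in code set 1 with bit 8 set on both bytes.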

This scheme supports (in theory) 4 different code sets. For 8859
compatible code sets, of course, it only supports 3 (as ASCII is
part of each code set), and it does not support code sets that do
not conform to ISO 2022 (such as the IBM Extended ASCII used on
PCs, or the Shift-JIS code set).
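The Shift-JIS case can be seen with a single kanji. EUC requires every byte outside code set 0 to have bit 8 set, but Shift-JIS lets the second byte of a two-byte character fall in the 7-bit range. Here it is hex 5C, the ASCII backslash. The sketch below uses Python's standard `euc_jp` and `shift_jis` codecs; `all_high_bit` is a hypothetical helper.

```python
def all_high_bit(data: bytes) -> bool:
    """True if every byte has bit 8 set, as EUC requires outside code set 0."""
    return all(b >= 0x80 for b in data)


kanji = "表"
euc = kanji.encode("euc_jp")      # JIS X 0208 code with bit 8 set on both bytes
sjis = kanji.encode("shift_jis")  # second byte falls in the 7-bit range

print(euc.hex(), all_high_bit(euc))
print(sjis.hex(), all_high_bit(sjis))
```

This prints `c9bd True` and `955c False`, so a byte-oriented program applying EUC or plain ASCII rules would misread the Shift-JIS form.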

A more generalized scheme is the "Compound String" method, also endorsed
by ISO. It may very well be the X Windows encoding scheme for
interchange or internal representation.

There are also other encoding schemes, by Sun, Xerox and other
companies.

There is, however, no standard as yet. Unfortunately. But, from V.4,
you should be able to mix Icelandic with Bulgarian, and get your
Greek quotations OK, too.

Greger Leijonhufvud
Interactive Systems Corp.
Sunny Santa Monica, Ca.
uunet!ism780c!greger

rja@edison.GE.COM (rja) (04/25/89)

In article <Apr.19.10.41.28.1989.7554@paul.rutgers.edu>, halldors@paul.rutgers.edu (Magnus M Halldorsson) writes:
> The ISO 8859 character sets specify sets for specific languages. Now
> what if one wants to use a combination of those? Is there any standard
> for storing, representing, and switching between various (ISO)
> character sets? What if one wants to allow for Japanese or Chinese as
> well?
> 

The Chinese standard is reportedly going to reserve the characters
(decimal) 0 thru 255 for romanised characters.  I've forgotten what
the Japanese standard says, but it is possible that 128-255 are used
for either Hiragana or Katakana.

SS2 and SS3 are frequently used to shift character sets.  A good place 
to look for European usage is X/Open.  For Asian character sets, you'll
have to acquire the standards.

deh0654@sjfc.UUCP (Dennis E. Hamilton) (04/28/89)

In article <26644@ism780c.isc.com> greger@ism780b.UUCP (Greger Leijonhufvud) writes:
>In article <Apr.19.10.41.28.1989.7554@paul.rutgers.edu> halldors@paul.rutgers.edu (Magnus M Halldorsson) writes:
>>The ISO 8859 character sets specify sets for specific languages. Now
>>what if one wants to use a combination of those? Is there any standard
>>for storing, representing, and switching between various (ISO)
>>character sets? What if one wants to allow for Japanese or Chinese as
>>well?
>[discussion of EUC and other Unix-flavored proposals]
>There are also other encoding schemes, by Sun, Xerox and other
>companies.
>
>There is, however, no standard as yet. Unfortunately. But, from V.4,
>you should be able to mix Icelandic with Bulgarian, and get your
>Greek quotations OK, too.

There has been an ISO scheme for mixing code sets for some time now.
ISO 2022-1973 specified basically unlimited code-extension
techniques, and you can use either 7-bit or 8-bit ASCII to carry it in.
(The 7-bit form uses shift functions to reach the codes that would
normally have bit 8 set.)  Although it can be a little
painful, there are alphabet registration systems that allow international
identification of the code used in the code stream itself.  You can use
the code identification procedure to switch the 7/8-bit "window" over those
codes you want to use at any particular time.
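To make the 7-bit "window" idea concrete, the sketch below uses ISO-2022-JP (RFC 1468), a later 7-bit profile of ISO 2022. It is a stand-in chosen because Python ships a codec for it. It switches sets with escape-sequence designations into G0 rather than with SO/SI locking shifts, and it recognizes only four of the many designations ISO 2022 allows. The kanji bytes are ordinary 7-bit values whose meaning depends on which set was last designated. `runs` and `DESIGNATIONS` are hypothetical names.

```python
from typing import List, Tuple

ESC = 0x1B

# Designation escape sequences used by ISO-2022-JP to load a graphic set
# into G0; ISO 2022 itself defines many more.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208",
}


def runs(data: bytes) -> List[Tuple[str, bytes]]:
    """Split a 7-bit ISO-2022-JP stream into (designated set, text bytes) runs."""
    current = "ASCII"  # ISO-2022-JP text starts with ASCII designated to G0
    out: List[Tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i:i + 3]
            if seq not in DESIGNATIONS:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            current = DESIGNATIONS[seq]
            i += 3
            continue
        j = i
        while j < len(data) and data[j] != ESC:
            j += 1
        out.append((current, data[i:j]))
        i = j
    return out


stream = "A表".encode("iso2022_jp")
print(stream)
print(runs(stream))
```

With Python's standard codec the stream is `b'A\x1b$BI=\x1b(B'`, and `runs` reports `A` under ASCII and `I=` (the kanji) under JIS X 0208. Every byte in the stream is below hex 80.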

When the special 8-bit character codes that are talked about here
become approved, there will presumably also be internationally approved
"announcement sequences" for shifting in and out of them.

This works for communication better than for internal processing, of
course.  For what you want to see *internally* in a particular
computer system, I suppose POSIX and other standards will have to
make provision (and the C Language will become interesting, too).  But for
interchange purposes via 7/8-bit data streams, all of the machinery has
been defined for some time, including the procedure for international
registration of special code tables.


-- Dennis E. Hamilton {uucp: ... !rochester!cci632!sjfc!deh0654}
	Robert Anson Heinlein, 1907-1988
	May the First Muster always answer to your names.