[comp.misc] International Character Sets

jssk@cbnewsl.att.com (jeffrey.s.skelton) (05/01/91)

Could somebody please give me pointers to standards on international
character sets?

Thanks in advance...
---                                                                ---
 Jeffrey Skelton            |  AT&Tnet: +1 (908) 870-7634
 AT&T Bell Laboratories     | Internet: Jeff.Skelton@att.com
 185 Monmouth Pkwy          |     UUCP: {...att...}!abars!jupiter!jss
 West Long Branch, NJ 07764 |  ATTMAIL: attmail!jskelton         
---                                                                ---

larryp@sco.COM (Larry Philps) (05/02/91)

In <1991May1.131212.8983@cbnewsl.att.com> jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:


> Could somebody please give me pointers to standards on international
> character sets?

Here are the ones I know of,

1) ASCII	- Nuf said

2) EBCDIC	- More than enough said.

3) IBM pc850	- The standard PC character set.  Very similar to ISO 8859/1

4) HP Roman8	- HP's equivalent of the above.  Also very similar to
		  ISO 8859/1.

5) ISO8859	- This is a set of 9 8-bit codesets that can handle most
		  alphabetic languages.  These are published final standards.

6) EUC		- This is the Extended Unix Codeset.  Characters can be
		  1, 2, 3 or 4 bytes in length, and can be intermixed.
		  This is actually reasonably popular, and is the base for
		  AT&T's MNLS product.  I have misplaced my reference, but
		  I think it is ISO Standard 10664.

7) SJIS		- JIS is a Japanese Industrial Standard, and SJIS is called
		  Shift-JIS for some reason I have never figured out.
		  It uses 16-bit characters to encode Kanji, but also allows
		  single byte ASCII characters.

8) ISO 10646	- This is a proposed ISO standard for a 32 bit character
		  set.  In this character set, each "character" has a
		  prefix that specifies which "code set" the rest of the
		  character is an index into.  Clear?  For example one
		  prefix would indicate ISO 8859/1, then the rest of the
		  bits would be an index into that character set.

9) Unicode	- This is being developed by a consortium of companies
		  including IBM, Microsoft, Sun, and Next.  It is a 16-bit
		  character set that tries to handle all the characters
		  for many languages by mapping identical shapes to the
		  same position in Unicode, regardless of what the character's
		  name is in different languages.  In particular, the
		  Chinese, Korean and Japanese symbols have been distilled
		  down to about 18,000 unique characters (I think).  I
		  don't have a good reference for this one.
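
To make the multi-byte entries (6 and 7) more concrete, here is a minimal
sketch using the codecs that ship with Python.  It assumes the Japanese
flavour of EUC ("euc_jp") and the "shift_jis" codec; the helper name
byte_lengths is made up for illustration and is not part of any standard.

```python
# Illustrative only: encode a mixed ASCII/Kanji string under two of the
# multi-byte encodings from the list, using Python's standard codecs.
text = "A\u65e5\u672c"  # "A" followed by the two Kanji for "Japan"


def byte_lengths(s, codec):
    """Return how many bytes each character of `s` occupies in `codec`."""
    return [len(ch.encode(codec)) for ch in s]


euc_lengths = byte_lengths(text, "euc_jp")      # ASCII stays 1 byte, Kanji take 2
sjis_lengths = byte_lengths(text, "shift_jis")  # same widths, different byte values

euc_bytes = text.encode("euc_jp")
sjis_bytes = text.encode("shift_jis")

# In EUC every byte of a multi-byte character has the 8th bit set.
euc_all_high = all(b >= 0x80 for b in euc_bytes[1:])
# In Shift-JIS the second byte of a Kanji may fall in the 7-bit ASCII range.
sjis_has_low_trail = any(b < 0x80 for b in sjis_bytes[1:])

print(euc_lengths, euc_bytes.hex())
print(sjis_lengths, sjis_bytes.hex())
```

Both encodings mix single-byte ASCII with two-byte Kanji, but only EUC
keeps the two kinds of bytes in disjoint ranges, which is worth knowing
before scanning such text a byte at a time.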

Have fun.  It's a brutal world out there.

---
Larry Philps,	 SCO Canada, Inc (Formerly: HCR Corporation)
Postman:  130 Bloor St. West, 10th floor, Toronto, Ontario.  M5S 1N5
InterNet: larryp@sco.COM  or larryp%scocan@uunet.uu.net
UUCP:	 {uunet,utcsri,sco}!scocan!larryp
Phone:	 (416) 922-1937
Fax:	 (416) 922-8397

rmich@Lise.Unit.NO (Rolf Michelsen) (05/03/91)

In article <1991May1.131212.8983@cbnewsl.att.com>, jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:
|> 
|> Could somebody please give me pointers to standards on international
|> character sets?
|> 
|> Thanks in advance...
|> ---                                                                ---
|>  Jeffrey Skelton            |  AT&Tnet: +1 (908) 870-7634
|>  AT&T Bell Laboratories     | Internet: Jeff.Skelton@att.com
|>  185 Monmouth Pkwy          |     UUCP: {...att...}!abars!jupiter!jss
|>  West Long Branch, NJ 07764 |  ATTMAIL: attmail!jskelton         
|> ---                                                                ---


You could have a look at the ISO standard character sets. I think the 
standard is numbered ISO 8859/n where n says what kind of character set
you are looking for. n=1,2,3,4 gives you four (slightly) different Latin
alphabets while n=7 gives you the Greek alphabet.
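
As a rough illustration of what "n says what kind of character set" means
in practice, the sketch below decodes the same byte under several parts of
ISO 8859 using Python's built-in "iso8859_n" codecs.  The byte 0xF8 is an
arbitrary choice for the example, and the variable names are made up here.

```python
# Illustrative only: one 8-bit value, interpreted under different ISO 8859 parts.
raw = bytes([0xF8])

decoded = {}
for part in (1, 2, 3, 4, 7):
    # Python names these codecs iso8859_1, iso8859_2, and so on.
    decoded[part] = raw.decode(f"iso8859_{part}")

# The lower half (0x00-0x7F) is plain ASCII in every part.
ascii_same = all(
    b"abc".decode(f"iso8859_{part}") == "abc" for part in (1, 2, 3, 4, 7)
)

for part, ch in decoded.items():
    print(f"ISO 8859/{part}: byte 0xF8 -> U+{ord(ch):04X}")
```

Under part 1 the byte is a Latin letter (o with stroke), while under part 7
it is the Greek letter psi, so the bytes alone do not say which alphabet was
meant; that has to be known or labelled separately.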

Hope this helps.

    ___________________  
   /                   | 
  / Snail-Mail:        | 
 /   Rolf Michelsen    |
/    Studpost 130      |
\    7034 Trondheim-NTH|
 \  E-Mail:            |
  \  rmich@lise.unit.no|
   \___________________| 

einari@rhi.hi.is (Einar Indridason) (05/06/91)

In article <1991May3.063018.10081@ugle.unit.no> rmich@Lise.Unit.NO (Rolf Michelsen) writes:
>In article <1991May1.131212.8983@cbnewsl.att.com>, jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:
>|> 
>|> Could somebody please give me pointers to standards on international
>|> character sets?
>|> 
>|> Thanks in advance...
>You could have a look at the ISO standard character sets. I think the 
>standard is numbered ISO 8859/n where n says what kind of character set
>you are looking for. n=1,2,3,4 gives you four (slightly) different Latin
>alphabets while n=7 gives you the Greek alphabet.



AND:     PLEASE DON'T *MASK* THE 8TH BIT.

(If you do, you will leave us poor Icelanders out in the cold.  There are
other nations out there, besides us, that *NEED* the 8th bit.)
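
A small sketch of the damage, assuming the text is stored as ISO 8859/1
and that some program clears the top bit of every byte.  The function name
strip_high_bit is hypothetical, chosen only for this example.

```python
# Illustrative only: what masking the 8th bit does to Icelandic text
# stored in ISO 8859/1 (called "latin-1" by Python).
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit-only program would."""
    return bytes(b & 0x7F for b in data)


original = "\u00dea\u00f0 er kalt"       # Icelandic for "It is cold"
wire = original.encode("latin-1")        # thorn = 0xDE, eth = 0xF0
damaged = strip_high_bit(wire).decode("ascii")

print(original)
print(damaged)  # thorn becomes "^" (0x5E) and eth becomes "p" (0x70)
```

The masked text is still valid ASCII, so nothing reports an error; the
letters are silently replaced with unrelated characters.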



--
Internet:    einari@rhi.hi.is        |   "Just give me my command line and drag
UUCP:    ..!mcsun!isgate!rhi!einari  |   the GUIs to the waste basket!!!!"

Surgeon General's warning:  Masking the 8th bit can seriously damage your brain!!

buckland@cheddar.ucs.ubc.ca (Tony Buckland) (05/06/91)

 Is there a standard computer representation of the international
 phonetic symbol set (the one with the upside-down "e"s, etc.)?