jssk@cbnewsl.att.com (jeffrey.s.skelton) (05/01/91)
Could somebody please give me pointers to standards on international
character sets?

Thanks in advance...
--- ---
Jeffrey Skelton            | AT&Tnet:  +1 (908) 870-7634
AT&T Bell Laboratories     | Internet: Jeff.Skelton@att.com
185 Monmouth Pkwy          | UUCP:     {...att...}!abars!jupiter!jss
West Long Branch, NJ 07764 | ATTMAIL:  attmail!jskelton
--- ---
larryp@sco.COM (Larry Philps) (05/02/91)
In <1991May1.131212.8983@cbnewsl.att.com> jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:
> Could somebody please give me pointers to standards on international
> character sets?

Here are the ones I know of:

1) ASCII - 'Nuff said.
2) EBCDIC - More than enough said.
3) IBM pc850 - The standard PC character set. Very similar in repertoire
   to ISO 8859/1, though the code positions differ.
4) HP Roman8 - HP's equivalent of the above. Also very similar to
   ISO 8859/1.
5) ISO 8859 - This is a set of 9 8-bit codesets that can handle most
   alphabetic languages. These are published final standards.
6) EUC - This is the Extended Unix Codeset. Characters can be 1, 2, 3 or
   4 bytes in length, and can be intermixed. This is actually reasonably
   popular, and is the base for AT&T's MNLS product. I have misplaced my
   reference, but I think it is ISO Standard 10664.
7) SJIS - JIS stands for Japanese Industrial Standard, and SJIS is called
   Shift-JIS for some reason I have never figured out. It uses 16-bit
   characters to encode Kanji, but also allows single-byte ASCII
   characters.
8) ISO 10646 - This is a proposed ISO standard for a 32-bit character
   set. In this character set, each "character" has a prefix that
   specifies which "code set" the rest of the character is an index
   into. Clear? For example, one prefix would indicate ISO 8859/1, and
   the rest of the bits would then be an index into that character set.
9) Unicode - This is being developed by a consortium of companies
   including IBM, Microsoft, Sun, and NeXT. It is a 16-bit character set
   that tries to handle all the characters for many languages by mapping
   identical shapes to the same position in Unicode, regardless of what
   the character's name is in different languages. In particular, the
   Chinese, Korean and Japanese symbols have been distilled down to
   about 18,000 unique characters (I think). I don't have a good
   reference for this one.

Have fun. It's a brutal world out there.
---
Larry Philps, SCO Canada, Inc (Formerly: HCR Corporation)
Postman: 130 Bloor St. West, 10th floor, Toronto, Ontario. M5S 1N5
InterNet: larryp@sco.COM or larryp%scocan@uunet.uu.net
UUCP: {uunet,utcsri,sco}!scocan!larryp
Phone: (416) 922-1937  Fax: (416) 922-8397
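
A small Python sketch may make two points from the list above concrete: the PC and ISO 8-bit codesets put the same letter at different byte values, and both EUC and Shift-JIS mix single-byte ASCII with two-byte Kanji. The codec names (cp850, latin_1, euc_jp, shift_jis) are those of Python's standard library and the helper encode_all is hypothetical; none of this comes from the post itself, and EUC-JP is used here as just one instance of EUC.

```python
# Illustrative sketch only. Codec names are Python's, not the poster's.

def encode_all(text, codec_names):
    """Return {codec name: encoded bytes, or None if the codec lacks a character}."""
    result = {}
    for name in codec_names:
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None  # this codeset cannot represent the text
    return result

# The same accented letter (e with acute) in the PC and ISO 8-bit codesets.
single_byte = encode_all("\u00e9", ["cp850", "latin_1"])

# ASCII 'A' followed by one Kanji (U+6F22): one byte plus two bytes
# in both EUC-JP and Shift-JIS, and not representable in plain ASCII.
mixed = encode_all("A\u6f22", ["euc_jp", "shift_jis", "ascii"])

for name, data in {**single_byte, **mixed}.items():
    print(name, "cannot encode" if data is None else data.hex())
```

The two Japanese encodings agree on the length of the mixed string but not on the byte values of the Kanji, which is why text in one cannot simply be read as the other.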
rmich@Lise.Unit.NO (Rolf Michelsen) (05/03/91)
In article <1991May1.131212.8983@cbnewsl.att.com>, jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:
|>
|> Could somebody please give me pointers to standards on international
|> character sets?
|>
|> Thanks in advance...
|> --- ---
|> Jeffrey Skelton            | AT&Tnet:  +1 (908) 870-7634
|> AT&T Bell Laboratories     | Internet: Jeff.Skelton@att.com
|> 185 Monmouth Pkwy          | UUCP:     {...att...}!abars!jupiter!jss
|> West Long Branch, NJ 07764 | ATTMAIL:  attmail!jskelton
|> --- ---

You could have a look at the ISO standard character sets. I think the
standard is numbered ISO 8859/n where n says what kind of character set
you are looking for. n=1,2,3,4 gives you four (slightly) different Latin
alphabets while n=7 gives you the Greek alphabet.

Hope this helps.

    ___________________
   /                   |
  / Snail-Mail:        |
 /  Rolf Michelsen     |
/   Studpost 130       |
\   7034 Trondheim-NTH |
 \  E-Mail:            |
  \ rmich@lise.unit.no |
   \___________________|
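
To illustrate how n selects the alphabet in ISO 8859/n, the following Python sketch decodes one and the same byte under three parts of the standard. It assumes Python's standard codec names iso8859_1, iso8859_2 and iso8859_7; the helper decode_in_parts is hypothetical and the byte 0xF0 is just a convenient example.

```python
# Illustrative sketch only: one byte, three parts of ISO 8859.

def decode_in_parts(raw, parts=(1, 2, 7)):
    """Decode the same bytes under ISO 8859/n for each n in parts."""
    return {n: raw.decode(f"iso8859_{n}") for n in parts}

# Bytes below 0x80 are plain ASCII in every part.
ascii_view = decode_in_parts(b"A")

# A byte with the 8th bit set means something different in each part:
# Latin small eth in part 1, d with stroke in part 2, Greek pi in part 7.
high_view = decode_in_parts(b"\xf0")

for n, ch in high_view.items():
    print(f"ISO 8859/{n}: U+{ord(ch):04X}")
```

The lower half of every part is identical, so only bytes with the top bit set depend on which part a file was written in.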
einari@rhi.hi.is (Einar Indridason) (05/06/91)
In article <1991May3.063018.10081@ugle.unit.no> rmich@Lise.Unit.NO (Rolf Michelsen) writes:
>In article <1991May1.131212.8983@cbnewsl.att.com>, jssk@cbnewsl.att.com (jeffrey.s.skelton) writes:
>|>
>|> Could somebody please give me pointers to standards on international
>|> character sets?
>|>
>|> Thanks in advance...

>You could have a look at the ISO standard character sets. I think the
>standard is numbered ISO 8859/n where n says what kind of character set
>you are looking for. n=1,2,3,4 gives you four (slightly) different Latin
>alphabets while n=7 gives you the Greek alphabet.

AND: PLEASE DON'T *MASK* THE 8TH BIT.
(If you do, you will leave us poor Icelanders out in the cold. And there
are other nations out there, besides us, that *NEED* the 8th bit.)

--
Internet: einari@rhi.hi.is        | "Just give me my command line and drag
UUCP: ..!mcsun!isgate!rhi!einari  |  the GUIs to the waste basket!!!!"

Surgeon General's warning: Masking the 8th bit can seriously damage your brain!!
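
A short Python sketch of what masking the 8th bit does to Icelandic text in ISO 8859/1. The function strip_high_bit is a hypothetical stand-in for any software that clears the top bit of each byte, and the name used is only an example.

```python
# Illustrative sketch only: simulate a 7-bit-only path.

def strip_high_bit(data: bytes) -> bytes:
    """Clear the 8th (most significant) bit of every byte."""
    return bytes(b & 0x7F for b in data)

name = "Þórður"                      # an Icelandic given name
latin1_bytes = name.encode("latin_1")
masked = strip_high_bit(latin1_bytes)

print(masked.decode("ascii"))        # prints ^srpur
```

Every letter above 0x7F silently turns into an unrelated ASCII character, while the plain ASCII letters pass through unchanged, so the damage cannot be detected or undone afterwards.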
buckland@cheddar.ucs.ubc.ca (Tony Buckland) (05/06/91)
Is there a standard computer representation of the international phonetic symbol set (the one with the upside-down "e"s, etc.)?