[comp.os.msdos.programmer] Unicode: Details Please

neil@progress.COM (Neil Galarneau) (02/14/91)

I have heard of a new multi-lingual character set called Unicode.

It is supposed to give one all the character sets in the world in 16-bit
characters.  It is supposed to be backed by several Unix companies, Apple,
and the DOS companies.

Other than that, I have no details.

Does anyone know where I can get a spec?  I need to evaluate this proposed
"standard".



Thanks,

Neil
neil@progress.com

garry@ceco.ceco.com (Garry Garrett) (02/15/91)

In article <1991Feb14.001842.24415@progress.com>, neil@progress.COM (Neil Galarneau) writes:
> I have heard of a new multi-lingual character set called Unicode.
> 
> It is supposed to give one all the character sets in the world in 16-bit
> characters.  It is supposed to be backed by several Unix companies, Apple,
> and the DOS companies.
> 
> ...

	I hope not.  I was working on this myself, and I was also going to
include several other features as well.  Well, if they are working on it, I
hope that they were smart enough to make character 0 = '0', char 1 = '1'...
char 9 = '9', char 10 = 'A', char 11 = 'B' ...   This would make conversions to
hexadecimal (or octal) much, much easier.  You see, there is no real
reason that the control characters HAD to occupy the first 32 characters in
ASCII.  (They could have just as easily made them the last 32 and used
positive logic circuits rather than negative logic.)  IMHO, many of the
problems that programmers face with character sets (ASCII & EBCDIC) arise
because they were designed by engineers who were (naturally) more concerned
with what was going to be easy to build hardware-wise.  You only build
hardware once, but people write software for it for years.  If it's a little
bit harder to build, but easier to program, it's worth it.
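
To make the hex-conversion point concrete, here is a minimal sketch in
Python (not from the thread; the function names are hypothetical).  It
contrasts digit-to-character conversion under ASCII, where the digits and
the letters sit in two separate ranges, with the layout suggested above,
where the character code would simply equal the digit value.

```python
def hex_digit_ascii(value: int) -> int:
    """Return the ASCII code for hex digit `value` (0..15).

    ASCII keeps '0'..'9' at codes 48..57 and 'A'..'F' at 65..70,
    so the conversion needs a branch between the two ranges.
    """
    if not 0 <= value <= 15:
        raise ValueError("value must be in 0..15")
    if value < 10:
        return ord("0") + value
    return ord("A") + (value - 10)


def hex_digit_proposed(value: int) -> int:
    """Same conversion under the hypothetical layout from the post.

    Assumption: code 0 is '0', code 9 is '9', code 10 is 'A', and so on,
    so the character code is the digit value itself.
    """
    if not 0 <= value <= 15:
        raise ValueError("value must be in 0..15")
    return value


def to_hex_ascii(n: int) -> str:
    """Convert a non-negative integer to a hex string using ASCII codes."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        digits.append(chr(hex_digit_ascii(n % 16)))
        n //= 16
        if n == 0:
            break
    return "".join(reversed(digits))


print(to_hex_ascii(255))        # FF
print(hex_digit_ascii(11))      # 66, the ASCII code of 'B'
print(hex_digit_proposed(11))   # 11, the code would equal the digit
```

The saving under the suggested layout is the single range test and offset
per digit; whether that outweighs other layout concerns is a judgment the
sketch does not settle.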
	As for my thoughts on a NEW character set, there is no reason why
all written languages could not be included.  There is also a wealth of
special characters that could be included, making the jobs of people in
various professions that use computers easier.  Symbols like
less-than-or-equal-to and not-equal-to could make every programmer's job
easier.  Meteorology has a lot of symbols that could be included in the
character set that would simplify the exchange of weather data, for example.
We also need not limit a new character set to today's technology.  (ASCII was
designed for teletype machines.)  What I mean by this is that we should include
characters that represent colors and music.  Granted, not everyone's computer
has these capabilities today, but why limit the character set?  If your
computer doesn't have a speaker, ignore the music characters.  If you don't
have the capabilities that some given character implies, then take an
appropriate action that is within your hardware's realm.  I think that there
are a lot of special characters that would help to unify word processing files
(like a character for Boldface-on, Italics-off...).  If these characters
existed in the character set, word processors would not need to make up their
own representation for these things, and thus they could use "standard"
Unicode files.  Imagine having a file of "music" characters: you could "print"
it to your synthesizer and listen to it, or you could "print" it to your
printer and get out sheet music.  (I realize that this is a bit idealized,
but I think that it is possible.)  Joe Musician could write his new song on a
computer, upload it to the studio, and record it (most likely his record label
will sell the theme to a video game maker to include as background to a
game), and the record company will put it on a CD-ROM with its other Top 40
songs of the month to distribute to record stores so that you can come in and
buy a copy of the sheet music (which the music store prints off on its laser
printer from the file on the CD-ROM).  I am not saying that this form of
marketing is my goal; I am only trying to show how much time & effort can be
saved for members of a certain profession if they are kept in mind when a new
code is developed.

	I certainly hope that if there is a Unicode, its makers have
had such a far-reaching outlook on its possibilities.  It would be a shame
for a new "standard" to emerge that is outdated about the time that it is
accepted.  If any of you out there have some ideas for things that may be
included in my character set, please e-mail them to me.  I still plan
on working on this unless I get some more info on Unicode and it turns out
to have some forethought to it.

Garry Garrett 
garry@ceco.ceco.com

yawei@bronze.ucs.indiana.edu (mr. yawei) (02/15/91)

In article <405@ceco.ceco.com> garry@ceco.ceco.com (Garry Garrett) writes:
>In article <1991Feb14.001842.24415@progress.com>, neil@progress.COM (Neil Galarneau) writes:
>> I have heard of a new multi-lingual character set called Unicode.
>> 
>> It is supposed to give one all the character sets in the world in 16-bit
>> characters.  It is supposed to be backed by several Unix companies, Apple,
>> and the DOS companies.

   This probably doesn't belong here, but I don't think it is possible
to include *ALL* the character sets in the world. For example, one
cannot possibly include the entire Chinese character set for two reasons:
(1) its cardinality is huge, (2) the set is unbounded.

   As far as Chinese characters are concerned, what Unicode may be able
to do is to include only the most frequently used ones, and then provide 
a composition mechanism to generate less frequently used ones when they 
are needed.

   yawei

Norbert.Zacharias@arbi.informatik.uni-oldenburg.de (Norbert Zacharias) (02/15/91)

yawei@bronze.ucs.indiana.edu (mr. yawei) writes:

>In article <405@ceco.ceco.com> garry@ceco.ceco.com (Garry Garrett) writes:
>>In article <1991Feb14.001842.24415@progress.com>, neil@progress.COM (Neil Galarneau) writes:
>>> I have heard of a new multi-lingual character set called Unicode.
>>> 
>>> It is supposed to give one all the character sets in the world in 16-bit
>>> characters.  It is supposed to be backed by several Unix companies, Apple,
>>> and the DOS companies.

>   This probably doesn't belong here, but I don't think it is possible
>to include *ALL* the character sets in the world. For example, one
>cannot possibly include the entire Chinese character set for two reasons:
>(1) its cardinality is huge, (2) the set is unbounded.

>   As far as Chinese characters are concerned, what Unicode may be able
>to do is to include only the most frequently used ones, and then provide 
>a composition mechanism to generate less frequently used ones when they 
>are needed.

>   yawei

Hi all,
I know that there is a code that contains every commonly used character,
including the Chinese ones. It was developed by the GMD (Gesellschaft fuer
Mathematik und Datenverarbeitung) for the Chinese/Japanese version of their
OS EUMEL in 85/86.  If anyone is interested, I'll try to get a file which
contains the definition from GMD. (I only have a map with the chars.)

Norbert
-- 
=============================================================================
Norbert Zacharias          Norbert.Zacharias@arbi.informatik.uni-oldenburg.de
FB Physik                                               148964@DOLUNI1.bitnet
Carl-von-Ossietzky-Universitaet
Tel. 0049-441-7983527
 Was Du nicht willst das man Dir tu, das will auch nicht was willst denn Du?
							   Heinz Erhard
=============================================================================

einari@rhi.hi.is (Einar Indridason) (02/16/91)

In article <405@ceco.ceco.com> garry@ceco.ceco.com (Garry Garrett) writes:
>positive logic circuits rather than negative logic)  IMHO, many of the 
>problems that programmers face with character sets (ASCII & EBCDIC) are
>that they were designed by engineers who were (naturally) more concerned

And more often than not, they came from the USA.  Not from Europe (or
Iceland, for that matter), because many 'standard' character sets give no
thought to those who have to use characters outside of the
7-bit range.  Therefore some programmers thought:  "It is all right to
mask the 8th bit.  Nobody uses it!!"
Well, they are wrong!!!   (Please don't mask the 8th bit!)
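
As an illustration of the damage, here is a minimal sketch in Python (not
from the thread).  It assumes the text is stored in ISO 8859-1 (Latin-1),
which the post does not name; the function name strip_high_bit is
hypothetical and stands in for any code that ANDs each byte with 0x7F.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as '7-bit clean' software often did."""
    return bytes(b & 0x7F for b in data)


# Icelandic word for "nation", written with escapes so the source stays ASCII:
# thorn (U+00FE), j, o-acute (U+00F3), eth (U+00F0).
word = "\u00fej\u00f3\u00f0"

# Assumption: the text is encoded as Latin-1, one byte per character.
encoded = word.encode("latin-1")
damaged = strip_high_bit(encoded).decode("ascii")

print(encoded.hex())   # fe6af3f0
print(damaged)         # ~jsp

# Plain 7-bit ASCII text passes through unchanged, which is why the
# bug goes unnoticed by people who never leave that range.
print(strip_high_bit(b"hello") == b"hello")   # True
```

Each accented or special letter is silently replaced by an unrelated ASCII
character, and the original cannot be recovered from the result.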


>	As for my thoughts on a NEW character set, there is no reason why 
>all written languages could not be included.	There are also a wealth of
>special characters that could be included, making people of various professions
>that use computers, jobs easier.  Symbols like less-than-or-equal-to and 
>not-equal-to, could make every programmer's job easier.  Meteorology has a lot

A new character set?  Fine with me, as long as we can use all our 36 (not 26,
but 36) characters, plus the numerals and other 'non-letters'.
Of those 36 letters that we, here in Iceland, use, there are 10 upper case
and 10 lower case characters that *must* be placed in the upper half of the
8-bit set, above the 7-bit ASCII range.
Perhaps you might understand our frustration when we must put a whole lot of
people (who could be doing other things) to work 'icelandifying' a bunch of
*badly* written software :-(
(Think what will happen when the need for a 16-bit character set starts to
spread?)

>accepted.  If any of you out there have some ideas for things that may be
>included in my character set, please e-mail them to me.  I still plan
>on working on this unless I get some more info on Unicode, and it does
>have some forethought to it.
>

Here is an idea:  (no offence meant)
	don't mask the 8th bit.  (or the 16th bit?)





--
Internet:    einari@rhi.hi.is        |   "Just give me my command line and drag
UUCP:    ..!mcsun!isgate!rhi!einari  |   the GUIs to the waste basket!!!!"

Surgeon Generals warning:  Masking the 8th bit can seriously damage your brain!!