[comp.os.msdos.programmer] Supporting international character sets

tr@samadams.princeton.edu (Tom Reingold) (10/23/90)

I would like to know about supporting international character sets in
an MS-DOS based application.  Is there a document that covers this?

My MS-DOS manual has some charts for some character sets, and I don't
fully understand how they are used.  For example, in the Norwegian
character set, values 0 through 127 are regular ASCII and values 128
through 255 give the alternate characters, e.g. 132 is lower case 'a'
with umlaut.  Does this mean that when the user hits his key marked
with that letter then the application thinks he hit an ASCII 132?  What
about scan codes for these keyboards?  Are charts available?  Or are
the keyboards the same with different key caps?

Thanks.
--
        Tom Reingold
        tr@samadams.princeton.edu  OR  ...!princeton!samadams!tr
        "Brew strength depends upon the
        amount of coffee used." -Black&Decker

einari@rhi.hi.is (Einar Indridason) (10/24/90)

In article <3866@rossignol.Princeton.EDU> tr@samadams.princeton.edu (Tom Reingold) writes:
>through 255 give the alternate characters, e.g. 132 is lower case 'a'
>with umlaut.  Does this mean that when the user hits his key marked
>with that letter then the application thinks he hit an ASCII 132?  What
>about scan codes for these keyboards?  Are charts available?  Or are
>the keyboards the same with different key caps?

The answer is that it depends on the application program.  Some programs
are written by STUPID authors that MASK the 8th bit.  I hate it!!!!!!!!!!
Especially since sometimes it is more work to MASK it than to leave it
alone!
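
To make the damage concrete, here is a minimal sketch in C.  The function
names are made up for illustration and are not taken from any of the
programs discussed.  Character 132 is 0x84; masking it with 0x7F leaves
0x04, a control character (Ctrl-D) where the a-umlaut used to be.

```c
/* What a 7-bit-only program does to every byte it reads (the bug). */
unsigned char strip_high_bit(unsigned char c)
{
    return (unsigned char)(c & 0x7F);
}

/* What an 8-bit-clean program does: pass the byte through untouched. */
unsigned char keep_all_bits(unsigned char c)
{
    return c;
}
```

Calling strip_high_bit(132) yields 4, while plain ASCII such as 'a' comes
back unchanged, which is why the bug goes unnoticed on US-only input.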

Here in Iceland (and you Americans, note that down), we *must* use
an additional 10 lower case characters and 10 upper case characters that
are not part of ASCII.  They are, however, part of ISO 8859/1 and of IBM
code pages 861 and 850 for the PC/AT machines (and part of the Macintosh
character set, I think).

For example, dBASE II, III and III+ don't allow us to use the capital
Thorn.  Instead they see EOF and quit.  WordStar used (and maybe still does)
the 8th bit to say: "print space, mask the 8th bit and print that
character".  WordPerfect does accept our characters, but it represents
them in a special way.
(It accepts our characters, which is more than can be said of some
other programs!!!)
The Borland stuff usually behaves well with our 'special characters'.



PROGRAMMERS:  UNITE.  **DON'T** MASK THE 8TH BIT!!!


--
Internet: einari@rhi.hi.is        |   "Just give me my command line and throw
UUCP: ..!mcsun!isgate!rhi!einari  |   the GUIs in the waste basket!!!!"

General Surgeons warning:  Masking the 8th bit can seriously damage your brain!!

steveha@microsoft.UUCP (Steve Hastings) (10/26/90)

In article <3866@rossignol.Princeton.EDU> tr@samadams.princeton.edu (Tom Reingold) writes:
>I would like to know about supporting international character sets in
>an MS-DOS based application.  Is there a document that covers this?

Support for international character sets is provided via "code pages."  Code
pages are explained fairly well in the DOS manual.  Code pages are available
in DOS 3.3 and up.

There are special code pages for some languages, and there is one code page
that works for almost every Western language.  It is called code page 850,
and it has a number of accented characters not available in the original
IBM PC character set (which is now called code page 437).  Note that when
you use code page 850, you lose certain line-drawing characters: the ones
that connect a single box-line to a double box-line.  For example, in code
page 437 character 0xC6 is a vertical line-draw character connecting to a
double-line on its right: |=   but in code page 850 it is an 'a' with a tilde
accent over it.
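
As a rough illustration of how the same byte means different things under
different code pages, here is a small C lookup covering only the two
values mentioned in this thread (0x84 and 0xC6).  The table layout and
the name glyph_name are hypothetical; a real program would carry full
256-entry tables or ask DOS for the active code page.

```c
#include <stddef.h>

struct cp_entry {
    unsigned char code;     /* byte value as stored or displayed */
    const char *in_437;     /* meaning under code page 437 */
    const char *in_850;     /* meaning under code page 850 */
};

static const struct cp_entry cp_table[] = {
    { 0x84, "a with umlaut", "a with umlaut" },
    { 0xC6, "box-draw: single vertical, double right", "a with tilde" }
};

/* Describe a byte under code page 437 or 850.
   Returns NULL for bytes or code pages this tiny table does not cover. */
const char *glyph_name(unsigned char code, int codepage)
{
    size_t i;
    for (i = 0; i < sizeof cp_table / sizeof cp_table[0]; i++) {
        if (cp_table[i].code == code) {
            if (codepage == 437) return cp_table[i].in_437;
            if (codepage == 850) return cp_table[i].in_850;
        }
    }
    return NULL;
}
```

With this table, glyph_name(0xC6, 437) and glyph_name(0xC6, 850) give
different answers, while 0x84 is a-umlaut under both code pages.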

OS/2 defaults to code page 850, not code page 437 as DOS does.


>My MS-DOS manual has some charts for some character sets, and I don't
>fully understand how they are used.  For example, in the Norwegian
>character set, values 0 through 127 are regular ASCII and values 128
>through 255 give the alternate characters, e.g. 132 is lower case 'a'
>with umlaut.  Does this mean that when the user hits his key marked
>with that letter then the application thinks he hit an ASCII 132?

If you set the appropriate code page (which is almost always 850), and load
the appropriate keyboard driver with KEYB.COM, you will be able to type the
foreign characters and see them onscreen.  Yes, when you hit a-umlaut, your
app sees a 132, and so on.  Note that foreign character mappings are not
guaranteed to work perfectly with Ctrl and Alt -- it is nontrivial to allow
your app to recognize Ctrl+a-umlaut or Alt+a-umlaut, for example.
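
One C-specific trap once your app does see a 132: with many compilers
plain char is signed, so a byte of 132 held in a char compares as -124
and a test like (c == 132) silently fails.  The sketch below assumes the
input arrives as bytes in a NUL-terminated buffer; the helper name
count_a_umlaut is made up for illustration.

```c
#define A_UMLAUT 132   /* 0x84, a-umlaut in code pages 437 and 850 */

/* Count a-umlaut bytes in a NUL-terminated string, 8-bit clean. */
int count_a_umlaut(const char *s)
{
    int n = 0;
    while (*s != '\0') {
        /* Convert to unsigned char before comparing, so the value is
           132 whether or not plain char is signed on this compiler. */
        unsigned char c = (unsigned char)*s;
        if (c == A_UMLAUT)
            n++;
        s++;
    }
    return n;
}
```

The same conversion to unsigned char is needed before passing such bytes
to the <ctype.h> functions or using them as array indexes.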


>What
>about scan codes for these keyboards?  Are charts available?  Or are
>the keyboards the same with different key caps?

Scan codes are raw, untranslated numbers indicating which key was struck.
The scan codes for foreign keyboards are no different from the scan
codes for US keyboards: if you hit an a-umlaut on a German keyboard, you
get scan code 0x28, which is the same scan code as the ' key on a US
keyboard -- and the ' and a-umlaut keys are in the same places on the
keyboards.  If you plug in a US keyboard, load KEYB GR, and hit the ' key,
you will also get an a-umlaut.  If you plug in a German keyboard, load KEYB
US (or just boot raw DOS), and hit the a-umlaut key, you will get a '.
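
The split between scan code and character can be sketched in C as a
lookup keyed by the loaded layout.  This is only an illustration of the
idea, not how KEYB.COM is actually implemented.  The names translate,
LAYOUT_US and LAYOUT_GR are hypothetical, and the scan codes 0x15 and
0x2C (the US y and z positions) are my addition from the standard PC
scan code table; only 0x28 comes from the text above.

```c
enum layout { LAYOUT_US, LAYOUT_GR };

/* Translate an unshifted scan code to a character for the given layout.
   Returns 0 for keys outside this deliberately tiny table. */
unsigned char translate(unsigned char scan, enum layout lay)
{
    switch (scan) {
    case 0x28:  /* ' on US keycaps, a-umlaut on German keycaps */
        return lay == LAYOUT_GR ? 132 : '\'';
    case 0x15:  /* y on US keycaps, z on German keycaps */
        return lay == LAYOUT_GR ? 'z' : 'y';
    case 0x2C:  /* z on US keycaps, y on German keycaps */
        return lay == LAYOUT_GR ? 'y' : 'z';
    default:
        return 0;
    }
}
```

The keyboard hardware only ever supplies the first argument; which
character comes out depends entirely on the table selected in software.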


The KEYB.COM utility will allow you to re-map your keyboard to any
country's keys.  However, some foreign keyboards have an extra key not
available on US keyboards.  For example, on the German 102-key enhanced
keyboard there is an extra key between the left shift key and the y key
(the German keyboard has y where the US keyboard has z).  The US 101-key
enhanced keyboard has no counterpart for this key, which by the way has
scan code 0x56.  I plan someday to write a TSR that would re-map some other
key (Print Scrn, maybe) to this key so I can use my 101-key keyboard for
testing 102-key drivers.
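
The core of such a re-mapping could look roughly like the C function
below.  It is a sketch of the translation step only, not a working TSR:
hooking the keyboard interrupt is left out, and 0xE0-prefixed sequences
are not treated specially.  Print Scrn sends a multi-byte sequence on
enhanced keyboards, so the sketch assumes Scroll Lock (scan code 0x46)
as a simpler stand-in; that choice and the name remap_scan are mine.  It
also assumes the usual convention that a key release is the press code
with the high bit set.

```c
#define SC_EXTRA_102  0x56   /* extra key on the 102-key keyboard */
#define SC_STAND_IN   0x46   /* Scroll Lock: assumed stand-in key */
#define BREAK_BIT     0x80   /* set in the code sent on key release */

/* Map the stand-in key to the extra 102-key scan code, keeping the
   press/release bit intact.  All other codes pass through unchanged. */
unsigned char remap_scan(unsigned char raw)
{
    unsigned char release = (unsigned char)(raw & BREAK_BIT);
    unsigned char key = (unsigned char)(raw & 0x7F);

    if (key == SC_STAND_IN)
        key = SC_EXTRA_102;
    return (unsigned char)(key | release);
}
```

Here remap_scan(0x46) gives 0x56 and the matching release code 0xC6
gives 0xD6, so the driver under test sees a consistent press and release.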


Disclaimer:  These facts are accurate to the best of my knowledge.  I
checked them before posting.  Any errors are my fault.  Have a nice day.
-- 
Steve "I don't speak for Microsoft" Hastings    ===^=== :::::
uunet!microsoft!steveha  steveha@microsoft.uucp    ` \\==|