[comp.std.internat] US PC programmers still live in a 7-bit world!

karl@haddock.ISC.COM (Karl Heuer) (06/30/88)
This discussion belongs in comp.std.internat; I'm moving it there.

In article <4635@killer.UUCP> wnp@killer.UUCP (Wolf Paul) writes:
>In standard ASCII, uppercase alphabetics are codes 65-90 (Hex 41-5A),
>and lowercase alphas are codes 97-122 (Hex 61-7A). Thus, by adding 32
>(Hex 20) to an uppercase character, I can convert it to lower case, and
>by subtracting the same amount from a lower case character, I can convert
>it to an uppercase character.
>
>If IBM's character set were really "extended ASCII", this would work for
>the non-English, 8-bit characters, as well. It doesn't ...

This would probably be a good idea, simply on the grounds that (if there's no
other reason to prefer one ordering over another) it would make certain
operations faster.  But in any case, portable programs won't take advantage of
such coincidences, except through portable interfaces such as toupper().
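
A minimal sketch of the difference, assuming an ASCII host; the function
names ascii_hack_upper and portable_upper are made up for illustration and
appear nowhere in the quoted articles:

```c
#include <ctype.h>

/* The subtract-0x20 idiom from the quoted article.  It gives the right
   answer only for 'a'..'z' in ASCII; any other input is silently mangled
   (an uppercase 'A' comes back as '!', a digit as a control code). */
int ascii_hack_upper(int c)
{
    return c - 0x20;
}

/* Portable version: let the library decide what has an uppercase form.
   The cast keeps a negative plain-char value from reaching toupper(). */
int portable_upper(int c)
{
    return toupper((unsigned char)c);
}
```

Calling portable_upper() on something that is already uppercase, or not a
letter at all, hands the value back untouched, so the caller needs no range
check of its own.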

What about characters like the German eszett, which have no uppercase
equivalent?  Such a character satisfies islower(c), but toupper(c) != (c-0x20)
no matter how you arrange the character set.  (In ANSI C, toupper(c) returns
the argument unchanged if it can't be uppercasified.)
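
A rough illustration, under two assumptions that are not from the articles
above: eszett is taken to be code 0xDF (its ISO 8859-1 position; the IBM PC
set puts it elsewhere), and the program runs in the default "C" locale.  The
helper names are hypothetical.

```c
#include <ctype.h>

/* Assumption: eszett at its ISO 8859-1 code point. */
#define ESZETT 0xDF

/* Nonzero when toupper() has no uppercase mapping for c in the current
   locale and therefore returns the argument unchanged. */
int upper_is_identity(int c)
{
    return toupper(c) == c;
}

/* What the subtract-0x20 idiom would hand back instead. */
int hack_upper(int c)
{
    return c - 0x20;
}
```

Here upper_is_identity(ESZETT) is true, while hack_upper(ESZETT) yields 0xBF,
which is some unrelated character rather than an uppercase letter.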

>[neitzel@infbs.UUCP (Martin Neitzel) writes:]
>>[On most European printers and terminals] The ascii characters like
>>[]{}\~ were considered as "not so useful for Europeans" and their codes
>>were interpreted as national characters.  That was a HACK, not a
>>solution!  Yes, usage of those ascii characters and national characters
>>was mutually exclusive.  Not "both of them" in one document, listen?
>
>I agree with that assessment.

So do I.  But it does have the advantage that (a) it allows the +0x20 hack to
work, and (b) it yields a contiguous alphabet, so MIN_LETR <= c <= MAX_LETR
works (which is probably more often assumed true than the +0x20 idiom).
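
To make (a) and (b) concrete, here is a sketch that assumes the German
national variant of ISO 646 (DIN 66003), in which codes 0x5B..0x5D are the
uppercase umlauts and 0x7B..0x7D the lowercase ones.  The limits and function
names below are illustrative choices, not anything defined by a standard
header:

```c
/* Assumed limits for the uppercase run in DIN 66003: 'A' (0x41) through
   U-umlaut (0x5D, the slot ASCII uses for ']'). */
#define MIN_UPPER 0x41
#define MAX_UPPER 0x5D

/* The contiguous-alphabet test.  In C the chained comparison has to be
   spelled out with &&; "MIN <= c <= MAX" compiles but tests something else. */
int in_upper_range(int c)
{
    return MIN_UPPER <= c && c <= MAX_UPPER;
}

/* The +0x20 idiom, guarded by the range test. */
int national_tolower(int c)
{
    return in_upper_range(c) ? c + 0x20 : c;
}
```

Under that assumption national_tolower(0x5B) gives 0x7B, i.e. A-umlaut maps to
a-umlaut, at the cost of having no '[' or '{' available in the same document.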

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint