[comp.lang.pascal] Converting to UPPER case

R1TMARG%AKRONVM.BITNET@cornellc.cit.cornell.edu (Tim Margush) (11/18/89)

I know that the question about converting strings to upper
case was asked from the Turbo Pascal perspective.  Here at the Univ.
of Akron, our introductory Pascal class uses an IBM mainframe, which
means that the collating sequence for the characters is a bit different
from that on most other machines (EBCDIC vs ASCII).  The conversion routines
posted both relied upon the contiguity of the codes for the characters
a..z and A..Z.  In EBCDIC the letters are not contiguous, and on some EBCDIC
systems there are valid characters within these ranges that should not be
converted in an upcase/lowcase operation.
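
To make the problem concrete, here is a sketch of the kind of range-based
test that runs into trouble.  It is not the code that was actually posted;
the name naiveUpper is made up for illustration.

```pascal
function naiveUpper (ch : char) : char;
{ Hypothetical range-based conversion, shown only to illustrate the
  problem.  It treats every character from 'a' to 'z' as a letter. }
begin
    if (ch >= 'a') and (ch <= 'z') then
        { On EBCDIC this range also covers the non-letter codes in the
          gaps after 'i' and after 'r', so those get shifted as well. }
        naiveUpper := chr (ord(ch) - ord('a') + ord('A'))
    else
        naiveUpper := ch
end;
```

On an ASCII machine this behaves as expected.  On EBCDIC, where 'a'..'i',
'j'..'r' and 's'..'z' occupy three separate runs of codes, any character
sitting in a gap has the same offset applied and turns into something
unrelated (which characters those are depends on the code page).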

This is something to consider for those writing programs that might be used
in both environments.  After all, isn't Pascal code completely portable?
---------------------------------------------------------------------
Tim Margush                                    R1TMARG@AKRONVM.BITNET
Department of Mathematical Sciences         R1TMARG@VM1.CC.UAKRON.EDU
University Of Akron                        R1TMARG@AKRONVM.UAKRON.EDU
Akron, OH 44325                                        (216) 375-7109

balcer@jaguar (Marc J Balcer) (11/18/89)

R1TMARG%AKRONVM.BITNET@cornellc.cit.cornell.edu (Tim Margush) writes:
>...
>This is something to consider for those writing programs that might be used
>in both environments.  After all, isn't Pascal code completely portable?

Not only is portability important, but why memorize the ASCII (or EBCDIC) 
tables?  Here's a conversion that's rather character-set independent:

    function uppercase 
	(ch:  char) : char;
    {	
	Returns the uppercase equivalent of the given character.
	(If ch is already uppercase or is not a letter, it returns
	the value of ch unchanged.)
    }
    begin
	if ch in ['a','b','c','d','e','f','g','h','i','j','k','l','m',
		  'n','o','p','q','r','s','t','u','v','w','x','y','z'] then
	    uppercase := chr (ord(ch) + ord('A') - ord('a'))
	else
	    uppercase := ch
    end;

The only assumption that this function makes is that the distance between
every capital letter and its lowercase equivalent must be the same. 
In other words, 
    (ord('a')-ord('A')) = (ord('b')-ord('B')) = (ord('c')-ord('C')) = ...
I don't know of any character set (that has both capitals and lowercase)
in which this is not true.
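
If you would rather test that assumption than trust it, a check along these
lines could be run once on the target machine.  This is only a sketch; the
name sameCaseOffset is invented here, and it assumes the compiler lets you
assign a 26-character string constant to a packed array [1..26] of char.

```pascal
function sameCaseOffset : boolean;
{ Sketch: returns true if every lowercase letter lies the same
  distance from its capital in the host character set. }
var
    lower, upper : packed array [1..26] of char;
    i, offset    : integer;
    ok           : boolean;
begin
    lower  := 'abcdefghijklmnopqrstuvwxyz';
    upper  := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    offset := ord('a') - ord('A');
    ok     := true;
    { Compare each pair against the offset measured for 'a' and 'A'. }
    for i := 1 to 26 do
        if ord(lower[i]) - ord(upper[i]) <> offset then
            ok := false;
    sameCaseOffset := ok
end;
```

The letters are spelled out in the two arrays, so the check itself does not
depend on the codes being contiguous.  It should come out true for both
ASCII and EBCDIC.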

The ugly set expression is that way because EBCDIC has "holes" in its
alphabetic range:  there are non-alphabetic characters in between
some of the alphabetic characters.  (If you knew exactly where they were,
you could probably shorten the expression.)
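
As a sketch of what a shortened version might look like: in EBCDIC the
lowercase letters fall into three contiguous runs, a..i, j..r and s..z, so
the set can be written with three subranges.  The name upperByRuns is made
up for this example, and the split points assume that EBCDIC layout.

```pascal
function upperByRuns (ch : char) : char;
{ Sketch: same conversion as uppercase, with the set written as the
  three runs of lowercase letters found in EBCDIC. }
begin
    { Each subrange contains only letters in EBCDIC.  In ASCII the
      three subranges together are simply 'a'..'z'. }
    if ch in ['a'..'i', 'j'..'r', 's'..'z'] then
        upperByRuns := chr (ord(ch) + ord('A') - ord('a'))
    else
        upperByRuns := ch
end;
```

The price of the shorter set is that it bakes in knowledge of where the
holes are, which is exactly what the long form avoids.  A character set with
gaps in other places would need different split points.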
---------------------------------------------------------------------------
Marc J. Balcer	[balcer@cadillac.siemens.com]
Siemens Research Center, 755 College Road East, Princeton, NJ 08540
(609) 734-6531