[net.lang.c] Question about toupper and tolower

jr@fortune.UUCP (11/10/83)

I have a question about the macros "tolower" and "toupper" (which are usually
defined in /usr/include/ctype.h, although they usually don't appear in the
manual page, ctype(3)).  The UNIX versions of these (including Version 7,
System III, and System V) will screw up the character if it's already the
correct case.  For instance, in the expression "toupper('A')", 'A' (0x41)
becomes '!' (0x21).  Now, I realize that it is possible to use:
	if (islower(c))
		c = toupper(c);
However, I'm not thrilled by doing that.  There are other versions of C which
handle this more gracefully (I'm thinking of Aztec C II and BDS C, both for the
CP/M operating system).  They return the character unchanged if it is already
in the correct case.
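
To make the complaint concrete, here is a minimal sketch. It assumes the
unchecked macros are plain arithmetic offsets and that the character set is
ASCII; the names V7_TOUPPER, V7_TOLOWER and guarded_upper are made up for
illustration and do not appear in any header.

```c
#include <ctype.h>

/* Hypothetical names; assumed to match the unchecked arithmetic macros. */
#define V7_TOUPPER(c) ((c) - 'a' + 'A')
#define V7_TOLOWER(c) ((c) - 'A' + 'a')

/* The guarded idiom from the post: convert only if the test says so. */
int guarded_upper(int c)
{
    if (islower(c))
        c = V7_TOUPPER(c);
    return c;
}
```

In ASCII, V7_TOUPPER('a') gives 'A' as intended, but V7_TOUPPER('A') is
0x41 - 0x61 + 0x41 = 0x21, which is '!'.  guarded_upper('A') leaves the
character alone.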

My question is this: other than portability reasons, is there any reason to
keep things this way?  Wouldn't it make more sense to have toupper and tolower
leave the characters alone if they're already right? Does anyone know of any
software that would break if the macros were changed?

				Thanks...
-- 
				John Rogers
				CompuServe: 70140,213
				Usenet: ...decvax!decwrl!amd70!fortune!jr

tsclark@ihnp4.UUCP (Tom Clark) (11/10/83)

John Rogers (fortune!jr) asked about tolower and toupper, and stated that in
System V (and version 7 and system III) tolower/toupper were defined as macros
in /usr/include/ctype.h.  Well, actually the macros defined in ctype.h are
called _tolower(c) and _toupper(c) and do indeed perform no checking.  The only
purpose of the ctype header file is to define character types (e.g. for running
C on a non-ASCII machine).  There are routines tolower(c) and toupper(c) in libc
which will do what you want (check case before conversion). The source is in
/usr/src/lib/libc/gen in tolower.c and toupper.c.  I can't speak for version 7
or system III, but suspect they do much the same thing.  You should also note
that the macros cannot be recoded to say:
#define tolower(c) (isupper(c) ? _tolower(c) : (c))
because this will result in c being evaluated more than once, which will
*definitely* break a lot of programs!
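
A small sketch of the multiple-evaluation problem, assuming ASCII.  All of
the names here (RAW_TOLOWER, CHECKED_TOLOWER, fn_tolower, macro_step, fn_step)
are hypothetical and only serve the illustration; they are not the libc
source.

```c
#include <ctype.h>

/* Hypothetical names.  The unchecked macro plays the role of _tolower(c). */
#define RAW_TOLOWER(c)     ((c) - 'A' + 'a')
/* Checked macro: the argument c is written out three times. */
#define CHECKED_TOLOWER(c) (isupper(c) ? RAW_TOLOWER(c) : (c))

/* Function form: the argument is evaluated once, at the call. */
int fn_tolower(int c)
{
    return isupper(c) ? RAW_TOLOWER(c) : c;
}

/* Lower-case the first character of s through the macro, the way a
 * scanning loop would, and report how many characters were consumed. */
int macro_step(const unsigned char *s, int *consumed)
{
    const unsigned char *p = s;
    int c = CHECKED_TOLOWER(*p++);   /* *p++ is executed twice */
    *consumed = (int)(p - s);
    return c;
}

/* The same step through the function: exactly one increment. */
int fn_step(const unsigned char *s, int *consumed)
{
    const unsigned char *p = s;
    int c = fn_tolower(*p++);
    *consumed = (int)(p - s);
    return c;
}
```

On the string "AB", macro_step tests 'A', then fetches and converts 'B',
so it returns 'b' and consumes two characters.  fn_step returns 'a' and
consumes one.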
-- 
		Tom Clark, BTL IH, ihnp4!tsclark, (312) 979-2620

jhh@ihldt.UUCP (John Haller) (11/10/83)

In System V, toupper and tolower work as you desire.
This is all defined in CONV(3C).

guy@rlgvax.UUCP (Guy Harris) (11/10/83)

In V7, "toupper" and "tolower" are macros which don't work if the argument
isn't an alphabetic in the proper (improper?) case.  In the USG UNIX
releases (System III, System V, etc.) those macros have been renamed "_toupper"
and "_tolower", and "toupper" and "tolower" are routines which work correctly
for all characters passed to them (at least, for all 7-bit ASCII characters).

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

keesan@bbncca.ARPA (Morris Keesan) (11/10/83)

-------------------------------
The definitions of toupper and tolower on BBN UNIX systems were changed nearly
three years ago to leave the character in the desired case if it already is.
It doesn't appear to have broken anything in that amount of time.  Incidentally,
the efficient way to do this is NOT to build a definition that looks like
(isupper(c)?(c):toupper(c)), but simply to replace the addition and subtraction
operations with bitwise AND and OR operators to turn on or off the ASCII
lower-case bit.
						Morris M. Keesan
						decvax!bbncca!keesan

guy@rlgvax.UUCP (Guy Harris) (11/11/83)

Turning the 040 bit on and off will correctly map a mixed-case alphabetic
string to one case, but it will not work on all non-alphabetic characters.
If one is not guaranteed that the string does not contain those characters,
one would still have to do an isalpha() on the character before mapping.
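
A sketch of the bit-twiddling approach and the guard it still needs,
assuming 7-bit ASCII.  The names CASE_BIT, BIT_TOUPPER, BIT_TOLOWER,
safe_toupper and safe_tolower are made up for this example.

```c
#include <ctype.h>

/* Hypothetical names; 040 is the ASCII lower-case bit. */
#define CASE_BIT       040
#define BIT_TOUPPER(c) ((c) & ~CASE_BIT)   /* turn the bit off */
#define BIT_TOLOWER(c) ((c) | CASE_BIT)    /* turn the bit on  */

/* Guarded versions: only letters are touched. */
int safe_toupper(int c)
{
    return isalpha(c) ? BIT_TOUPPER(c) : c;
}

int safe_tolower(int c)
{
    return isalpha(c) ? BIT_TOLOWER(c) : c;
}
```

For letters the bare macros are harmless in either case: BIT_TOUPPER('A')
and BIT_TOUPPER('a') both give 'A'.  For non-letters they are not:
BIT_TOUPPER('1') gives 0x11, a control character, and BIT_TOLOWER('[')
gives '{'.  The guarded functions pass those characters through unchanged.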

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

naftoli@aecom.UUCP (Robert Berlinger) (11/11/83)

On system III and above, toupper and tolower have been changed to
routines which do allow the same case to be passed through untouched.

Robert Berlinger
Systems Support
Albert Einstein Coll. of Med.
{philabs,esquire,cucard}!aecom!naftoli

hal@cornell.UUCP (Hal Perkins) (11/12/83)

You can't just turn on or off the case bit on any character to
convert between upper and lower case.  You still need to test that the
character you are converting is really a letter, and not a digit or
punctuation mark that should not be changed by toupper and tolower.


Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell  BITNET: hal@crnlcs

pdbain@wateng.UUCP (Peter Bain) (11/13/83)

While you can achieve toupper() and tolower() by turning bit 5 off or on
respectively, this will also translate @ to `, [ to {, and so on.
		-peter

spaf@gatech.UUCP (11/16/83)

I don't wish to seem like I'm advocating stupidity, but keep in mind
that not every machine represents its character set in 7 bits, nor
can one implement "tolower" and "toupper" just by xor'ing a bit --
not every machine uses ASCII.  
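
One way to avoid depending on the character code at all is to look the
letter up rather than do arithmetic on it.  The following is only a sketch
of that idea, with hypothetical names (portable_toupper, portable_tolower);
it is not how any particular libc does it, and it handles only the 26
unaccented letters.

```c
#include <string.h>

static const char lower_letters[] = "abcdefghijklmnopqrstuvwxyz";
static const char upper_letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Map through a pair of strings; no assumption about bit patterns
 * or about the letters being contiguous in the character set. */
int portable_toupper(int c)
{
    const char *p;

    if (c == '\0')              /* strchr would match the terminator */
        return c;
    p = strchr(lower_letters, c);
    return p ? upper_letters[p - lower_letters] : c;
}

int portable_tolower(int c)
{
    const char *p;

    if (c == '\0')
        return c;
    p = strchr(upper_letters, c);
    return p ? lower_letters[p - upper_letters] : c;
}
```

The linear search is slower than flipping a bit, but the result does not
change if the compiler's character set is something other than ASCII, and
characters that are not letters come back untouched.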

Portability and "standards" sometimes have interesting wrinkles to them.

-- 
Off the Wall of Gene Spafford
School of ICS, Georgia Tech, Atlanta GA 30332
CSNet:	Spaf @ GATech		ARPA:	Spaf.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!spaf