jr@fortune.UUCP (11/10/83)
I have a question about the macros "tolower" and "toupper" (which are usually defined in /usr/include/ctype.h, although they usually don't appear in the manual page, ctype(3)). The UNIX versions of these (including Version 7, System III, and System V) will screw up the character if it's already the correct case. For instance, in the expression "toupper('A')", 'A' (0x41) becomes '!' (0x21). Now, I realize that it is possible to use: if (islower(c)) c = toupper(c); However, I'm not thrilled about doing that. There are other versions of C which handle this more gracefully (I'm thinking of Aztec C II and BDS C, both for the CP/M operating system). They return the character unchanged if it is already in the correct case. My question is this: other than portability reasons, is there any reason to keep things this way? Wouldn't it make more sense to have toupper and tolower leave the characters alone if they're already right? Does anyone know of any software that would break if the macros were changed? Thanks... -- John Rogers CompuServe: 70140,213 Usenet: ...decvax!decwrl!amd70!fortune!jr
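A minimal sketch of the behavior John describes, assuming an ASCII machine and an unchecked macro of the form ((c) - 'a' + 'A'), which reproduces his 'A' to '!' result. The names UNCHECKED_TOUPPER and guarded_toupper are hypothetical, not taken from any real ctype.h; call them from your own main().

```c
#include <ctype.h>

/* Hypothetical reconstruction of an unchecked, V7-style macro: it assumes
   c is a lower-case letter and blindly shifts it by 'A' - 'a'. */
#define UNCHECKED_TOUPPER(c) ((c) - 'a' + 'A')

/* John's workaround: convert only characters that really are lower case,
   so anything else (including 'A') comes back unchanged. */
int guarded_toupper(int c)
{
    if (islower(c))
        c = UNCHECKED_TOUPPER(c);
    return c;
}
```

On ASCII, UNCHECKED_TOUPPER('A') yields 0x41 - 0x61 + 0x41 = 0x21, which is '!', while guarded_toupper('A') returns 'A'.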
tsclark@ihnp4.UUCP (Tom Clark) (11/10/83)
John Rogers (fortune!jr) asked about tolower and toupper, and stated that in System V (and Version 7 and System III) tolower/toupper were defined as macros in /usr/include/ctype.h. Well, actually the macros defined in ctype.h are called _tolower(c) and _toupper(c), and they do indeed perform no checking. The only purpose of the ctype header file is to define character types (e.g. for running C on a non-ASCII machine). There are routines tolower(c) and toupper(c) in libc which will do what you want (check case before conversion). The source is in /usr/src/lib/libc/gen in tolower.c and toupper.c. I can't speak for Version 7 or System III, but suspect they do much the same thing. You should also note that the macros cannot be recoded to say: #define tolower(c) (isupper(c) ? _tolower(c) : c) because this will result in c being evaluated more than once (think of tolower(getchar()) or tolower(*p++)), which will *definitely* break a lot of programs! -- Tom Clark, BTL IH, ihnp4!tsclark, (312) 979-2620
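Tom's double-evaluation point can be made concrete with a sketch, again assuming ASCII. The names RAW_TOLOWER, MACRO_TOLOWER, func_tolower, fetch_Q and fetch_calls are all hypothetical. The macro uses an explicit 'A'..'Z' range test instead of isupper() so that the number of evaluations is predictable (isupper may itself be a macro).

```c
#include <ctype.h>

/* Unchecked conversion in the spirit of _tolower (ASCII assumed):
   only correct when c is already an upper-case letter. */
#define RAW_TOLOWER(c) ((c) - 'A' + 'a')

/* Tempting but unsafe: the argument text appears three times, so an
   argument with side effects (getchar(), *p++) runs up to three times. */
#define MACRO_TOLOWER(c) (((c) >= 'A' && (c) <= 'Z') ? RAW_TOLOWER(c) : (c))

/* A real function evaluates its argument exactly once. */
int func_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? RAW_TOLOWER(c) : c;
}

/* Helper with a visible side effect: counts how often it is called
   and always returns 'Q'. */
int fetch_calls = 0;

int fetch_Q(void)
{
    fetch_calls++;
    return 'Q';
}
```

With an upper-case argument, MACRO_TOLOWER(fetch_Q()) calls fetch_Q three times (two range tests plus the conversion), whereas func_tolower(fetch_Q()) calls it once. With getchar() in place of fetch_Q, those extra calls would silently consume input.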
jhh@ihldt.UUCP (John Haller) (11/10/83)
In System V, toupper and tolower work as you desire. This is all documented in CONV(3C).
guy@rlgvax.UUCP (Guy Harris) (11/10/83)
In V7, "toupper" and "tolower" are macros which don't work if the argument isn't an alphabetic in the proper (improper?) case. In the USG UNIX releases (System III, System V, etc.) those macros have been renamed "_toupper" and "_tolower", and "toupper" and "tolower" are routines which work correctly for all characters passed to them (at least, for all 7-bit ASCII characters). Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
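The USG split Guy describes (fast unchecked macros beside checked routines) might be sketched as follows, assuming ASCII. FAST_TOUPPER, FAST_TOLOWER, safe_toupper and safe_tolower are hypothetical names chosen to avoid clashing with the real _toupper/_tolower and toupper/tolower.

```c
#include <ctype.h>

/* Unchecked macros, like USG's _toupper/_tolower: valid only when c is
   a letter of the opposite case (ASCII offsets assumed). */
#define FAST_TOUPPER(c) ((c) - 'a' + 'A')
#define FAST_TOLOWER(c) ((c) - 'A' + 'a')

/* Checked routines, like USG's toupper/tolower: any character that is
   not a letter of the "wrong" case comes back unchanged. */
int safe_toupper(int c)
{
    return islower(c) ? FAST_TOUPPER(c) : c;
}

int safe_tolower(int c)
{
    return isupper(c) ? FAST_TOLOWER(c) : c;
}
```

Callers who already know the character is a letter of the other case can use the cheaper macro; everyone else pays for one test and gets a safe result for every 7-bit ASCII character.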
keesan@bbncca.ARPA (Morris Keesan) (11/10/83)
The definitions of toupper and tolower on BBN UNIX systems were changed nearly three years ago to leave a character unchanged if it is already in the desired case. It doesn't appear to have broken anything in that time. Incidentally, the efficient way to do this is NOT to build a definition that looks like (isupper(c)?(c):toupper(c)), but simply to replace the addition and subtraction operations with bitwise AND and OR operators that turn the ASCII lower-case bit off or on. Morris M. Keesan decvax!bbncca!keesan
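Morris's bitwise approach, sketched under an ASCII assumption. BIT_TOLOWER, BIT_TOUPPER and bit_strlower are hypothetical names. Note that each macro names its argument only once, so Tom's double-evaluation problem does not arise.

```c
/* ASCII assumed: upper- and lower-case letters differ only in the 040
   (0x20) bit, so OR sets it and AND clears it.  Each macro names c once,
   and applying either mapping twice gives the same result as once. */
#define BIT_TOLOWER(c) ((c) | 040)
#define BIT_TOUPPER(c) ((c) & ~040)

/* Lower a NUL-terminated string in place.  Correct only when every
   character in it is a letter. */
void bit_strlower(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)BIT_TOLOWER(*s);
}
```

For example, BIT_TOUPPER('a') and BIT_TOUPPER('A') both give 'A', and bit_strlower turns "MiXeD" into "mixed".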
guy@rlgvax.UUCP (Guy Harris) (11/11/83)
Turning the 040 bit on and off will correctly map a mixed-case alphabetic string to one case, but it will mangle many non-alphabetic characters. Unless the string is guaranteed to contain no such characters, one would still have to do an isalpha() on the character before mapping. Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
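Guy's fix, an isalpha() test before flipping the bit, might look like this on an ASCII machine. alpha_toupper and alpha_tolower are hypothetical names.

```c
#include <ctype.h>

/* Bit-flipping conversion guarded by isalpha(); ASCII assumed.
   Letters get the 040 bit cleared or set; everything else is untouched. */
int alpha_toupper(int c)
{
    return isalpha(c) ? (c & ~040) : c;
}

int alpha_tolower(int c)
{
    return isalpha(c) ? (c | 040) : c;
}
```

The guard costs one table lookup per character but keeps '@', '[', digits and the like intact. Letters already in the target case are still left alone, because setting a set bit or clearing a clear one changes nothing.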
naftoli@aecom.UUCP (Robert Berlinger) (11/11/83)
On System III and above, toupper and tolower have been changed to routines which pass a character that is already in the target case through untouched. Robert Berlinger Systems Support Albert Einstein Coll. of Med. {philabs,esquire,cucard}!aecom!naftoli
hal@cornell.UUCP (Hal Perkins) (11/12/83)
You can't just turn the case bit (040) on or off in any character to convert between upper and lower case. You still need to test that the character you are converting is really a letter, and not a digit or punctuation mark that should not be changed by toupper and tolower. Hal Perkins UUCP: {decvax|vax135|...}!cornell!hal Cornell Computer Science ARPA: hal@cornell BITNET: hal@crnlcs
pdbain@wateng.UUCP (Peter Bain) (11/13/83)
While you can achieve toupper() and tolower() by turning bit 5 off or on, respectively, turning it on will also translate @ to `, [ to {, and so on. -peter
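To see how widespread the damage is, the following sketch (ASCII assumed; list_damaged is a hypothetical name) walks the printable ASCII range and reports every non-letter that the unguarded OR or AND mapping would alter.

```c
#include <ctype.h>
#include <stdio.h>

/* Report every printable ASCII non-letter (040..0176) that the unguarded
   bit mappings would alter, and return how many there are.
   ASCII is assumed. */
int list_damaged(void)
{
    int count = 0;

    for (int c = 040; c <= 0176; c++) {
        if (isalpha(c))
            continue;                /* letters are mapped correctly */
        int lower = c | 040;         /* result of "tolower" by OR */
        int upper = c & ~040;        /* result of "toupper" by AND */
        if (lower != c) {
            printf("tolower: '%c' (%03o) -> %03o\n", c, c, lower);
            count++;
        }
        if (upper != c) {
            printf("toupper: '%c' (%03o) -> %03o\n", c, c, upper);
            count++;
        }
    }
    return count;
}
```

Every printable non-letter either has the 040 bit clear (so the OR changes it) or set (so the AND changes it). The function therefore reports all 43 of them. The OR changes @ [ \ ] ^ _, and the AND changes the rest, including space, the digits, and ` { | } ~.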
spaf@gatech.UUCP (11/16/83)
I don't wish to seem like I'm advocating stupidity, but keep in mind that not every machine represents its character set in 7 bits, nor can one implement "tolower" and "toupper" just by xor'ing a bit -- not every machine uses ASCII. Portability and "standards" sometimes have interesting wrinkles to them. -- Off the Wall of Gene Spafford School of ICS, Georgia Tech, Atlanta GA 30332 CSNet: Spaf @ GATech ARPA: Spaf.GATech @ CSNet-Relay uucp: ...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!spaf
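Gene's caveat means arithmetic on character codes is only safe when the character set is known (in EBCDIC, for instance, the letters are not even contiguous). One charset-independent alternative, offered here as an assumption rather than anything proposed in the thread, is to look the character up in two parallel alphabet strings. table_toupper, table_tolower and map_through are hypothetical names, and only the 26 unaccented Latin letters are handled.

```c
#include <limits.h>
#include <string.h>

/* Parallel alphabets: position i in one is the same letter as position i
   in the other, whatever numeric codes the machine assigns them. */
static const char lower_set[] = "abcdefghijklmnopqrstuvwxyz";
static const char upper_set[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Map c from one alphabet to the other; anything not found is returned
   unchanged. */
static int map_through(int c, const char *from, const char *to)
{
    const char *p;

    /* Follow the ctype convention (unsigned char values or EOF).  NUL is
       excluded too, since strchr would match the string terminator. */
    if (c <= 0 || c > UCHAR_MAX)
        return c;
    p = strchr(from, c);
    return p != NULL ? (unsigned char)to[p - from] : c;
}

int table_toupper(int c) { return map_through(c, lower_set, upper_set); }
int table_tolower(int c) { return map_through(c, upper_set, lower_set); }
```

This trades speed for portability: each call does a short linear search, but it never assumes anything about how the letters are encoded.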