vander@nssdcb.gsfc.nasa.gov (John Vanderpool) (07/25/90)
its amazing that _toupper and _tolower "misbehave" on the SUN's
it seems like they do the masking without doing the checking

from VAXC v3.0 ctype.h

#define _toupper(c)	((c) >= 'a' && (c) <= 'z' ? (c) & 0x5F:(c))
#define _tolower(c)	((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20:(c))

work good-to-go

--
John R. Vanderpool		vander@nssdca.gsfc.nasa.gov
NASA / Goddard Space Flight Center (634)	Greenbelt, MD 20771
bruce@seismo.gps.caltech.edu (Bruce Worden) (07/25/90)
Not to drag this out too much more, but:

In article <2891@dftsrv.gsfc.nasa.gov> vander@nssdcb.gsfc.nasa.gov writes:
>its amazing that _toupper and _tolower "misbehave" on the SUN's
>it seems like they do the masking without doing the checking
>from VAXC v3.0 ctype.h
>#define _toupper(c) ((c) >= 'a' && (c) <= 'z' ? (c) & 0x5F:(c))
>#define _tolower(c) ((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20:(c))
>work good-to-go
 ^^^^^^^^^^^^^^^
Well, only if you don't mind evaluating (c) up to three times. If (c) has side effects, as with any macro, you may have problems. To wit,

	...
	a = _tolower(getchar());
	...

would produce a disaster, since each evaluation reads another character. That is why the versions above are preceded by the underscore, so that they will not be accidentally used in place of the more robust toupper() and tolower() functions that you undoubtedly have on your system.

Once again: under SunOS 4.1, tolower() works as per the standard with either the ucb or the Sys V compiler; the Sys V compiler also works "correctly" under 4.0.3 (and probably before). (Interestingly enough, under 4.1 the _tolower() and _toupper() in both the Sys V and the ucb ctype.h convert *without* checking, just the opposite of the example given above. These macros should probably be avoided unless maximum performance is desired and the programmer is sure of what he is doing.)

Sorry about all of the Sun specific stuff, folks.

Bruce

Disclaimer: I do not speak for Sun Microsystems nor do I even necessarily like them all that much.
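The multiple-evaluation problem can be made visible with a small sketch. CHECKED_TOLOWER copies the VAX C definition quoted above under a different name; next_char() and lower_first() are hypothetical helpers invented here, with next_char() standing in for getchar() so that the number of reads can be counted. ASCII is assumed.

```c
#include <stdio.h>

/* Same shape as the quoted VAX C _tolower; renamed to avoid clashing
   with anything the system headers define. ASCII assumed. */
#define CHECKED_TOLOWER(c) ((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20 : (c))

/* Hypothetical stand-in for getchar(): hands out bytes from a string
   and counts how often it is called. */
static const char *input = "";
static int reads = 0;

static int next_char(void)
{
    reads++;
    return *input ? (unsigned char)*input++ : EOF;
}

/* Runs the macro once over `text`. Returns the macro's value and
   stores the number of next_char() calls it caused in *calls. */
static int lower_first(const char *text, int *calls)
{
    int result;

    input = text;
    reads = 0;
    result = CHECKED_TOLOWER(next_char());
    *calls = reads;
    return result;
}
```

Given the input "QRS", the single macro use consumes all three characters and yields 's' rather than 'q'. Given "5", the first comparison fails, the second is skipped, and the value comes from a second read.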
cuuee@warwick.ac.uk (Sean Legassick) (07/26/90)
In article <2891@dftsrv.gsfc.nasa.gov> John R. Vanderpool writes:
>its amazing that _toupper and _tolower "misbehave" on the SUN's
>it seems like they do the masking without doing the checking
>
>from VAXC v3.0 ctype.h
>
>#define _toupper(c) ((c) >= 'a' && (c) <= 'z' ? (c) & 0x5F:(c))
>#define _tolower(c) ((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20:(c))
>

I'm not sure what the ANSI position on these macros is (are they mentioned at all?) but my Turbo C v1.5 (claiming ANSI compliance :-) ) gives this definition of _toupper:

	"is a macro that does the same conversion as toupper except that
	it should be used only when [the arg] is known to be lowercase"

and similarly for _tolower. This would seem to imply that in fact it is VAXC v3.0 which has the mistake in ctype.h.

Does anyone know what ANSI has to say about these conversion routines? It would seem that using them on any other character except for capitals with _tolower and lowercase with _toupper is pretty non-portable code writing. Comments?

---------------------------------------------------------------------------
Sean Legassick,        cuuee@uk.ac.warwick.cu   "Man, I'm so hip I find it
Computing Services     (the walking              difficult to see over
University of Warwick   C obfuscator!)           my pelvis" (D Adams)
karl@haddock.ima.isc.com (Karl Heuer) (07/27/90)
In article <1990Jul26.100721.14628@warwick.ac.uk> cuuee@warwick.ac.uk (Sean Legassick) writes:
>I'm not sure what the ANSI position on these macros is (are they mentioned at
>all?)

No. Since the Standard allows for non-English alphabets, for which it's not necessarily true that toupper() does a conversion iff islower() is true%, the implementation has to do just as much work for _toupper() as for toupper(). POSIX doesn't have them either, but X/Open does.

>[quote from man page] would seem to imply that in fact it is VAXC v3.0 which
>has the mistake in ctype.h.

Historically, implementations have disagreed on the definitions of toupper() and _toupper(). Unless a compiler claims ANSI conformance, it isn't a bug.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
________
% E.g. the German sharp s (0xdf in ISO Latin-1), which has no uppercase form.
arnold@audiofax.com (Arnold Robbins) (07/27/90)
In article <1990Jul26.100721.14628@warwick.ac.uk> cuuee@warwick.ac.uk (Sean Legassick) writes:
>	Does anyone know what ANSI has to say about these conversion
>routines? It would seem that using them on any other character except
>for capitals with _tolower and lowercase with _toupper is pretty
>non-portable code writing. Comments?

The standard says that tolower() and toupper() return the corresponding lower- or upper-case letter if their argument is an upper- or lower-case letter respectively. Otherwise the argument is returned unchanged. _toupper() and _tolower() are not specified in the standard.

To set things straight history-wise:

V7 - tolower() and toupper() blindly converted the case of their arguments. Handing a non-uppercase letter to tolower() or a non-lowercase letter to toupper() could produce surprises.

BSD - Inherited the above behavior from V7. (Expect this to be fixed [probably] in 4.4 BSD, which is aiming at ANSI and POSIX compliance.)

System III - Made toupper() and tolower() into functions that behave as the ANSI spec says: return the translated letter, or the original argument if there is no corresponding upper/lower case letter. The old behavior was still available in macros named _tolower() and _toupper(), which blindly converted. Note that tolower() and toupper() became real functions, with the attendant performance loss.

System V Release 1-? - Inherited the above behavior from System III.

System V Release 3.2 - On my 386 V.3.2 box, _tolower() and _toupper() are macros that behave like tolower() and toupper(). It looks like someone finally got smart. I don't know when this first appeared in System V.

I guess tolower() and toupper() remain real functions in V.3.2 in case anyone takes their address; I can't see any other reason to not have them be macros identical to their _to* counterparts.
--
Arnold Robbins				AudioFAX, Inc. | Laundry increases
2000 Powers Ferry Road, #220 / Marietta, GA. 30067     | exponentially in the
INTERNET: arnold@audiofax.com	Phone: +1 404 933 7600 | number of children.
UUCP: emory!audfax!arnold	Fax:   +1 404 933 7606 |   -- Miriam Robbins
steve@taumet.com (Stephen Clamage) (07/27/90)
ANSI defines toupper (tolower) such that it returns an uppercase (lowercase) version of a lowercase (uppercase) argument, and returns all other characters as-is. There is no definition of _toupper or _tolower in ANSI C.

The implementation of toupper (tolower) must check its parameter to see what it is before converting. Sometimes the programmer knows that such a check is not necessary. Many C implementations provide the _toupper and _tolower macros, which are faster, to use in such cases. Although such macros are not guaranteed to exist on all systems, they are usually easy enough to write if they are not supplied.
--
Steve Clamage, TauMetric Corp, steve@taumet.com
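As a rough sketch of what such hand-written unchecked macros might look like, assuming ASCII: the names FAST_TOUPPER, FAST_TOLOWER and upcase_in_place are invented for this example and belong to no particular system's ctype.h. The masks are the ones from the VAX C header quoted earlier in the thread, minus the range test.

```c
#include <ctype.h>

/* Hypothetical unchecked conversions, ASCII assumed. Each evaluates its
   argument once, but the caller must guarantee the precondition. */
#define FAST_TOUPPER(c) ((c) & 0x5F)   /* only valid for 'a'..'z' */
#define FAST_TOLOWER(c) ((c) | 0x20)   /* only valid for 'A'..'Z' */

/* Typical guarded use: the caller does the check, so the macro need not.
   In the default "C" locale islower() is true only for 'a'..'z'. */
static void upcase_in_place(char *s)
{
    for (; *s != '\0'; s++) {
        if (islower((unsigned char)*s))
            *s = (char)FAST_TOUPPER(*s);
    }
}
```

Used outside their precondition the macros return garbage instead of leaving the character alone: FAST_TOUPPER('1') is the control character 0x11, and FAST_TOLOWER('[') is '{'.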
meissner@osf.org (Michael Meissner) (07/28/90)
In article <246@audfax.audiofax.com> arnold@audiofax.com (Arnold Robbins) writes:
| I guess tolower() and toupper() remain real functions in V.3.2 in case
| anyone takes their address; I can't see any other reason to not have them
| be macros identical to their _to* counterparts.

No, I think it's more that the normal way to implement a real toupper or tolower as a macro evaluates the argument twice (once for the test, and once for either side of the ?:). I seem to remember coming across some real live System V code that breaks if the argument is evaluated more than once. Of course, with internationalization these days, the way to implement tolower/toupper is through a 257-element array. Using the array also evaluates the argument only once if implemented as a macro.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Do apple growers tell their kids money doesn't grow on bushes?
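One way to picture the 257-element array, as a sketch only: slot 0 holds the answer for EOF and slots 1 through 256 hold the answers for the unsigned char values. This assumes EOF is -1 and 8-bit chars. The names lower_table, TABLE_TOLOWER and init_lower_table are made up for the example, and the table is filled for plain ASCII only.

```c
#include <limits.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical lookup table: UCHAR_MAX + 2 entries, i.e. 257 with
   8-bit chars. Index 0 is EOF (assumed to be -1), index c + 1 is c. */
static int lower_table[UCHAR_MAX + 2];

/* The argument appears exactly once, so it is evaluated exactly once. */
#define TABLE_TOLOWER(c) (lower_table[(c) + 1])

/* Fills the table for the ASCII letters; everything else maps to itself. */
static void init_lower_table(void)
{
    int c;

    lower_table[0] = EOF;
    for (c = 0; c <= UCHAR_MAX; c++)
        lower_table[c + 1] = (c >= 'A' && c <= 'Z') ? (c | 0x20) : c;
}
```

After init_lower_table(), an expression such as TABLE_TOLOWER(getchar()) reads one character only, and an EOF return passes through unchanged.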
bruce@seismo.gps.caltech.edu (Bruce Worden) (07/29/90)
In article <246@audfax.audiofax.com> arnold@audiofax.com (Arnold Robbins) writes:
>In article <1990Jul26.100721.14628@warwick.ac.uk> cuuee@warwick.ac.uk (Sean Legassick) writes:
>>	Does anyone know what ANSI has to say about these conversion
>>routines? It would seem that using them on any other character except
>>for capitals with _tolower and lowercase with _toupper is pretty
>>non-portable code writing. Comments?

Nice explanation of the history of these functions deleted...

>I guess tolower() and toupper() remain real functions in V.3.2 in case
>anyone takes their address; I can't see any other reason to not have them
>be macros identical to their _to* counterparts.

I don't believe that this is the reason for implementing to*() as functions. The most important reason is so that these functions can work on non-US-ascii character sets (i.e. they will continue to function correctly after a call to setlocale() which changes the LC_CTYPE locale.) Another important reason is to avoid multiple evaluations of the argument as has been discussed elsewhere.

Bruce
arnold@audiofax.com (Arnold Robbins) (07/30/90)
>In article <246@audfax.audiofax.com> arnold@audiofax.com (Arnold Robbins) writes:
>>I guess tolower() and toupper() remain real functions in V.3.2 in case
>>anyone takes their address; I can't see any other reason to not have them
>>be macros identical to their _to* counterparts.

In article <1990Jul28.193255.16540@laguna.ccsf.caltech.edu> bruce@seismo.gps.caltech.edu (Bruce Worden) writes:
>I don't believe that this is the reason for implementing to*() as functions.
>The most important reason is so that these functions can work on
>non-US-ascii character sets (i.e. they will continue to function correctly
>after a call to setlocale() which changes the LC_CTYPE locale.)
>Another important reason is to avoid multiple evaluations of the argument
>as has been discussed elsewhere.

On the surface this makes sense, but it's still possible to write a macro that will work when setlocale changes the locale and only evaluates its argument once. Like so:

In ctype.h:

	extern char *_lowermap, *_uppermap;	/* one table per direction */

	#define tolower(c)	(_lowermap[c])
	#define toupper(c)	(_uppermap[c])

In setlocale.c:

	static char lower_french[256] = { .... }, upper_french[256] = { .... };
	static char lower_spanish[256] = { .... }, upper_spanish[256] = { .... };
	static char lower_c_locale[256] = { .... }, upper_c_locale[256] = { .... };
	....
	char *_lowermap = lower_c_locale;
	char *_uppermap = upper_c_locale;

	setlocale(int locale)	/* or whatever arg it takes, i don't know */
	{
		if (locale == france) {
			_lowermap = lower_french;
			_uppermap = upper_french;
		} else if (locale == spain) {
			_lowermap = lower_spanish;
			_uppermap = upper_spanish;
		} else
			.....
	}

Simple enough, no? (Yes, I know setlocale has to do lots of other stuff. This is an example for the sake of discussion, ok?)
--
Arnold Robbins				AudioFAX, Inc. | Laundry increases
2000 Powers Ferry Road, #220 / Marietta, GA. 30067     | exponentially in the
INTERNET: arnold@audiofax.com	Phone: +1 404 933 7600 | number of children.
UUCP: emory!audfax!arnold	Fax:   +1 404 933 7606 |   -- Miriam Robbins
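A compilable variant of the table-switching idea above, offered as a sketch rather than as how any real setlocale() works. Every name in it (MAP_TOLOWER, MAP_TOUPPER, init_case_tables, set_case_locale, the CASE_* constants) is invented for the example. It keeps one table per direction, fills them at run time instead of with initializers, and uses ISO Latin-1 as the second "locale". Because the index is cast to unsigned char, EOF is not handled here.

```c
#include <limits.h>

#define NCHARS (UCHAR_MAX + 1)

/* Hypothetical case tables for two character sets. */
static unsigned char c_lower[NCHARS], c_upper[NCHARS];
static unsigned char l1_lower[NCHARS], l1_upper[NCHARS];

/* The macros read through these pointers, so swapping the pointers
   changes their behavior without touching the macros. */
static const unsigned char *lower_map = c_lower;
static const unsigned char *upper_map = c_upper;

#define MAP_TOLOWER(c) (lower_map[(unsigned char)(c)])
#define MAP_TOUPPER(c) (upper_map[(unsigned char)(c)])

enum case_locale { CASE_C, CASE_LATIN1 };

/* Fills both table pairs. Must be called once before the macros are used. */
static void init_case_tables(void)
{
    int c;

    for (c = 0; c < NCHARS; c++) {
        c_lower[c] = l1_lower[c] =
            (unsigned char)((c >= 'A' && c <= 'Z') ? c + 0x20 : c);
        c_upper[c] = l1_upper[c] =
            (unsigned char)((c >= 'a' && c <= 'z') ? c - 0x20 : c);
    }
    /* Latin-1 letters 0xC0..0xDE pair with 0xE0..0xFE, except 0xD7 and
       0xF7 (multiplication and division signs). 0xDF (sharp s) and
       0xFF have no uppercase form in Latin-1 and map to themselves. */
    for (c = 0xC0; c <= 0xDE; c++) {
        if (c == 0xD7)
            continue;
        l1_lower[c] = (unsigned char)(c + 0x20);
        l1_upper[c + 0x20] = (unsigned char)c;
    }
}

/* Stand-in for the locale switch: just repoints the two maps. */
static void set_case_locale(enum case_locale which)
{
    if (which == CASE_LATIN1) {
        lower_map = l1_lower;
        upper_map = l1_upper;
    } else {
        lower_map = c_lower;
        upper_map = c_upper;
    }
}
```

With the C tables selected, MAP_TOUPPER(0xE9) returns 0xE9 unchanged; after set_case_locale(CASE_LATIN1) it returns 0xC9, while the sharp s at 0xDF still maps to itself.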