[comp.lang.c] _tolower and _toupper macros

vander@nssdcb.gsfc.nasa.gov (John Vanderpool) (07/25/90)

It's amazing that _toupper and _tolower "misbehave" on the Suns;
it seems like they do the masking without doing the checking.

from VAXC v3.0 ctype.h

#define _toupper(c)	((c) >= 'a' && (c) <= 'z' ? (c) & 0x5F:(c))
#define _tolower(c)	((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20:(c))

work good-to-go

--
John R. Vanderpool				vander@nssdca.gsfc.nasa.gov
NASA / Goddard Space Flight Center (634)
Greenbelt, MD  20771

bruce@seismo.gps.caltech.edu (Bruce Worden) (07/25/90)

Not to drag this out too much more, but:

In article <2891@dftsrv.gsfc.nasa.gov> vander@nssdcb.gsfc.nasa.gov writes:
>its  amazing that _toupper and _tolower "misbehave" on the SUN's
>it seems like they do the masking without doing the checking

>from VAXC v3.0 ctype.h

>#define _toupper(c)	((c) >= 'a' && (c) <= 'z' ? (c) & 0x5F:(c))
>#define _tolower(c)	((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20:(c))

>work good-to-go
 ^^^^^^^^^^^^^^^
Well, only if you don't mind evaluating (c) three times.  If (c) has
side effects, as with any macro, you may have problems.  To wit,

...
a = _tolower(getchar());
...

would produce a disaster: each evaluation of (c) calls getchar() again,
so a single "conversion" can consume up to three input characters.  That
is why the versions above are preceded by the underscore, so that they
will not be accidentally used in place of the more robust toupper() and
tolower() functions that you undoubtedly have on your system.
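
A minimal sketch of that hazard, assuming ASCII.  CHECKED_TOLOWER is the
checking form quoted above under a different name, and next_char() is a
hypothetical stand-in for getchar() that counts how often it is called;
neither name comes from any real header.

```c
#include <assert.h>

/* The checking form quoted above, renamed so it cannot clash with <ctype.h>. */
#define CHECKED_TOLOWER(c) ((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20 : (c))

static const char *input;   /* hypothetical stand-in for stdin */
static int reads;           /* characters consumed so far */

/* Hypothetical replacement for getchar() that counts its calls. */
static int next_char(void)
{
    reads++;
    return (unsigned char)*input++;
}

/* Runs one macro "call" over text; returns how many characters it consumed. */
static int reads_for_one_call(const char *text, int *result)
{
    input = text;
    reads = 0;
    *result = CHECKED_TOLOWER(next_char());
    return reads;
}

static void demo_multiple_evaluation(void)
{
    int r;

    /* 'A' passes the first test, 'B' the second, and 'C' is what gets converted. */
    assert(reads_for_one_call("ABC", &r) == 3 && r == 'c');

    /* '1' fails the first test, so && short-circuits and 'x' is returned as-is. */
    assert(reads_for_one_call("1xy", &r) == 2 && r == 'x');
}
```

In both cases the character that was tested is not the character that is
returned, which is the real damage.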

Once again: under SunOS 4.1, tolower() works as per the standard with either 
the ucb or the Sys V compiler; the Sys V compiler also works "correctly" 
under 4.0.3 (and probably before).  

(Interestingly enough, under 4.1 the _tolower() and _toupper() macros in 
both the Sys V and the ucb ctype.h convert *without* checking, just the 
opposite of the example given above.  These macros should probably be 
avoided unless maximum performance is desired and the programmer is sure 
of what he is doing.)
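
To see what "without checking" costs, here is a small sketch.  The two
macros are hypothetical, modelled on the "masking without checking"
described earlier in this thread and not copied from any Sun header;
ASCII is assumed.

```c
#include <assert.h>

/* Hypothetical unchecked forms: mask or set bit 0x20, no range test. */
#define UNCHECKED_TOUPPER(c) ((c) & 0x5F)
#define UNCHECKED_TOLOWER(c) ((c) | 0x20)

static int unchecked_upper(int c) { return UNCHECKED_TOUPPER(c); }
static int unchecked_lower(int c) { return UNCHECKED_TOLOWER(c); }

static void demo_unchecked(void)
{
    /* Correct when the precondition holds. */
    assert(unchecked_upper('a') == 'A');
    assert(unchecked_lower('A') == 'a');

    /* Garbage when it does not: '1' (0x31) turns into control code 0x11,
     * and '@' (0x40) turns into '`' (0x60). */
    assert(unchecked_upper('1') == 0x11);
    assert(unchecked_lower('@') == '`');
}
```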

Sorry about all of the Sun specific stuff, folks.

					Bruce
Disclaimer: I do not speak for Sun Microsystems nor do I even necessarily
like them all that much.

cuuee@warwick.ac.uk (Sean Legassick) (07/26/90)

In article <2891@dftsrv.gsfc.nasa.gov> John R. Vanderpool writes:
>its  amazing that _toupper and _tolower "misbehave" on the SUN's
>it seems like they do the masking without doing the checking
>
>from VAXC v3.0 ctype.h
>
>#define _toupper(c)	((c) >= 'a' && (c) <= 'z' ? (c) & 0x5F:(c))
>#define _tolower(c)	((c) >= 'A' && (c) <= 'Z' ? (c) | 0x20:(c))
>

	I'm not sure what the ANSI position on these macros is (are
they mentioned at all?) but my Turbo C v1.5 (claiming ANSI compliance :-) )
gives this definition of _toupper : "is a macro that does the same conversion
as toupper except that it should be used only when [the arg] is known
to be lowercase" and similarly for _tolower. This would seem to imply that
in fact it is VAXC v3.0 which has the mistake in ctype.h.
	Does anyone know what ANSI has to say about these conversion
routines? It would seem that using them on any other character except
for capitals with _tolower and lowercase with _toupper is pretty
non-portable code writing. Comments?

---------------------------------------------------------------------------
Sean Legassick,       cuuee@uk.ac.warwick.cu	"Man, I'm so hip I find it
  Computing Services	  (the walking 	          difficult to see over
    University of Warwick   C obfuscator!)	    my pelvis" (D Adams)

karl@haddock.ima.isc.com (Karl Heuer) (07/27/90)

In article <1990Jul26.100721.14628@warwick.ac.uk> cuuee@warwick.ac.uk (Sean Legassick) writes:
>I'm not sure what the ANSI position on these macros is (are they mentioned at
>all?)

No.  Since the Standard allows for non-English alphabets, for which it's not
necessarily true that toupper() does a conversion iff islower() is true%, the
implementation has to do just as much work for _toupper() as for toupper().
POSIX doesn't have them either, but X/Open does.

>[quote from man page] would seem to imply that in fact it is VAXC v3.0 which
>has the mistake in ctype.h.

Historically, implementations have disagreed on the definitions of toupper()
and _toupper().  Unless a compiler claims ANSI conformance, it isn't a bug.

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint
________
% E.g. the German sharp s (0xdf in ISO Latin-1), which has no uppercase form.

arnold@audiofax.com (Arnold Robbins) (07/27/90)

In article <1990Jul26.100721.14628@warwick.ac.uk> cuuee@warwick.ac.uk (Sean Legassick) writes:
>	Does anyone know what ANSI has to say about these conversion
>routines? It would seem that using them on any other character except
>for capitals with _tolower and lowercase with _toupper is pretty
>non-portable code writing. Comments?

The standard says that tolower() and toupper() return the corresponding
lower- or upper-case letter if their argument is an upper- or lower-case
letter respectively.  Otherwise the argument is returned unchanged.
_toupper() and _tolower() are not specified in the standard.
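
As a quick check of that wording, a sketch that should hold on any
conforming implementation in the default "C" locale (the function name
is made up for illustration):

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>   /* EOF */

static void check_standard_behaviour(void)
{
    /* Letters of the opposite case are converted. */
    assert(toupper('a') == 'A');
    assert(tolower('Q') == 'q');

    /* Everything else, including EOF, comes back unchanged. */
    assert(toupper('7') == '7');
    assert(tolower('q') == 'q');
    assert(toupper(EOF) == EOF);
}
```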

To set things straight history wise:

	V7 -	tolower() and toupper() blindly converted the case on
		their arguments.  Handing a nonuppercase letter to tolower()
		or a nonlowercase letter to toupper() could produce surprises.

	BSD -	Inherited the above behavior from V7.  (Expect this to be
		fixed [probably] in 4.4 BSD, which is aiming at ANSI and
		POSIX compliance.)

	System III - Made toupper() and tolower() into functions that behave
		as the ANSI spec says; return the translated letter or the
		original argument if there is no corresponding upper/lower
		case letter.  The old behavior was still available in
		macros named _tolower() and _toupper() which blindly
		converted.  Note that tolower() and toupper() became real
		functions, with the attendant performance loss.

	System V Release 1-? - inherited the above behavior from System III.

	System V Release 3.2 - on my 386 V.3.2 box, _tolower() and _toupper()
		are macros that behave like toupper() and tolower(). It
		looks like someone finally got smart.  I don't know when
		this first appeared in System V.

I guess tolower() and toupper() remain real functions in V.3.2 in case
anyone takes their address; I can't see any other reason to not have them
be macros identical to their _to* counterparts.
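
The address-taking case looks like the sketch below; map_string() is a
hypothetical helper.  Even when <ctype.h> also supplies a function-like
macro called toupper, the bare name without a following "(" is not
expanded, so a real function must exist behind it.

```c
#include <assert.h>
#include <ctype.h>

/* Applies a conversion function to every character of s, in place. */
static void map_string(char *s, int (*convert)(int))
{
    for (; *s != '\0'; s++)
        *s = (char)convert((unsigned char)*s);
}

static void demo_function_pointer(void)
{
    char buf[] = "mixed Case 42";

    map_string(buf, toupper);   /* passes the function, never a macro */
    assert(buf[0] == 'M' && buf[7] == 'A' && buf[11] == '4');
}
```
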
-- 
Arnold Robbins				AudioFAX, Inc. | Laundry increases
2000 Powers Ferry Road, #220 / Marietta, GA. 30067     | exponentially in the
INTERNET: arnold@audiofax.com	Phone: +1 404 933 7600 | number of children.
UUCP:	  emory!audfax!arnold	Fax:   +1 404 933 7606 |   -- Miriam Robbins

steve@taumet.com (Stephen Clamage) (07/27/90)

ANSI defines toupper (tolower) such that it returns an uppercase (lowercase)
version of a lowercase (uppercase) argument, and returns all other
characters as-is.  There is no definition of _toupper or _tolower in ANSI C.

The implementation of toupper (tolower) must check its parameter to see
what it is before converting.  Sometimes the programmer knows that
such a check is not necessary.  Many C implementations provide the
_toupper and _tolower macros, which are faster, to use in such cases.
Although such macros are not guaranteed to exist on all systems, they
are usually easy enough to write if they are not supplied.
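
One way such a fallback might look, as a sketch only.  The FAST_ names
are invented here (user code should not define names starting with an
underscore), the arithmetic fallback assumes ASCII, and the #ifdef only
detects a system _toupper/_tolower that is provided as a macro.

```c
#include <assert.h>
#include <ctype.h>

/* Use the system's unchecked macro if there is one, else a simple fallback. */
#ifdef _toupper
#define FAST_TOUPPER(c) _toupper(c)
#else
#define FAST_TOUPPER(c) ((c) - 'a' + 'A')   /* caller guarantees islower(c) */
#endif

#ifdef _tolower
#define FAST_TOLOWER(c) _tolower(c)
#else
#define FAST_TOLOWER(c) ((c) - 'A' + 'a')   /* caller guarantees isupper(c) */
#endif

/* Upper-cases s in place; the check is done once, here, by the caller. */
static void upcase(char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        if (islower(ch))
            *s = (char)FAST_TOUPPER(ch);
    }
}

static void demo_fast_macros(void)
{
    char buf[] = "abc-123 xyz";

    upcase(buf);
    assert(buf[0] == 'A' && buf[3] == '-' && buf[10] == 'Z');
    assert(FAST_TOLOWER('Q') == 'q');
}
```
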
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

meissner@osf.org (Michael Meissner) (07/28/90)

In article <246@audfax.audiofax.com> arnold@audiofax.com (Arnold Robbins) writes:

| I guess tolower() and toupper() remain real functions in V.3.2 in case
| anyone takes their address; I can't see any other reason to not have them
| be macros identical to their _to* counterparts.

No, I think it's more that the normal way to implement a real toupper
or tolower as a macro evaluates the argument twice (once for the
test, and once for whichever side of the ?: is chosen).  I seem to
remember coming across some real live System V code that breaks if the
argument is evaluated more than once.  Of course with
internationalization these days, the way to implement tolower/toupper
is through a 257 element array (one slot for EOF plus one for each of
the 256 character values).  Using the array also evaluates the
argument only once if implemented as a macro.
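
A sketch of the 257-element idea, with several assumptions stated up
front: EOF is -1, the character set is ASCII, and all names are
hypothetical.  The table holds shorts rather than chars so that the EOF
slot can map to EOF itself.

```c
#include <assert.h>
#include <stdio.h>   /* EOF, assumed here to be -1 */

/* Slot 0 is for EOF; slots 1..256 are for character values 0..255. */
static short upper_table[257];

/* The argument appears exactly once, so side effects happen once. */
#define TBL_TOUPPER(c) (upper_table[(c) + 1])

/* Fills the table for a plain ASCII "C" locale. */
static void init_upper_table(void)
{
    int c;

    upper_table[0] = EOF;
    for (c = 0; c < 256; c++)
        upper_table[c + 1] = (short)((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
}

static void demo_table_macro(void)
{
    int i = 'a';

    init_upper_table();
    assert(TBL_TOUPPER('q') == 'Q');
    assert(TBL_TOUPPER('5') == '5');
    assert(TBL_TOUPPER(EOF) == EOF);

    /* i++ is evaluated once: 'a' is converted and i advances by one. */
    assert(TBL_TOUPPER(i++) == 'A' && i == 'b');
}
```
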
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Do apple growers tell their kids money doesn't grow on bushes?

bruce@seismo.gps.caltech.edu (Bruce Worden) (07/29/90)

In article <246@audfax.audiofax.com> arnold@audiofax.com (Arnold Robbins) writes:
>In article <1990Jul26.100721.14628@warwick.ac.uk> cuuee@warwick.ac.uk (Sean Legassick) writes:
>>	Does anyone know what ANSI has to say about these conversion
>>routines? It would seem that using them on any other character except
>>for capitals with _tolower and lowercase with _toupper is pretty
>>non-portable code writing. Comments?

Nice explanation of the history of these functions deleted...

>I guess tolower() and toupper() remain real functions in V.3.2 in case
>anyone takes their address; I can't see any other reason to not have them
>be macros identical to their _to* counterparts.

I don't believe that this is the reason for implementing to*() as functions.
The most important reason is so that these functions can work on 
non-US-ascii character sets (i.e. they will continue to function correctly 
after a call to setlocale() which changes the LC_CTYPE locale.)  
Another important reason is to avoid multiple evaluations of the argument 
as has been discussed elsewhere.
						Bruce

arnold@audiofax.com (Arnold Robbins) (07/30/90)

>In article <246@audfax.audiofax.com> arnold@audiofax.com (Arnold Robbins) writes:
>>I guess tolower() and toupper() remain real functions in V.3.2 in case
>>anyone takes their address; I can't see any other reason to not have them
>>be macros identical to their _to* counterparts.

In article <1990Jul28.193255.16540@laguna.ccsf.caltech.edu> bruce@seismo.gps.caltech.edu (Bruce Worden) writes:
>I don't believe that this is the reason for implementing to*() as functions.
>The most important reason is so that these functions can work on 
>non-US-ascii character sets (i.e. they will continue to function correctly 
>after a call to setlocale() which changes the LC_CTYPE locale.)  
>Another important reason is to avoid multiple evaluations of the argument 
>as has been discussed elsewhere.

On the surface this makes sense, but it's still possible to write a macro
that will work when setlocale changes the locale and only evaluates its
argument once.  Like so:

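In ctype.h:
	/* tolower and toupper need separate tables */
	extern char *_lowermap;
	extern char *_uppermap;
	#define tolower(c)	(_lowermap[c])
	#define toupper(c)	(_uppermap[c])

In setlocale.c:

	static char lowermap_french[256] = { .... };
	static char uppermap_french[256] = { .... };
	static char lowermap_spanish[256] = { .... };
	static char uppermap_spanish[256] = { .... };
	static char lowermap_c_locale[256] = { .... };
	static char uppermap_c_locale[256] = { .... };
	....
	char *_lowermap = lowermap_c_locale;
	char *_uppermap = uppermap_c_locale;

	setlocale(int locale)	/* or whatever arg it takes, i don't know */
	{
		if (locale == france) {
			_lowermap = lowermap_french;
			_uppermap = uppermap_french;
		} else if (locale == spain) {
			_lowermap = lowermap_spanish;
			_uppermap = uppermap_spanish;
		} else
			.....
	}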

Simple enough, no?  (Yes, I know setlocale has to do lots of other
stuff.  This is an example for the sake of discussion, ok?)
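
The pointer-swapping idea can be sketched in runnable form.  This
version is an illustration under assumptions, not how any real
setlocale() works: every name is hypothetical, only the upper-case
direction is shown, the second table assumes ISO Latin-1, and EOF is not
handled because the argument is cast to unsigned char.

```c
#include <assert.h>

enum demo_locale { DEMO_LOCALE_C, DEMO_LOCALE_LATIN1 };   /* hypothetical */

static unsigned char upper_c[256];
static unsigned char upper_latin1[256];
static const unsigned char *demo_uppermap = upper_c;

/* One table lookup; the argument is evaluated once. */
#define DEMO_TOUPPER(c) (demo_uppermap[(unsigned char)(c)])

static void demo_build_tables(void)
{
    int c;

    for (c = 0; c < 256; c++) {
        unsigned char up =
            (unsigned char)((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
        upper_c[c] = up;
        upper_latin1[c] = up;
    }
    /* ISO Latin-1: 0xE0..0xFE map to 0xC0..0xDE, except 0xF7 (division sign). */
    for (c = 0xE0; c <= 0xFE; c++)
        if (c != 0xF7)
            upper_latin1[c] = (unsigned char)(c - 0x20);
    /* 0xDF (sharp s) and 0xFF have no upper-case form and map to themselves. */
}

/* Stand-in for the table switch a real setlocale() would perform. */
static void demo_setlocale(enum demo_locale which)
{
    demo_uppermap = (which == DEMO_LOCALE_LATIN1) ? upper_latin1 : upper_c;
}

static void demo_locale_switch(void)
{
    demo_build_tables();
    assert(DEMO_TOUPPER('a') == 'A');
    assert(DEMO_TOUPPER(0xE9) == 0xE9);   /* e-acute is untouched in "C" */

    demo_setlocale(DEMO_LOCALE_LATIN1);
    assert(DEMO_TOUPPER(0xE9) == 0xC9);   /* now maps to E-acute */
    assert(DEMO_TOUPPER(0xDF) == 0xDF);   /* sharp s stays as it is */
}
```
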
-- 
Arnold Robbins				AudioFAX, Inc. | Laundry increases
2000 Powers Ferry Road, #220 / Marietta, GA. 30067     | exponentially in the
INTERNET: arnold@audiofax.com	Phone: +1 404 933 7600 | number of children.
UUCP:	  emory!audfax!arnold	Fax:   +1 404 933 7606 |   -- Miriam Robbins