[comp.lang.c] Programming and international chara

mcdaniel@uicsrd.csrd.uiuc.edu (11/03/88)

Written 7:27 pm Oct 27, 1988 by kjartan@rhi.hi.is in comp.emacs:
> Another way of doing this is using "is.." functions that are defined
> in <ctype.h>, an include file that comes with (almost) all C
> compilers.  Some of the above lines would look like this:
> 
> fileio.c:	  if (iscntrl( fn[tel++] ) )
> input.c:				if (iscntrl(buf[--cpos]) ) {
> input.c:				if (iscntrl(buf[--cpos])) {
> 
> This code is better (most of the is.. things are macros that mask the
> argument and return . . . either zero or positive), has more style to
> it and is easier to port to a different character set.

A little while ago, there was a discussion in comp.lang.c about the
"is..."  functoids.  I call them "functoids" because they resemble
functions in use, but may be either functions OR macros.  One possible
macro implementation is:
	#define iscntrl(c)	( (c) >= 0 && (c) <= 037 )

(The first test is there because dpANS permits implementations to
have signed characters.)  In this case, if "c" has side effects, those
side effects will be performed twice.

As a minor point, consider this statement from the BSD 4.3 man page
for the "is..." functoids:
                . . . Isascii and toascii are defined on all
     integer values; the rest are defined only where isascii is
     true and on the single non-ASCII value EOF (see stdio(3S)).

All that C guarantees is that a "char" variable can hold all the
values in the host character set.  It may be larger, and thus able to
hold more.  Consider, for instance, a computer with 8-bit "char"s but
using 7-bit ASCII.  These functoids may therefore fail if the eighth
bit is set.

Therefore, safer versions of the three lines quoted above would be:
  fileio.c:	  tel++;  if (isascii(fn[tel-1]) && iscntrl(fn[tel-1]) )
  input.c:	  cpos--; if (isascii(buf[cpos]) && iscntrl(buf[cpos]) ) {
  input.c:	  cpos--; if (isascii(buf[cpos]) && iscntrl(buf[cpos]) ) {

(Of course, it might be "isjapanese()" instead, but you get the point.)

-- 
Tim, the Bizarre and Oddly-Dressed Enchanter
Center for Supercomputing Research and Development
at the University of Illinoid at Urbana-Champaign

Internet, BITNET:  mcdaniel@uicsrd.csrd.uiuc.edu
UUCP:    {uunet,convex,pur-ee}!uiucuxc!uicsrd!mcdaniel
ARPANET: mcdaniel%uicsrd@uxc.cso.uiuc.edu
CSNET:   mcdaniel%uicsrd@uiuc.csnet
DECnet?: GARCON::"mcdaniel@uicsrd.csrd.uiuc.edu"

ok@quintus.uucp (Richard A. O'Keefe) (11/08/88)

In article <44200016@uicsrd.csrd.uiuc.edu> mcdaniel@uicsrd.csrd.uiuc.edu writes:
>One possible
>macro implementation is:
>	#define iscntrl(c)	( (c) >= 0 && (c) <= 037 )
>
>(The first test is because implementations are permitted by dpANS to
>have signed characters.)  In this case, if "c" has side effects, the
>side effects will be performed twice.

A reasonably well-known hack, where L and U are constant integer expressions, is
	#define inrange(x,L,U) ((unsigned)((x)-(L)) <= (unsigned)((U)-(L)))
It has the virtue that x is evaluated only once.  In this case:
	*define iscntrl(c) ((unsigned)(c) < 32)
I say "*define" because the usual definition of iscntrl() for ASCII
includes DEL as one of the cntrl characters.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/09/88)

In article <44200016@uicsrd.csrd.uiuc.edu> mcdaniel@uicsrd.csrd.uiuc.edu writes:
>	#define iscntrl(c)	( (c) >= 0 && (c) <= 037 )

This is an invalid implementation.  The is*() functions may be implemented
as macros only if they are "safe" macros (i.e. evaluate the argument only
once).  Also, for valid arguments (ints in the range 0..UCHAR_MAX and EOF)
the only negative argument that must be handled is EOF.

>                . . . Isascii and toascii are defined on all
>     integer values; the rest are defined only where isascii is
>     true and on the single non-ASCII value EOF (see stdio(3S)).

Because ANSI C does not have an isascii() function, the is*() tests
are required to work right for all character values.

mcdaniel@uicsrd.csrd.uiuc.edu (11/10/88)

Written  8:30 pm  Nov  7, 1988 by ok@quintus.uucp in comp.lang.c:
> In article <44200016@uicsrd.csrd.uiuc.edu>
> mcdaniel@uicsrd.csrd.uiuc.edu writes: 
>> One possible macro implementation is:
>>	#define iscntrl(c)	( (c) >= 0 && (c) <= 037 )
>
> A reasonably well-known hack, where L and U are constant integer expressions,
> is	#define inrange(x,L,U) ((unsigned)((x)-(L)) <= (unsigned)((U)-(L)))
> It has the virtue that x is evaluated only once.

Migod, it's even dpANS portable, for ints x, L, and U, as long as the
subtractions don't overflow.  Also: suppose L > U.  Let us define the
range "L .. U" to be L up to INT_MAX, then wrapping around to INT_MIN,
and then up to U.  This macro does the "wrapped" test properly: it
tests for x being in L .. INT_MAX or INT_MIN .. U.  In other words, if
L > U, it tests for x NOT in the ordinary range U+1 .. L-1.  Neat hack!

> In this case:
>	*define iscntrl(c) ((unsigned)(c) < 32)
> I say "*define" because the usual definition of iscntrl() for ASCII
> includes DEL as one of the cntrl characters.

Of course.  The original poster (not "ok@quintus", the other one
quoted above) was a real bozo.  One of those jerks who thinks he's a
real know-it-all net.god.  He just spouted off the very first #define
that came to mind, without stopping to think whether he might be WRONG
(Kernighan forbid).  He should be ashamed of himself for spreading
such obviously broken code on a public net.
