[comp.lang.c] isascii

msb@sq.sq.com (Mark Brader) (02/12/90)

> >	isascii(*s) && isdigit(*s)
> 
> According to _Standard C_ ... there is no "isascii".  And "isdigit" etc.
> take an int in the set (EOF, 0..UCHAR_MAX) ...

> So, to write ANSI conformant C you must always say something like
> 	isdigit((unsigned char) *s)

If the code has to run on ANSI and non-ANSI C's, I'd prefer:

	#include <stdio.h>
	#include <ctype.h>

	#ifndef isascii		/* oh, must be ANSI C */
	#define isascii(x) (((x) >= 0 && (x) <= UCHAR_MAX) || (x) == EOF)
	#endif

and then
	isascii(*s) && isdigit(*s)

The X3J11 people did not put isascii() in ANSI C because of the "ascii"
part of the name.  Correctly, they did not want to make any part of the
C Standard ASCII-dependent.  I suggested that isascii() be guaranteed
merely to have semantics similar to the above #define and the name kept
as a historical artifact, but they didn't buy it.

To keep this article short, I won't discuss making it work when the
argument has side-effects (as in isascii (*p++)).
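
For the side-effect case, one possible approach is to use functions instead of a macro, so the argument is evaluated only once. This is a minimal sketch, not something from the Standard: the names in_ctype_domain(), safe_isdigit() and count_leading_digits() are made up for illustration. Note that it treats negative char values as "not a digit" rather than mapping them into 0..UCHAR_MAX.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if x is a legal argument for the ANSI
 * <ctype.h> functions, i.e. EOF or a value in 0..UCHAR_MAX. */
static int in_ctype_domain(int x)
{
    return (x >= 0 && x <= UCHAR_MAX) || x == EOF;
}

/* Hypothetical helper: digit test that accepts any int.  Because it is
 * a function, its argument is evaluated exactly once. */
static int safe_isdigit(int x)
{
    return in_ctype_domain(x) && isdigit(x);
}

/* Count the digits at the start of a string; the argument to
 * safe_isdigit() has a side effect (*p++), which is harmless here. */
static size_t count_leading_digits(const char *p)
{
    size_t n = 0;

    while (safe_isdigit(*p++))
        n++;
    return n;
}
```

With these definitions, count_leading_digits("123abc") yields 3, and safe_isdigit(-100) yields 0 without ever passing -100 to isdigit().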

-- 
Mark Brader		    At any rate, C++ != C.  Actually, the value of the
SoftQuad Inc., Toronto	    expression "C++ != C" is implementation-defined.
utzoo!sq!msb, msb@sq.com				-- Peter da Silva

This article is in the public domain.

gwc@root.co.uk (Geoff Clare) (02/14/90)

In article <1990Feb12.043324.5259@sq.sq.com> msb@sq.com (Mark Brader) writes:
>If the code has to run on ANSI and non-ANSI C's, I'd prefer:
>
>	#include <stdio.h>
>	#include <ctype.h>
>
>	#ifndef isascii		/* oh, must be ANSI C */
>	#define isascii(x) (((x) >= 0 && (x) <= UCHAR_MAX) || (x) == EOF)
>	#endif
>
>and then
>	isascii(*s) && isdigit(*s)

Sorry, that won't work on the many systems which have an ANSI-type
isdigit() AND a normal isascii().  This includes all X/Open Portability
Guide 3 compliant systems.

Also the assumption that isascii being undefined implies ANSI C is bogus.
A definition of isascii() could have been enabled by a feature test macro.

I believe the only way to cope with all variants of the ctype.h macros is
to have a user-supplied configuration parameter.  E.g.

#ifdef OLD_STYLE_CTYPE
#define ISDIGIT(x)	(isascii(x) && isdigit(x))
#else
#define ISDIGIT(x)	isdigit(x)
#endif
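
A usage sketch for the configuration macro above. The helper parse_decimal() is hypothetical and only illustrates the calling convention; OLD_STYLE_CTYPE is assumed to be set by the person building the program (for example on the compiler command line). Since the old-style ISDIGIT() expands its argument twice, the argument is kept free of side effects.

```c
#include <ctype.h>

/* OLD_STYLE_CTYPE is the user-supplied configuration parameter: define
 * it on systems whose isdigit() is only valid where isascii() is true. */
#ifdef OLD_STYLE_CTYPE
#define ISDIGIT(x)	(isascii(x) && isdigit(x))
#else
#define ISDIGIT(x)	isdigit(x)
#endif

/* Hypothetical helper: value of the unsigned decimal prefix of s.
 * Overflow is not checked in this sketch. */
static unsigned long parse_decimal(const char *s)
{
    unsigned long value = 0;

    /* The cast keeps the argument in 0..UCHAR_MAX even if char is signed.
     * The pointer is advanced separately because ISDIGIT() may evaluate
     * its argument more than once. */
    while (ISDIGIT((unsigned char)*s)) {
        value = value * 10 + (unsigned long)(*s - '0');
        s++;
    }
    return value;
}
```

For example, parse_decimal("42abc") gives 42 and parse_decimal("abc") gives 0 under either setting of OLD_STYLE_CTYPE.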

-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600
                                         (from 6th May 1990): +44-71-315-6600

evans@ditsyda.oz (Bruce Evans) (02/15/90)

In article <1990Feb12.043324.5259@sq.sq.com> msb@sq.com (Mark Brader) writes:
*If the code has to run on ANSI and non-ANSI C's, I'd prefer:
*
*	#include <stdio.h>
*	#include <ctype.h>
*
*	#ifndef isascii		/* oh, must be ANSI C */
*	#define isascii(x) (((x) >= 0 && (x) <= UCHAR_MAX) || (x) == EOF)
*	#endif
*
*and then
*	isascii(*s) && isdigit(*s)

Why doesn't ANSI C guarantee isdigit() (etc.) on *all* characters? The usual
implementation would be to move the base of the ctype array from -1
(EOF) back to -128 (SCHAR_MIN).

Then you can define isascii(x) to be 1 in the above, and not have to worry
about side effects. You still have to watch out for isdigit() on non-chars.
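
A rough sketch of the shifted-base table idea, under the assumption that the implementation uses a lookup table at all. Everything here (my_ctype_table, my_ctype_init, my_isdigit, MY_DIGIT) is an invented name, not any real library's internals, and only the digit class is filled in. The macro mentions its argument exactly once, so an argument with side effects is safe.

```c
#include <limits.h>

#define MY_DIGIT 0x01	/* hypothetical class bit for digits */

/* Hypothetical table covering every value from SCHAR_MIN to UCHAR_MAX.
 * Element 0 corresponds to SCHAR_MIN, so negative chars index validly. */
static unsigned char my_ctype_table[UCHAR_MAX - SCHAR_MIN + 1];

/* Fill in the digit entries; '0'..'9' are contiguous in any C
 * implementation.  All other entries stay zero. */
static void my_ctype_init(void)
{
    int c;

    for (c = '0'; c <= '9'; c++)
        my_ctype_table[c - SCHAR_MIN] = MY_DIGIT;
}

/* The argument appears once in the expansion, so my_isdigit(*p++)
 * increments p exactly once. */
#define my_isdigit(c)	(my_ctype_table[(c) - SCHAR_MIN] & MY_DIGIT)
```

After my_ctype_init(), my_isdigit('5') is nonzero, while my_isdigit((signed char)-23) is 0 and does not index outside the table. The argument must still lie in SCHAR_MIN..UCHAR_MAX.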
-- 
Bruce Evans		evans@ditsyda.oz.au

karl@haddock.ima.isc.com (Karl Heuer) (02/16/90)

In article <2448@ditsyda.oz> evans@ditsyda.oz (Bruce Evans) writes:
>Why doesn't ANSI C guarantee isdigit() (etc.) on *all* characters?

(The above seems to mean, "both signed and unsigned characters".)

Assume a character set in which (char)-1 is printable, e.g. ISO Latin 1.  Your
proposal would require that isprint((int)(signed char)(-1)) test as true.  But
isprint(EOF) is required to test false.  Thus, this would require that EOF be
defined as a value other than -1.  This is permitted by the Standard (I'm not
entirely sure why), but it would be A Bad Thing to create conditions that
*require* it.
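
The collision can be sketched as follows. The helper names are invented for the illustration. With signed 8-bit chars, the byte 0xFF (y-diaeresis in ISO Latin 1) has the value -1, which is indistinguishable from EOF wherever EOF is defined as -1; converting to unsigned char first gives UCHAR_MAX instead.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: does this signed char value, widened to int,
 * compare equal to EOF? */
static int collides_with_eof(signed char c)
{
    return (int)c == EOF;
}

/* Hypothetical helper: the value to hand to the <ctype.h> functions.
 * Conversion to unsigned char maps -1 to UCHAR_MAX, never to EOF. */
static int as_ctype_arg(signed char c)
{
    return (unsigned char)c;
}
```

On an implementation where EOF is -1, collides_with_eof((signed char)-1) is true, whereas as_ctype_arg((signed char)-1) is UCHAR_MAX and can be passed to isprint() safely.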

Karl W. Z. Heuer (karl@ima.ima.isc.com or harvard!ima!karl), The Walking Lint

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (02/23/90)

In article <2448@ditsyda.oz> evans@ditsyda.oz (Bruce Evans) writes:
>Why doesn't ANSI C guarantee isdigit() (etc.) on *all* characters?

Then the missing isascii() wouldn't even be needed.
This was proposed to the Committee and rejected on the (incorrect)
grounds that it couldn't be implemented without the macro's having
to evaluate its arguments more than once.

jeffa@hpmwtd.HP.COM (Jeff Aguilera) (02/24/90)

> Then the missing isascii() wouldn't even be needed.
> This was proposed to the Committee and rejected on the (incorrect)
> grounds that it couldn't be implemented without the macro's having
> to evaluate its arguments more than once.

Introduce the inline keyword, and then inline all the ctype functions.
Simple solution, superior in all respects.  (Gawd, ANSI C is brain dead.)
-----
jeffa

martin@mwtech.UUCP (Martin Weitzel) (02/24/90)

In article <34540@watmath.waterloo.edu> rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>In article <2448@ditsyda.oz> evans@ditsyda.oz (Bruce Evans) writes:
>>Why doesn't ANSI C guarantee isdigit() (etc.) on *all* characters?
>
>Then the missing isascii() wouldn't even be needed.
>This was proposed to the Committee and rejected on the (incorrect)
>grounds that it couldn't be implemented without the macro's having
>to evaluate its arguments more than once.

As much as *I* understand ANSI-C, all characters of the "machine
character set" must have positive values. So IMHO problems with
isdigit() and the other <ctype.h>-stuff can only occur,

- if the compiler claims (only) to support ASCII,
- but you in fact use it for non-ASCII (eg ISO 8859).

This is not the problem of the compiler writers, because you
clearly could never assume that isdigit() operates on EBCDIC
*and* ASCII at the same time. The pity is that the lower half
of ISO 8859 (or IBM extended ASCII as found on the 'typical' PC)
is mapped 1:1 onto the international ASCII variant. So it is
not obvious when an implementation written for ASCII is being
abused with 8-bit chars.

On the other hand, if a compiler claims to *support* ISO 8859 it
has no other choice than to implement all plain chars as unsigned
char. So, the problem should go away. Problems with istype()
seem to stem from abusing an implementation with character sets
it was not designed for!

(Nevertheless I *know* that it is sometimes necessary to "abuse"
an implementation in this way, at least in Europe with our umlauts.
If I abuse something, I should not complain it's broken. But the
warning in the original posting was valid, of course.)
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

karl@haddock.ima.isc.com (Karl Heuer) (02/26/90)

In article <680020@hpmwjaa.HP.COM> jeffa@hpmwtd.HP.COM (Jeff Aguilera) writes:
>[Ray Butterworth wrote:]
>> [If isdigit() etc. were guaranteed for *all* characters,]
>> Then the missing isascii() wouldn't even be needed.
>> This was proposed to the Committee and rejected on the (incorrect)
>> grounds that it [would make the macro unsafe].
>
>Introduce the inline keyword, and then inline all the ctype functions.
>Simple solution, superior in all respects.

Unnecessary, since (as Ray noted) the unsafe-macro argument was *incorrect*:
the macro version would still evaluate the argument exactly once.

Insufficient, since (as I noted earlier) the real problem is with the
collision between signed chars and EOF.  This is a problem with the
specification itself, regardless of whether or not it's implemented as a
macro.

>(Gawd, ANSI C is brain dead.)

I suppose it is, but not because of anything you've said here.  I blame it on
heredity; pre-ANSI C was worse.

Karl W. Z. Heuer (karl@ima.ima.isc.com or harvard!ima!karl), The Walking Lint

henry@utzoo.uucp (Henry Spencer) (03/01/90)

In article <668@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>As much as *I* understand ANSI-C, all characters of the "machine
>character set" must have positive values. So IMHO problems with
>isdigit() and the other <ctype.h>-stuff can only occur,
>
>- if the compiler claims (only) to support ASCII,
>- but you in fact use it for non-ASCII (eg ISO 8859).

Sorry, not so.  The ANSI C restriction is weaker than that found in
earlier C documentation:  it says that the characters in the "source
character set" -- roughly speaking, the characters used to write C --
must be positive.  There is no promise made about other characters in
your machine's character set.

Various things in the standard encourage implementors to make char
unsigned, but it is not actually compulsory even if you have an 8-bit
character set.
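
Whether plain char is signed can be checked from <limits.h>. A small sketch with invented helper names; the conversion of an out-of-range value to plain char is implementation-defined, so the second helper only shows what typically happens.

```c
#include <limits.h>

/* Hypothetical helper: nonzero if plain char is signed here. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: an 8-bit code as seen through plain char.
 * For codes above CHAR_MAX the result is implementation-defined;
 * on typical signed-char systems it comes out negative. */
static int through_plain_char(unsigned char code)
{
    char c = (char)code;

    return c;
}
```

Where plain_char_is_signed() is true, a code such as 0xE4 read through plain char is typically negative and so outside the domain of isdigit() and friends unless it is first cast to unsigned char.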
-- 
"The N in NFS stands for Not, |     Henry Spencer at U of Toronto Zoology
or Need, or perhaps Nightmare"| uunet!attcan!utzoo!henry henry@zoo.toronto.edu