[comp.lang.c] Character handling functions -- Jan 88, dpANS

david@dhw68k.cts.com (David H. Wolfskill) (03/26/88)

In reading a copy of the 11 January 1988 dpANS C Standard
(X3J11/88-001), I ran across something about the library's character
handling routines that I suspect I do not understand adequately.

I realize that the draft standard attempts to accommodate alphabets
other than the English one, and that such an alphabet is not used by
default: it must be requested by selecting a non-default "locale" (the
default locale is the "C" locale).

In section 4.3.1.2, the description of the "isalpha" function reads:

	The isalpha function tests for any character for which isupper
	or islower is true, or any of an implementation-defined set of
	characters for which none of iscntrl, isdigit, ispunct, or
	isspace is true.  In the "C" locale, isalpha returns true only
	for the characters for which isupper or islower is true.

In section 4.3.1.6, the description of the "islower" function reads:

	The islower function tests for any lower-case letter or any of
	an implementation-defined set of characters for which none of
	iscntrl, isdigit, ispunct, or isspace is true.  In the "C"
	locale, islower returns true only for the characters defined as
	lower-case letters (as defined in [section]2.2.1).

In section 4.3.1.10, the description of the "isupper" function reads:

	The isupper function tests for any upper-case letter or any of
	an implementation-defined set of characters for which none of
	iscntrl, isdigit, ispunct, or isspace is true.  In the "C"
	locale, isupper returns true only for the characters defined as
	upper-case letters (as defined in [section]2.2.1).

For the "C" locale, I see no problem whatsoever.  Since this is probably
the only locale I am likely to use, the issue I am bringing up does not
directly affect me; nevertheless, I would like to determine whether or
not my present understanding is shared by others.

I perceive two concerns:

1)	It would seem to be possible for a character -- interpreted in a
	locale other than the "C" locale -- to cause isalpha to return
	true, yet cause both isupper and islower to fail to return true.

	Is this both expected and reasonable?

2)	Similarly, it would seem to be possible for a character to be
	able to cause isalpha to fail to return true, and yet cause
	either (or both!) of isupper and islower to return true.

	Likewise, is this both expected and reasonable?
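
One way to probe the first concern on a given implementation is to scan
every character code and count the ones that isalpha accepts while both
isupper and islower reject them.  The following is only a minimal sketch,
assuming a hosted implementation that provides <ctype.h> and <locale.h>;
the helper name count_caseless_letters is hypothetical and appears nowhere
in the draft.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>

/* Hypothetical helper, not part of the draft: select the named locale for
   character classification, then count the codes 0..UCHAR_MAX that isalpha
   accepts but that neither isupper nor islower accepts.
   Returns -1 if the implementation does not support the named locale. */
int count_caseless_letters(const char *locale_name)
{
    int c, n = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    for (c = 0; c <= UCHAR_MAX; c++)
        if (isalpha(c) && !isupper(c) && !islower(c))
            n++;
    return n;
}
```

By the wording quoted above, count_caseless_letters("C") should be 0.  The
names of other locales are implementation-defined, so any other argument
is an assumption about the local system.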

Here is a (partial) list of approaches (assuming that the cited wording
needs to be fixed):

1)	Include "islower" in the "stop list" for "isupper", and vice
	versa.

2)	Specify that a character that causes isalpha to return true must
	cause precisely one of islower or isupper to return true.

3)	Specify that a character that causes either islower or isupper
	to return true must also cause isalpha to return true.

Another approach, of course, would be to explicitly state (perhaps in
the Rationale) that the above-described behavior really is desired.
(Perhaps it's just my provincialism, but this really does seem a bit
unlikely to me.)

I look forward to seeing your comments to the above,
david
-- 
David H. Wolfskill
uucp: ...{trwrb,hplabs}!felix!dhw68k!david	InterNet: david@dhw68k.cts.com

gwyn@brl-smoke.ARPA (Doug Gwyn) (03/28/88)

In article <6192@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill) writes:
>1)	It would seem to be possible for a character -- interpreted in a
>	locale other than the "C" locale -- to cause isalpha to return
>	true, yet cause both isupper and islower to fail to return true.

Yes, this is right.  Not all character sets have a meaningful concept
of "case".  (Consider Chinese.)  Instead of arbitrarily picking either
lower or upper (or both), the implementor can simply tell the truth.

>2)	Similarly, it would seem to be possible for a character to be
>	able to cause isalpha to fail to return true, and yet cause
>	either (or both!) of isupper and islower to return true.

No, read the specification of isalpha again.  islower => isalpha
and  isupper => isalpha  but not the converse.
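
The implication can be checked mechanically over the whole range of
character codes.  A minimal sketch, assuming whatever locale is current
when the function is called; the name case_implies_alpha is hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical check: return 1 if, in the current locale, every code in
   0..UCHAR_MAX for which islower or isupper is true also satisfies
   isalpha; return 0 as soon as a counterexample is found. */
int case_implies_alpha(void)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++)
        if ((islower(c) || isupper(c)) && !isalpha(c))
            return 0;
    return 1;
}
```

If the isalpha wording is read as above, a conforming implementation
should return 1 here in every locale; the converse test (isalpha implying
one of the case predicates) is the one that may legitimately fail outside
the "C" locale.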

henry@utzoo.uucp (Henry Spencer) (03/29/88)

> I look forward to seeing your comments to the above...

Please remember, folks, that while discussion on Usenet may be useful in
informing yourself and others, it is *not* a substitute for formal public
comments to X3J11, which must be physically mailed on pieces of paper.
Some X3J11 members do read comp.lang.c, but they have no obligation to do
anything about your net posting, whereas they are obliged to look at and
reply to your formal comments.  (Remember, deadline is April 12.)
-- 
"Noalias must go.  This is           |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR              | {allegra,ihnp4,decvax,utai}!utzoo!henry

ram%shukra@Sun.COM (Renu Raman, Taco Bell Microsystems) (03/29/88)

In article <1988Mar29.004847.2933@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>> I look forward to seeing your comments to the above...
>
>Please remember, folks, that while discussion on Usenet may be useful in
>informing yourself and others, it is *not* a substitute for formal public
>comments to X3J11, which must be physically mailed on pieces of paper.
>Some X3J11 members do read comp.lang.c, but they have no obligation to do
>anything about your net posting, whereas they are obliged to look at and
>reply to your formal comments.  (Remember, deadline is April 12.)

    So, who is on that committee?  I think Doug is.
    Can somebody come up with a list and their affiliations?

    Renu Raman

gwyn@brl-smoke.ARPA (Doug Gwyn) (03/29/88)

In article <47332@sun.uucp> ram@sun.UUCP (Renu Raman, Taco Bell Microsystems) writes:
>    So, who is on that committee?  I think Doug is.
>    Can somebody come up with a list and their affiliations?

Don't send your comments to me nor to other committee members;
send them to X3 in accordance with the instructions included
with the public review draft (between pages 100 and 101, I am
told; don't ask me why!).

karl@haddock.ISC.COM (Karl Heuer) (03/30/88)

In article <7571@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <6192@dhw68k.cts.com> david@dhw68k.cts.com (David H. Wolfskill)
>>[voices his concerns about isalpha() != (islower() || isupper())]
>
>Yes, this is right.  Not all character sets have a meaningful concept
>of "case".  (Consider Chinese.)  Instead of arbitrarily picking either
>lower or upper (or both), the implementor can simply tell the truth.

I agree that it's good to allow for such neither-case letters, but both-case?
Is it really intentional to allow isupper() and islower() to be simultaneously
true?
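
Whether a particular implementation actually has such both-case
characters can be tested directly.  A rough sketch, assuming the locale of
interest has already been selected by the caller; the name
first_both_case is hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical probe: return the first code in 0..UCHAR_MAX that the
   current locale reports as both upper and lower case, or -1 if there
   is no such code. */
int first_both_case(void)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++)
        if (isupper(c) && islower(c))
            return c;
    return -1;
}
```

In the "C" locale this should return -1, since there isupper and islower
are true only for the upper-case and lower-case letters respectively.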

On a related note, I can see where it might be useful for me to define a
locale which reflects the native alphabet only, specifically excluding the
C-locale alphabet.  (E.g. I might want isalpha('w') to be false.)  The dpANS
doesn't allow this.  Why not?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

daw@houxs.UUCP (David Wolverton) (03/30/88)

In article <7582@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn) writes:
> Don't send your comments to me nor to other committee members;
> send them to X3 in accordance with the instructions included
> with the public review draft (between pages 100 and 101, I am
> told; don't ask me why!).

For those of us who don't have the instructions between pages 100 and 101,
could you post them (or at least mail them to me)?  I don't really want to
spend $75 just to get 1 sheet of paper.

Dave Wolverton

gwyn@brl-smoke.ARPA (Doug Gwyn) (03/31/88)

In article <804@houxs.UUCP> daw@houxs.UUCP (David Wolverton) writes:
>For those of us who don't have the instructions between pages 100 and 101,
>could you post them (or at least mail them to me)?  I don't really want to
>spend $75 just to get 1 sheet of paper.

I thought it was $65 and that it got you a copy of the draft and
rationale to which your comments would presumably apply.

I don't have the instructions since I get revised drafts through
X3J11 committee mailings and didn't see the need to purchase a
public review copy that differed only in having change bars removed
and instructions inserted in a mysterious place.  I would guess
that the instructions say to send comments to the X3 Secretariat
(CBEMA) to arrive by April 12, 1988.