[net.lang.c] Roff in C and iscntrl

bph@ut-ngp.UUCP (hine, butler) (01/02/85)

[]
>> here's how the manual defines "iscntrl()":
>> 
>> iscntrl             c is a delete character (0177) or ordinary
>>                     control character (less than 040).
>> 
>> BOTH of the stated compilers [DeSmet C88 and CI-C86]
>> failed to interpret "ordinary" in the same way
>> as the PCC routine of the same name -- they return TRUE if the code is less
>> than octal 040.  As written, then, with this interpretation, newlines are
>> never returned, and the text is lost.  PCC, however, excludes newlines,
>> backspace codes, carriage return codes and a few others, presumably because
>> they are not "ordinary."  
>> 
>> This says something fairly awful about "portability."

> This says something fairly awful about available C implementations!
> 
> "ordinary" in the description of iscntrl is not an additional qualifier
> but an explanatory one.  iscntrl( c ) should return non-zero for c in
> { 0, 1, ..., 036, 037, 0177 } and zero for c in
> { 040, 041, ..., 0175, 0176 }.  It is illegal to supply any other value
> of c to the macro/function, although most implementations permit EOF (-1).

If this is indeed correct, then the Portable C Compiler provided with
4.2bsd is guilty of a serious offence.  I wrote a quick program to test it.
Here is the program, and its output under 4.2bsd:
------------------------------------------------
#include <stdio.h>
#include <ctype.h>

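int main(void)
{
	int c;

	/* Walk codes 000-037, then jump to DEL (0177) for the last row. */
	for (c = 0; c <= 040; c++) {
		if (c == 040)
			c = 0177;
		/* c ^ 0100 maps 000-037 to '@'-'_' and 0177 to '?'. */
		printf("%03o    ^%c    %s", c, c ^ 0100, iscntrl(c) ? "Yes" : " No");
		/* Two entries per output line. */
		putchar((c & 1) ? '\n' : '\t');
	}
	return 0;
}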
-------------------------------------------------

000    ^@    Yes	001    ^A    Yes
002    ^B    Yes	003    ^C    Yes
004    ^D    Yes	005    ^E    Yes
006    ^F    Yes	007    ^G    Yes
010    ^H    Yes	011    ^I     No
012    ^J     No	013    ^K     No
014    ^L     No	015    ^M     No
016    ^N    Yes	017    ^O    Yes
020    ^P    Yes	021    ^Q    Yes
022    ^R    Yes	023    ^S    Yes
024    ^T    Yes	025    ^U    Yes
026    ^V    Yes	027    ^W    Yes
030    ^X    Yes	031    ^Y    Yes
032    ^Z    Yes	033    ^[    Yes
034    ^\    Yes	035    ^]    Yes
036    ^^    Yes	037    ^_    Yes
177    ^?    Yes

> "PCC" has nothing to do with the ctype macros; they are defined in
> /usr/include/ctype.h (or equivalent on non-UNIX) and usually use a table
> loaded from the standard C library.

It seems to me the manual page ought to tell you exactly what a subroutine
returns.  Clearly whoever made up the table decided codes 011-015 were NOT
ordinary control codes. Other compilers (or their libraries) take the
description of "iscntrl" above at face value, returning "Yes" for ALL of
the codes listed above.

You might try the above program on YOUR C compiler and see what happens.
 

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (01/03/85)

> > iscntrl( c ) should return non-zero for c in
> > { 0, 1, ..., 036, 037, 0177 } and zero for c in
> > { 040, 041, ..., 0175, 0176 }.
> 
> If this is indeed correct, then the Portable C Compiler provided with
> 4.2bsd is guilty of a serious offence.

(Again, this is a matter of the C library, not of the compiler!)
I confirmed that the 4.2BSD iscntrl() ctype macro is broken in the way
previously described.  UNIX System V iscntrl() performs as I described
and as, apparently, the DeSmet C88 and CI-C86 C systems do.

I wonder how ANYone could have decided that e.g. LF is NOT a control
character?  (I'm sure the 4.2BSD fans will think that makes sense!)

guido@boring.UUCP (01/04/85)

In article <6932@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>I wonder how ANYone could have decided that e.g. LF is NOT a control
>character?  (I'm sure the 4.2BSD fans will think that makes sense!)

This was already in the original v7 ctype.h.  It would not have been such
a problem if they had documented it clearly (e.g. by putting the whole
table in the manual page), but the manual page supplied with v7 was, alas,
very vague about the actual assignments.  I remember that the space was
not considered a 'printing' character!  (This has been fixed in all BSD
versions I know of, at least, and I suppose also in sys3/sys5.)

When one first checks for 'isspace' and only then for 'iscntrl',
everything goes well.  I guess the original idea was that 'isspace',
'iscntrl' and 'isprint' would designate disjoint sets, whose union was
the whole ASCII set (0-177 octal).

	Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
	guido@mcvax.UUCP