[net.unix-wizards] ctype.h

eric@wucs.UUCP (03/16/84)

Why does ctype have 0 as the first element, and all the ctype
functions add 1 to get the proper index?

eric
..!ihnp4!afinitc!wucs!eric

kurt@wucs.UUCP (Kurt Haserodt) (03/16/84)

	Why does ctype have 0 as the first element, and all the ctype
	functions add 1 to get the proper index?


This is so EOF can be characterized (no pun intended) by the ctype
macros.  As you recall, EOF is (on most systems) defined to be -1.

gwyn@brl-vgr.ARPA (Doug Gwyn ) (03/17/84)

The ctype(3) macros work when handed anything getchar() may return,
including EOF, which is -1.

rcd@opus.UUCP (03/17/84)

> Why does ctype have 0 as the first element, and all the ctype
> functions add 1 to get the proper index?
(I believe) because the EOF value returned by getc/getchar (stdio) looks
like -1 when used as an index to the table.  Thus the EOF value is not a
member of any of the character classes.
-- 
{hao,ucbvax,allegra}!nbires!rcd

bet@ecsvax.UUCP (03/26/84)

In a lexical analyzer I wanted translate tables for values returned by
getchar() -- including EOF (-1). I wanted them FAST. So I created arrays
like this:

struct
{
	char dummy;
	char class[128];
} character=
{
	/* list of 129 values for characters, starting with EOF */
};

My reasoning was as follows: members of a structure of homogeneous composition
(no alignment problems) occupy consecutive locations in memory. C, god bless
its black-hearted soul, doesn't attempt subscript bounds checking. Finally,
character.class evaluates to a constant expression at compile time, which
C compilers can (and my reading suggests they will) simplify at compile time.
Therefore, I think I have a legal array with subscripts ranging from -1 to 127.
Anything wrong with this? Shouldn't it be faster than always using array[i+1]
(or evaluating i+1 into a temporary)? Inasmuch as I explained the trick clearly
in a comment, I am not interested in arguments like "UGLY" or "confusing".
					Bennett Todd
					...{decvax,ihnp4,akgua}!mcnc!ecsvax!bet

matt@UCLA-LOCUS.ARPA (03/28/84)

From:            Matthew J. Weinstein <matt@UCLA-LOCUS.ARPA>

	Date: 25 Mar 84 17:58:07-PST (Sun)
	To: Unix-Wizards@Brl-Vgr.ARPA
	From: decvax!mcnc!ecsvax!bet@Ucb-Vax.ARPA
	Subject: Re: Ctype.h (start arrays at 1 then add 1 before looking up)

	Article-I.D.: ecsvax.2189


---

I did a bit of experimenting with the following sort of code:

	{
	static char table[129];
	register int i;
	register int y;		/* int, not char: see note below */
	register char *ptr = &table[1];	/* ptr[-1] is table[0] */
	...
	y = ptr[i];
	...
	}

The generated assembly for this is basically (base,index,dest):

	cvtbl (rB)[rI],rD

(Note that y is an int because register chars don't get to live in registers;
if y is declared as char, the generated code stores it relative to the FP on
the Vax.)

The sequence:

	y = table[i+1];

generates reasonable code too:

	cvtbl Ltable+1[rI],rD

[Of course, if table is allocated dynamically, the first of the two forms
(initializing a pointer) is less expensive, since otherwise table + 1
must be recomputed at each access.]

There doesn't seem to be any gain to building a structure in this case.

				- Matt
				matt@ucla-locus
				{ihnp4,ucbvax}!ucla-s!matt

chris@hwcs.UUCP (04/05/84)

I prefer the following technique for -1 origin arrays (or indeed
any other origin).  I think it is easier to understand, because
it makes no assumption that C will preserve the order of fields
within a structure, and will insert no padding between them:

	------------------------------------------------------
	char	tableinit[129] = { /* Initial values ... */ };
	char	*table = tableinit+1;
	/* table is now an ARRAY [-1..127] OF CHAR, in Pascal
	 * terminology.
	 */
	------------------------------------------------------
Chris Miller

Tom Perrine <tom@LOGICON.ARPA> (01/23/85)

I know this is nearly unbelievable, but I am working on a PWB
(Programmer's WorkBench) UNIX system, which pre-dates V7.

ALL of the software from unix sources uses <ctype.h>, which is a great
idea, BUT CAN SOMEONE HELP ME MAKE ctype.h for PWB? A ctype.h from
any PDP-11 version should do. We have source licenses for PWB
(and will have V7 license in a month) but I need some help sooner.

Can anyone help?

Thanks in advance,
Tom Perrine
Logicon - OSD
San Diego, CA
{tom@logicon.arpa}