[comp.lang.c] LEX behaviour when given "large" au

carroll@snail.CS.UIUC.EDU (03/24/88)

/* Written 11:30 pm  Mar 19, 1988 by chris@mimsy.UUCP in snail:comp.lang.c */
Several people suggested replacing lex description such as

	if	return (IF);
	then	return (THEN);
	else	return (ELSE);
	{id}	{ symenter(yytext); return (ID); }

with

	{id}	{ /* first try it as a keyword */
		  k = keyword_index(yytext); if (k >= 0) return (k);
		  /* not a keyword, must be an identifier */
		  symenter(yytext); return (ID); }

This may (probably does) help in lex, but it is clearly the wrong way
around in almost all cases.  The lexer has to traverse a DFA to decide
that something is an ID.  While it is at it it could easily tell
whether it is a keyword instead.  Obviously this makes the DFA table
larger, but the result should run faster.
( ... )
/* End of text from snail:comp.lang.c */

	I disagree. For a compiler class, we had to write a lex file, and
the official way was to use the

IF	 {return T_IF }

etc. thing. It was big and slow. I changed mine to use the single rule for
identifier, and look it up in a table, and I got a speed up of roughly 4
in lex/compile time, and 40% reduction in compiled object size. I think that
the point is that the DFA that decides if something is a token or not is
so much simpler than one that has to check for all of the keywords, that
the gain is positive from removing the keyword cases. In addition, I would
think that for most cases, you have to find non-builtin "keywords" (e.g,
variable names, etc.) as you go, and so have to check for something
that is not a keyword but is still an identifier anyway.


Alan M. Carroll		amc@woodshop.cs.uiuc.edu	carroll@s.cs.uiuc.edu
...{ihnp4,convex}!uiucdcs!woodshop!amc
Quote of the day :
	"Touch my soul, catch the very light
	 Hide the moment, from my eager eyes"