[comp.unix.wizards] Misuse of lex

mohamedr@hscfvax.UUCP (770457@Mohamed Ellozy) (03/09/87)

It is common to use lex to recognize a long list of reserved words
with constructs like:


options				return (OPTIONS);
precedence			return (PRECEDENCE);
trusted				return (TRUSTED);

		many, many more

[A-Za-z][A-Za-z0-9_-]*		{
				/* store identifiers in symbol table */
				yylval.phe = LookupSymbol (yytext);
				return (IDENT);
				}

This produces huge lexical analysers.  Furthermore, they compile
with agonizing slowness, so they can hardly be considered a
convenience to the developper.  It is easy to recode them to look
up reserved words by binary search in a table.  See Introduction to
Compiler Construction with UNIX by A. T. Schreiner and H. G. Friedman Jr.
Prentice Hall, 1985, ISBN 0-13-474396-2 01 if you need help.

The improvement in compilation speed is remarkable, as the following
timings will show:

	hscfvax# time lex lexan.old.l
	249.7u 14.1s 5:55 74% 49+239k 2+12io 14pf+0w

	hscfvax# mv lex.yy.c lexan.old.c
	hscfvax# time cc -c lexan.old.c
	67.2u 5.5s 1:28 82% 131+243k 70+94io 29pf+0w

	hscfvax# time lex lexan.l
	2.5u 0.7s 0:05 62% 44+199k 1+6io 14pf+0w

	hscfvax# mv lex.yy.c lexan.c
	hscfvax# time cc -c lexan.c
	20.1u 2.6s 0:28 79% 124+224k 16+46io 29pf+0w

	hscfvax# ls -l lex*
	-rw-r--r--  1 root        14240 Mar  8 12:08 lexan.c
	-rw-r--r--  1 root         4761 Mar  8 12:07 lexan.l
	-rw-r--r--  1 root        10873 Mar  8 12:09 lexan.o

	-rw-r--r--  1 root        42495 Mar  8 12:02 lexan.old.c
	-rw-r--r--  1 root         5031 Mar  8 10:55 lexan.old.l
	-rw-r--r--  1 root        37343 Mar  8 12:04 lexan.old.o

Seven and a half minutes of clock time versus half a minute.  As my wife
(a law student) would say: Res ipsa loquitur.

(Must I translate?  Oh well, it means: The thing speaks for itself.)

vern@lbl-csam.arpa (Vern Paxson) (03/10/87)

Mohamed Ellozy wrote that it is a misuse of lex to recognize a long
list of reserved words along with a general identifier pattern, because
the resulting lexical analyzers are huge and take forever to generate
and compile.  This is quite true; however, we have a from-the-ground-up
rewrite of lex, soon to be made available, which is optimized for just
these kinds of scanners.  It runs much faster than lex and produces
smaller scanners which execute faster.  If there's interest and Mohamed
mails me his inputs, I'll make the timings for our version of lex, and
perhaps win a few converts to recognizing keywords with lex.

	Vern Paxson				vern@lbl-csam.arpa
	Real Time Systems			ucbvax!lbl-csam.arpa!vern
	Lawrence Berkeley Laboratory		(415) 486-6411

nigelh@uvicctr.UUCP (03/13/87)

> It is common to use lex to recognize a long list of reserved words
> with constructs like:
> 	options				return (OPTIONS);
> 	precedence			return (PRECEDENCE);
> 	trusted				return (TRUSTED);
>		....

If lex were properly oriented to the user's needs, it would
automatically generate a lexical analyzer that implemented an
efficient look-up technique for the reserved words (hashing would
be my preference).

We gave up on lex a long time ago.  We wrote a replacement tool,
mkscan, that has abandoned the regular-expression idiom and simply
provides an easy-to-use, menu-driven, full-screen interface to
the user.  It does things like ask the user for a list of reserved
words.  When you are finished, out comes a lexical analyzer that is
one-third the size of the equivalent lex-produced code and runs twice
as fast.  Look out for a paper in a forthcoming issue of Software-
Practice and Experience that describes mkscan in more detail.

There is also a tool called GLA developed at the University of
Colorado which handles lists of reserved words in an intelligent
manner.  Look at S-P&E 16 (9) 1986 for a reference to that tool.

	---	Nigel Horspool
		University of Victoria, Canada

henry@utzoo.UUCP (Henry Spencer) (03/17/87)

> We gave up on lex a long time ago.  We wrote a replacement tool...
> one-third the size of the equivalent lex-produced code and runs twice
> as fast.  Look out for a paper in a forthcoming issue of Software-
> Practice and Experience that describes mkscan in more detail.

Also look at the proceedings of the winter Usenix conference for an
abstract (sigh, the paper didn't get in) of a talk by Van Jacobson of
LBL, explaining how to speed up lex so its scanners are faster than even
the best hand-cooked C code.  Van said he was going to post his paper to
mod.sources; hope it shows up soon.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry