mohamedr@hscfvax.UUCP (770457@Mohamed Ellozy) (03/09/87)
It is common to use lex to recognize a long list of reserved words
with constructs like:
options return (OPTIONS);
precedence return (PRECEDENCE);
trusted return (TRUSTED);
many, many more
[A-Za-z][A-Za-z0-9_-]* {
/* store identifiers in symbol table */
yylval.phe = LookupSymbol (yytext);
return (IDENT);
}
This produces huge lexical analysers. Furthermore, they compile
with agonizing slowness, so they can hardly be considered a
convenience to the developper. It is easy to recode them to look
up reserved words by binary search in a table. See Introduction to
Compiler Construction with UNIX by A. T. Schreiner and H. G. Friedman Jr.
Prentice Hall, 1985, ISBN 0-13-474396-2 01 if you need help.
The improvement in compilation speed is remarkable, as the following
timings will show:
hscfvax# time lex lexan.old.l
249.7u 14.1s 5:55 74% 49+239k 2+12io 14pf+0w
hscfvax# mv lex.yy.c lexan.old.c
hscfvax# time cc -c lexan.old.c
67.2u 5.5s 1:28 82% 131+243k 70+94io 29pf+0w
hscfvax# time lex lexan.l
2.5u 0.7s 0:05 62% 44+199k 1+6io 14pf+0w
hscfvax# mv lex.yy.c lexan.c
hscfvax# time cc -c lexan.c
20.1u 2.6s 0:28 79% 124+224k 16+46io 29pf+0w
hscfvax# ls -l lex*
-rw-r--r-- 1 root 14240 Mar 8 12:08 lexan.c
-rw-r--r-- 1 root 4761 Mar 8 12:07 lexan.l
-rw-r--r-- 1 root 10873 Mar 8 12:09 lexan.o
-rw-r--r-- 1 root 42495 Mar 8 12:02 lexan.old.c
-rw-r--r-- 1 root 5031 Mar 8 10:55 lexan.old.l
-rw-r--r-- 1 root 37343 Mar 8 12:04 lexan.old.o
Seven and a half minutes of clock time versus half a minute. As my wife
(a law student) would say: Res ipsa loquitur.
(Must I translate? Oh well, it means: The thing speaks for itself.)
vern@lbl-csam.arpa (Vern Paxson) (03/10/87)
Mohamed Ellozy wrote that it is a misuse of lex to recognize a long list of reserved words along with a general identifier pattern, because the resulting lexical analyzers are huge and take forever to generate and compile. This is quite true; however, we have a from-the-ground-up rewrite of lex, soon to be made available, which is optimized for just these kinds of scanners. It runs much faster than lex and produces smaller scanners which execute faster. If there's interest and Mohamed mails me his inputs, I'll make the timings for our version of lex, and perhaps win a few converts to recognizing keywords with lex. Vern Paxson vern@lbl-csam.arpa Real Time Systems ucbvax!lbl-csam.arpa!vern Lawrence Berkeley Laboratory (415) 486-6411
nigelh@uvicctr.UUCP (03/13/87)
> It is common to use lex to recognize a long list of reserved words > with constructs like: > options return (OPTIONS); > precedence return (PRECEDENCE); > trusted return (TRUSTED); > .... If lex were properly oriented to the user's needs, it would automatically generate a lexical analyzer that implemented an efficient look-up technique for the reserved words (hashing would be my preference). We gave up on lex a long time ago. We wrote a replacement tool, mkscan, that has abandoned the regular-expression idiom and simply provides an easy-to-use, menu-driven, full-screen interface to the user. It does things like ask the user for a list of reserved words. When you are finished, out comes a lexical analyzer that is one-third the size of the equivalent lex-produced code and runs twice as fast. Look out for a paper in a forthcoming issue of Software- Practice and Experience that describes mkscan in more detail. There is also a tool called GLA developed at the University of Colorado which handles lists of reserved words in an intelligent manner. Look at S-P&E 16 (9) 1986 for a reference to that tool. --- Nigel Horspool University of Victoria, Canada
henry@utzoo.UUCP (Henry Spencer) (03/17/87)
> We gave up on lex a long time ago. We wrote a replacement tool... > one-third the size of the equivalent lex-produced code and runs twice > as fast. Look out for a paper in a forthcoming issue of Software- > Practice and Experience that describes mkscan in more detail. Also look at the proceedings of the winter Usenix conference for an abstract (sigh, the paper didn't get in) of a talk by Van Jacobson of LBL, explaining how to speed up lex so its scanners are faster than even the best hand-cooked C code. Van said he was going to post his paper to mod.sources; hope it shows up soon. -- "We must choose: the stars or Henry Spencer @ U of Toronto Zoology the dust. Which shall it be?" {allegra,ihnp4,decvax,pyramid}!utzoo!henry