schmidt@zola.ics.uci.edu (Doug Schmidt) (02/27/90)
In article <MEISSNER.90Feb26145638@curley.osf.org>, meissner@osf (Michael Meissner) writes:

>I have supported three compilers in the last 10 years (DG/L, Data
>General C, and now GNU C), and none of them use any automatic tools
>for lexing.

Actually, GNU C uses a reserved-word recognizer function generated
automatically by my GPERF program.  But I understand your general point.

>First of all, UNIX lex is too general purpose, and too slow to be used
>in a production compiler (yes I know flex fixes at least some of the
>speed problems).  You don't need backtracking, or even generalized
>regular expressions, and such for most computer languages to lex
>stuff, just simple one character lookahead.

Here are some interesting statistics that add empirical support to
Michael's claims:

Given an input file with 350,466 C++ reserved words and 624,156
user-defined identifiers, organized as one word per line (created by
running the command tr -cs A-Za-z_ '\012' on the preprocessed source
code for ET++), the following timing results were obtained on an
otherwise idle Sun 4/260 with 32 meg of RAM:

   /bin/time gperf.exe < ET++.input
      19.9 real        17.8 user         1.3 sys
   /bin/time flex.exe < ET++.input
      49.2 real        32.2 user         2.5 sys
   /bin/time comp-flex.exe < ET++.input
      92.9 real        62.4 user         2.7 sys

The gperf.exe file was a gperf-generated recognizer for the 47
standard C++ reserved words, invoked by the following short driver:

----------------------------------------
#include <stdio.h>
#include <string.h>

int
main (void)
{
  char buf[200];

  /* Look up every word from stdin in the reserved-word set. */
  while (gets (buf))
    in_word_set (buf, strlen (buf));
  return 0;
}
----------------------------------------

The flex.exe file came from a flex 2.1-generated recognizer (also for
the C++ reserved words) created with the -f (no table compaction)
option, in order to maximize speed.  comp-flex.exe was generated from
the same flex input file using the default settings (-cem, which gives
maximal table compression).

Finally, here are the sizes (in bytes) of the object files (all
compiled with gcc 1.37 using the -O -fstrength-reduce options):

----------------------------------------
text    data    bss     dec
1672    0       0       1672      gperf.o
6032    56      16440   22528     comp-flex.o
3616    56632   16440   76688     flex.o
----------------------------------------

My conclusions?  Even flex can be big and slow for certain lexing
purposes, though it beats the standard UNIX lex, and its use is often
justifiable on other grounds (it facilitates rapid prototyping, and it
may be easier to maintain than a hand-written lexer for certain
languages, etc.).  Finally, tool-generated code (like that produced by
gperf) doesn't have to be slow, even though it's produced
automatically.

  Doug
--
On a clear day, under blue skies, there is no need to seek.
And asking about Buddha                +------------------------+
Is like proclaiming innocence,         | schmidt@ics.uci.edu    |
With loot in your pocket.              | office: (714) 856-4043 |
                                       +------------------------+
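
For readers who haven't seen gperf's output, the sketch below shows the
general shape of such a recognizer: a cheap hash over the word's length
(and, in gperf's real output, a few selected key characters), followed
by a single strcmp against the probed slot.  The keyword set and the
hash function here are made up for brevity and are not gperf's actual
generated code, but they illustrate why a recognizer like gperf.o above
stays small and runs in essentially constant time per word.

----------------------------------------
#include <stdio.h>
#include <string.h>

/* Toy keyword set -- chosen so that every keyword has a distinct
   length, which makes "length - 2" a (minimal) perfect hash into
   slots 0..4.  gperf's generated hash additionally mixes in
   per-character values for selected key positions so that it can
   separate same-length keywords.                                   */
static const char *const wordlist[] = {
  "if", "int", "else", "while", "return"
};

#define MIN_WORD_LENGTH 2
#define MAX_WORD_LENGTH 6

static unsigned int
hash (const char *str, unsigned int len)
{
  (void) str;           /* a real gperf hash also reads str[k] values */
  return len - MIN_WORD_LENGTH;
}

/* One hash computation plus one string comparison per lookup. */
static int
in_word_set (const char *str, unsigned int len)
{
  if (len < MIN_WORD_LENGTH || len > MAX_WORD_LENGTH)
    return 0;
  return strcmp (str, wordlist[hash (str, len)]) == 0;
}

int
main (void)
{
  printf ("%d %d\n", in_word_set ("while", 5), in_word_set ("foo", 3));
  return 0;
}
----------------------------------------

Compiled standalone, this prints "1 0" (reserved word, then
non-reserved word); the benchmark driver above does the same kind of
lookup, but against gperf's generated table for all 47 C++ keywords.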