[comp.compilers] replacements for LEX: flex, ... , Rex

johnl@ima.UUCP (03/31/88)

                       The Scanner Generator REX
                       ==========================

     Rex is a scanner generator similar to LEX.  Scanners  generated  by
Rex  are more than 4 times faster and 4 times smaller than LEX-generated
ones.  The following lines describe some features of Rex and  give  some
numbers of its performance.

     Rex stands for Regular EXpression tool. Like LEX a scanner specifi-
cation  contains regular expressions, arbitrary action statements, right
context, and start states. From the logical point of view the specifica-
tion  techniques are equivalent. The syntax differs slightly, because we
tried to improve the readability. A few LEX features are missing in  Rex
because  they  would  decrease performance and they are rarely needed in
scanners.  On the other hand we added automatic  calculation  of  source
coordinates  (line  and  column numbers) of the recognized tokens and an
efficient way to normalize recognized tokens  to  upper  or  lower  case
letters.   Scanners  can  be  generated in the languages C and Modula-2.
Rex itself is implemented as a 5,000 line Modula-2 program.

     Scanners generated by Rex are more than 4 times faster and up to  5
times  smaller  than LEX-generated ones. It is possible to reach a speed
of 180,000 to 195,000 lines per minute on a MC 68020 processor  (includ-
ing  input  from file). If hashing of identifiers is performed addition-
ally the speed is between 125,000 and 150,000  lines  per  minute.   The
generator Rex itself is 10 to 20 times faster than LEX in typical cases.

     In the following we compare the time to generate a scanner as  well
as  the  speed  of the generated scanners using LEX and Rex in realistic
cases.

     To compare the scanner generators we used 4 versions of a  Modula-2
scanner  specification.  Version  1  is incomplete, version 2 omits key-
words, version 3 has keywords, and version 4 has lower as well as  upper
case  keywords.   The  times are given in seconds measured on a MC 68020
processor (Table 1).


               Table 1: Comparison of Scanner Generators


                                         Spec. 1  Spec. 2  Spec. 3  Spec. 4
___________________________________________________________________________
# of regular expressions                      12       39       79      119
# of states (generated by Rex)                40       66      221      376
___________________________________________________________________________
generation time using LEX       [sec.]      3.16     6.66    74.11   215.99
generation time using Rex       [sec.]      3.56     3.72     4.78     6.16
___________________________________________________________________________
table size using LEX            [bytes]    2,996    4,708   39,156   67,384
table size using Rex            [bytes]    3,114    3,218    4,366    5,530
___________________________________________________________________________
scanner size using LEX          [bytes]    5,464    8,044   43,756   73,264
scanner size using Rex          [bytes]    8,452    8,892   11,208   13,544


     LEX wins with small specifications.  The (correct) scanner specifi-
cation 3 is processed nearly 20 times faster by Rex and the scanner is 4
times smaller because the tables are 10 times smaller.

     To compare the generated scanners we used a big Modula-2 program as
input  whose  characteristics  are shown in Table 2. The timings include
input from file, scanning and hashing of identifiers.


		  Table 2: Comparison of Generated Scanners


	 # of lines                                          4,171
	 # of characters                                   100,268
	 # of tokens                                        16,948
	 _________________________________________________________
	 scanning time using LEX            [sec.]            7.28
	 scanning time using Rex (Modula)   [sec.]            2.78
	 scanning time using Rex (C)        [sec.]            1.75
	 _________________________________________________________
	 scanning speed of LEX              [lines/min.]    34,400
	 scanning speed of Rex (Modula)     [lines/min.]    90,000
	 scanning speed of Rex (C)          [lines/min.]   143,000

     I would like to hear some numbers from "flex".

     Josef Grosch, GMD - Forschungsstelle an der Universitaet Karlsruhe
     Haid-und-Neu-Str. 7, D-7500 Karlsruhe 1, West Germany

     Phone 0721-662226 grosch@gmdka.UUCP
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

vern@lbl-rtsg.arpa (Vern Paxson) (04/02/88)

Josef Grosch describes Rex, a scanner generator similar to lex,
and gives some timings comparing its perfomance with lex's:

>     Scanners generated by Rex are more than 4 times faster and up to  5
> times  smaller  than LEX-generated ones. It is possible to reach a speed
> of 180,000 to 195,000 lines per minute on a MC 68020 processor  (includ-
> ing  input  from file)....
> ....  I would like to hear some numbers from "flex".

Here they are for a C tokenizer which includes keywords.

Generation times:

	lex	    83.0 secs
	flex	     3.9
	flex -cfe    7.1	# uncompressed table, equivalence classes
	flex -cf    15.0	# uncompressed table, no equivalence classes

Scanner object file sizes:

	lex	   41.0K bytes
	flex	    9.4K
	flex -cfe  49.6K
	flex -cf  126.5K

Running times on a 28,088 line input (685K characters):

	lex	   29.8 secs
	flex	   19.3
	flex -cfe   9.0
	flex -cf    7.8

The timings were made on a Sun 3/60.  All times are user + system CPU time,
and don't include hashing of identifiers.

Summary:

    For about the same sized scanner, you get a factor of 3 in performance.
    For a 30% faster scanner, you get a scanner 1/4th the size, and it's
	generated in 1/20th the time.
    For a scanner that's 3 times larger, you get a factor of 3.8 in
	performance.

Rex sounds very good - taken separately, various options of flex perform in
about its league, but flex can't match its combined performance.  This is
due in part to flex's compressed scanners not having been optimized for
speed nearly as much as the full table scanners.

What is Rex's availability?

		Vern

	Vern Paxson				vern@lbl-csam.arpa
	Real Time Systems			ucbvax!lbl-csam.arpa!vern
	Lawrence Berkeley Laboratory		(415) 486-6411
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request