johnl@ima.UUCP (03/31/88)
The Scanner Generator REX
==========================
Rex is a scanner generator similar to LEX. Scanners generated by
Rex are more than 4 times faster and 4 times smaller than LEX-generated
ones. The following lines describe some features of Rex and give some
numbers of its performance.
Rex stands for Regular EXpression tool. Like LEX a scanner specifi-
cation contains regular expressions, arbitrary action statements, right
context, and start states. From the logical point of view the specifica-
tion techniques are equivalent. The syntax differs slightly, because we
tried to improve the readability. A few LEX features are missing in Rex
because they would decrease performance and they are rarely needed in
scanners. On the other hand we added automatic calculation of source
coordinates (line and column numbers) of the recognized tokens and an
efficient way to normalize recognized tokens to upper or lower case
letters. Scanners can be generated in the languages C and Modula-2.
Rex itself is implemented as a 5,000 line Modula-2 program.
Scanners generated by Rex are more than 4 times faster and up to 5
times smaller than LEX-generated ones. It is possible to reach a speed
of 180,000 to 195,000 lines per minute on a MC 68020 processor (includ-
ing input from file). If hashing of identifiers is performed addition-
ally the speed is between 125,000 and 150,000 lines per minute. The
generator Rex itself is 10 to 20 times faster than LEX in typical cases.
In the following we compare the time to generate a scanner as well
as the speed of the generated scanners using LEX and Rex in realistic
cases.
To compare the scanner generators we used 4 versions of a Modula-2
scanner specification. Version 1 is incomplete, version 2 omits key-
words, version 3 has keywords, and version 4 has lower as well as upper
case keywords. The times are given in seconds measured on a MC 68020
processor (Table 1).
Table 1: Comparison of Scanner Generators
Spec. 1 Spec. 2 Spec. 3 Spec. 4
___________________________________________________________________________
# of regular expressions 12 39 79 119
# of states (generated by Rex) 40 66 221 376
___________________________________________________________________________
generation time using LEX [sec.] 3.16 6.66 74.11 215.99
generation time using Rex [sec.] 3.56 3.72 4.78 6.16
___________________________________________________________________________
table size using LEX [bytes] 2,996 4,708 39,156 67,384
table size using Rex [bytes] 3,114 3,218 4,366 5,530
___________________________________________________________________________
scanner size using LEX [bytes] 5,464 8,044 43,756 73,264
scanner size using Rex [bytes] 8,452 8,892 11,208 13,544
LEX wins with small specifications. The (correct) scanner specifi-
cation 3 is processed nearly 20 times faster by Rex and the scanner is 4
times smaller because the tables are 10 times smaller.
To compare the generated scanners we used a big Modula-2 program as
input whose characteristics are shown in Table 2. The timings include
input from file, scanning and hashing of identifiers.
Table 2: Comparison of Generated Scanners
# of lines 4,171
# of characters 100,268
# of tokens 16,948
_________________________________________________________
scanning time using LEX [sec.] 7.28
scanning time using Rex (Modula) [sec.] 2.78
scanning time using Rex (C) [sec.] 1.75
_________________________________________________________
scanning speed of LEX [lines/min.] 34,400
scanning speed of Rex (Modula) [lines/min.] 90,000
scanning speed of Rex (C) [lines/min.] 143,000
I would like to hear some numbers from "flex".
Josef Grosch, GMD - Forschungsstelle an der Universitaet Karlsruhe
Haid-und-Neu-Str. 7, D-7500 Karlsruhe 1, West Germany
Phone 0721-662226 grosch@gmdka.UUCP
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers. Meta-mail to ima!compilers-requestvern@lbl-rtsg.arpa (Vern Paxson) (04/02/88)
Josef Grosch describes Rex, a scanner generator similar to lex, and gives some timings comparing its perfomance with lex's: > Scanners generated by Rex are more than 4 times faster and up to 5 > times smaller than LEX-generated ones. It is possible to reach a speed > of 180,000 to 195,000 lines per minute on a MC 68020 processor (includ- > ing input from file).... > .... I would like to hear some numbers from "flex". Here they are for a C tokenizer which includes keywords. Generation times: lex 83.0 secs flex 3.9 flex -cfe 7.1 # uncompressed table, equivalence classes flex -cf 15.0 # uncompressed table, no equivalence classes Scanner object file sizes: lex 41.0K bytes flex 9.4K flex -cfe 49.6K flex -cf 126.5K Running times on a 28,088 line input (685K characters): lex 29.8 secs flex 19.3 flex -cfe 9.0 flex -cf 7.8 The timings were made on a Sun 3/60. All times are user + system CPU time, and don't include hashing of identifiers. Summary: For about the same sized scanner, you get a factor of 3 in performance. For a 30% faster scanner, you get a scanner 1/4th the size, and it's generated in 1/20th the time. For a scanner that's 3 times larger, you get a factor of 3.8 in performance. Rex sounds very good - taken separately, various options of flex perform in about its league, but flex can't match its combined performance. This is due in part to flex's compressed scanners not having been optimized for speed nearly as much as the full table scanners. What is Rex's availability? Vern Vern Paxson vern@lbl-csam.arpa Real Time Systems ucbvax!lbl-csam.arpa!vern Lawrence Berkeley Laboratory (415) 486-6411 -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request