johnl@ima.UUCP (03/31/88)
The Scanner Generator REX ========================== Rex is a scanner generator similar to LEX. Scanners generated by Rex are more than 4 times faster and 4 times smaller than LEX-generated ones. The following lines describe some features of Rex and give some numbers of its performance. Rex stands for Regular EXpression tool. Like LEX a scanner specifi- cation contains regular expressions, arbitrary action statements, right context, and start states. From the logical point of view the specifica- tion techniques are equivalent. The syntax differs slightly, because we tried to improve the readability. A few LEX features are missing in Rex because they would decrease performance and they are rarely needed in scanners. On the other hand we added automatic calculation of source coordinates (line and column numbers) of the recognized tokens and an efficient way to normalize recognized tokens to upper or lower case letters. Scanners can be generated in the languages C and Modula-2. Rex itself is implemented as a 5,000 line Modula-2 program. Scanners generated by Rex are more than 4 times faster and up to 5 times smaller than LEX-generated ones. It is possible to reach a speed of 180,000 to 195,000 lines per minute on a MC 68020 processor (includ- ing input from file). If hashing of identifiers is performed addition- ally the speed is between 125,000 and 150,000 lines per minute. The generator Rex itself is 10 to 20 times faster than LEX in typical cases. In the following we compare the time to generate a scanner as well as the speed of the generated scanners using LEX and Rex in realistic cases. To compare the scanner generators we used 4 versions of a Modula-2 scanner specification. Version 1 is incomplete, version 2 omits key- words, version 3 has keywords, and version 4 has lower as well as upper case keywords. The times are given in seconds measured on a MC 68020 processor (Table 1). Table 1: Comparison of Scanner Generators Spec. 1 Spec. 2 Spec. 3 Spec. 4 ___________________________________________________________________________ # of regular expressions 12 39 79 119 # of states (generated by Rex) 40 66 221 376 ___________________________________________________________________________ generation time using LEX [sec.] 3.16 6.66 74.11 215.99 generation time using Rex [sec.] 3.56 3.72 4.78 6.16 ___________________________________________________________________________ table size using LEX [bytes] 2,996 4,708 39,156 67,384 table size using Rex [bytes] 3,114 3,218 4,366 5,530 ___________________________________________________________________________ scanner size using LEX [bytes] 5,464 8,044 43,756 73,264 scanner size using Rex [bytes] 8,452 8,892 11,208 13,544 LEX wins with small specifications. The (correct) scanner specifi- cation 3 is processed nearly 20 times faster by Rex and the scanner is 4 times smaller because the tables are 10 times smaller. To compare the generated scanners we used a big Modula-2 program as input whose characteristics are shown in Table 2. The timings include input from file, scanning and hashing of identifiers. Table 2: Comparison of Generated Scanners # of lines 4,171 # of characters 100,268 # of tokens 16,948 _________________________________________________________ scanning time using LEX [sec.] 7.28 scanning time using Rex (Modula) [sec.] 2.78 scanning time using Rex (C) [sec.] 1.75 _________________________________________________________ scanning speed of LEX [lines/min.] 34,400 scanning speed of Rex (Modula) [lines/min.] 90,000 scanning speed of Rex (C) [lines/min.] 143,000 I would like to hear some numbers from "flex". Josef Grosch, GMD - Forschungsstelle an der Universitaet Karlsruhe Haid-und-Neu-Str. 7, D-7500 Karlsruhe 1, West Germany Phone 0721-662226 grosch@gmdka.UUCP -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
vern@lbl-rtsg.arpa (Vern Paxson) (04/02/88)
Josef Grosch describes Rex, a scanner generator similar to lex, and gives some timings comparing its perfomance with lex's: > Scanners generated by Rex are more than 4 times faster and up to 5 > times smaller than LEX-generated ones. It is possible to reach a speed > of 180,000 to 195,000 lines per minute on a MC 68020 processor (includ- > ing input from file).... > .... I would like to hear some numbers from "flex". Here they are for a C tokenizer which includes keywords. Generation times: lex 83.0 secs flex 3.9 flex -cfe 7.1 # uncompressed table, equivalence classes flex -cf 15.0 # uncompressed table, no equivalence classes Scanner object file sizes: lex 41.0K bytes flex 9.4K flex -cfe 49.6K flex -cf 126.5K Running times on a 28,088 line input (685K characters): lex 29.8 secs flex 19.3 flex -cfe 9.0 flex -cf 7.8 The timings were made on a Sun 3/60. All times are user + system CPU time, and don't include hashing of identifiers. Summary: For about the same sized scanner, you get a factor of 3 in performance. For a 30% faster scanner, you get a scanner 1/4th the size, and it's generated in 1/20th the time. For a scanner that's 3 times larger, you get a factor of 3.8 in performance. Rex sounds very good - taken separately, various options of flex perform in about its league, but flex can't match its combined performance. This is due in part to flex's compressed scanners not having been optimized for speed nearly as much as the full table scanners. What is Rex's availability? Vern Vern Paxson vern@lbl-csam.arpa Real Time Systems ucbvax!lbl-csam.arpa!vern Lawrence Berkeley Laboratory (415) 486-6411 -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request