melling@psuvax1.cs.psu.edu (Michael D Mellinger) (11/16/90)
Can someone give me an estimate of how much faster parsing can be made by
writing a recursive-descent parser instead of using Yacc and Lex?  Is there
enough of a difference to consider using a RDP in a commercial C compiler?
PC compilers seem to be very fast when compared to workstation compilers
such as GCC.  Is optimization (or the lack of it) the crucial difference
between PC and workstation compilation speeds?

-Mike
[It is my distinct impression that yacc parsers work fairly quickly.
Particularly on machines with slow procedure calls, they can be faster than
recursive descent.  Lex is slow, but flex uses newer techniques so that it
is usually as fast as a hand-coded lexer.  Unix compilers are slow because
they use a separate assembly pass and do more optimizations.  If you use,
e.g., MS C with all of its optimization options, it's no speed demon
either. -John]
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.
pardo@cs.washington.edu (David Keppel) (11/17/90)
melling@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
>[How much faster is parsing with Recursive Descent instead of YACC?]

See also:

%A Frank G. Pagan
%T Comparative Efficiency of General and Residual Parsers
%J SIGPLAN Notices
%V 25
%N 4
%D April 1990
%P 59-68
%X * Concise and good introduction to partial evaluation.
   * Focus on partial evaluation of parsers generating pure code.
   * Hand translation of Pascal and C.
   * Time (2-10X faster) and size (0.5-1.25X larger).

%A Thomas J. Pennello
%T Very Fast LR Parsing
%J Proceedings of the SIGPLAN 1986 Symposium on Compiler Construction;
   SIGPLAN Notices
%V 21
%N 7
%D July 1986
%P 145-151
%X * Partial evaluation of the table interpreter with respect to each
     element of the table.
   * Speedup: on a VAX-like machine, 40,000 to 500,000 lines per minute.
     On an 80286, 37,000 to 240,000 lines per minute.
   * FSM converted to assembly language.
   * 2-4X increase in table size.

On a machine with fast procedure calls and so forth, I GUESS that these
schemes win on the basis of better register allocation.  I don't know of
any fundamental big-oh differences between YACC-style and recursive
descent.

;-D oN ( YACCity-YACC -- Don't Parse Back )  Pardo
--
pardo@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
grimlok@hubcap.clemson.edu (Mike Percy) (11/17/90)
melling@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex?  Is there
>enough of a difference to consider using a RDP in a commercial C compiler?

I would tend to believe that table-driven parsers (YACC) are faster in
execution than recursive descent parsers.  The stack and function call
overhead alone can get to be pretty hairy, especially in deeply recursive
parses.

What is faster, in my experience, is the speed with which you can get a
parser running.  Trying to coax a grammar that YACC likes (conflict-free)
is downright a pain, but the recursive descent parser generators I've
worked on/with have much laxer restrictions on the grammar than LR(1).  In
theory, one could have a non-deterministic, backtracking RD parser (I've
seen one or two done in Prolog), but most generators at least require the
grammar to be deterministic (is this the same as LR(1)?).

If you want to get a parser up quickly, use an Earley's-algorithm-based
parser.  It will run slowly, but it runs the first time, giving you time to
massage the grammar for a faster parser type.
hankd@ecn.purdue.edu (Hank Dietz) (11/18/90)
In article <F7p?bk63@cs.psu.edu> melling@psuvax1.cs.psu.edu (Michael D
Mellinger) writes:
>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex? ...

PCCTS (the Purdue Compiler-Construction Tool Set) builds LL(k) parsers
which average about 2-3x as fast as YACC's parsers.  The primary difference
between them is that LL stack management can be implemented directly with
machine instructions on most machines, whereas LR stack handling requires
an interpreter... hence the 2-3x difference (for example, it was about 2x
for Pascal).

-hankd
henry@zoo.toronto.edu (Henry Spencer) (11/18/90)
In article <F7p?bk63@cs.psu.edu> melling@psuvax1.cs.psu.edu (Michael D
Mellinger) writes:
>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex?  Is there
>enough of a difference to consider using a RDP in a commercial C compiler?
>...
>[It is my distinct impression that yacc parsers work fairly quickly.
>Particularly on machines with slow procedure calls, they can be faster than
>recursive descent...

Of note here is the recent EUUG paper from Ken Thompson on the Plan 9 C
compiler.  (This is a small, fast compiler generating quite good code.)  He
uses a yacc parser, but says that the compiler spends a lot of time parsing
and he thinks a recursive-descent parser would be substantially faster, now
that the compiler and the language (which has some changes from orthodox C)
have settled down.

He also notes that using the manufacturer's fancy "call" instruction is
usually a bad idea, since a stripped-down one (fortunately both the VAX and
the NS32k series have such) is generally much faster.  His assessment of
relative speeds probably assumes an efficient call sequence.
--
Henry Spencer at U of Toronto Zoology   henry@zoo.toronto.edu   utzoo!henry
Bruce.Hoult@actrix.co.nz (Bruce Hoult) (11/18/90)
In article <F7p?bk63@cs.psu.edu> melling@psuvax1.cs.psu.edu (Michael D
Mellinger) writes:
> Can someone give me an estimate on how much faster parsing can be made by
> writing a recursive-descent parser instead of using Yacc and Lex?  Is
> there enough of a difference to consider using a RDP in a commercial C
> compiler?

Bjarne Stroustrup is on record as saying that he regrets having used yacc
to generate the parser in CFront (AT&T's C++ compiler), not because of any
difference in parsing speed, but because of the extreme difficulty of
providing meaningful syntax errors with yacc, compared to using RD.

Walter Bright used RD in Zortech C++, and it's no slouch in compile speed.
--
Bruce Hoult   Bruce.Hoult@actrix.gen.nz
dave@labtam.labtam.oz.au (David Taylor) (11/20/90)
grimlok@hubcap.clemson.edu (Mike Percy) writes:
>What is faster, in my experience, is the speed with which you can get a
>parser running.  Trying to coax a grammar that YACC likes (conflict-free)
>is downright a pain, but the recursive descent parser generators I've
>worked on/with have much laxer restrictions on the grammar than LR(1).  In
>theory, one could have a non-deterministic, backtracking RD parser (seen
>one or two done in Prolog), but most generators at least require the
>grammar to be deterministic (is this the same as LR(1)?).

Recursive descent parser == LL(0)

LR(1) is LESS restrictive.

You can still use a grammar with yacc that has some conflicts, as it has
some simple rules for conflict resolution.  Some of these recursive descent
parser generators that you mention probably use similar rules for conflict
resolution but don't bother to tell you about their use.

Look up some of the compiler theory ... it helps when you're trying to
design the grammar.
--
David Taylor
Labtam Information Systems Pty. Ltd.    Work: (03) 587-1444
UUCP: !uunet!munnari!labtam.oz!dave     FAX:  (03) 580-5581
Internet: dave@labtam.oz.au             Home: (03) 857-5660 (occasionally)
dhw@uunet.UU.NET (David H. West) (11/21/90)
In article <11678@hubcap.clemson.edu> grimlok@hubcap.clemson.edu (Mike
Percy) writes:
>the recursive descent parser generators I've worked on/with [...]
>If you want to get a parser up quickly, use an Earley's-algorithm-based
>parser.

Does anyone know of any of the above kinds of parser freely available in
source?

-David West   dhw@iti.org
[Quite a while ago I used an Earley parser in IMP-10, but I haven't seen
one lately.  Personally, I think the effort spent in pushing a grammar into
LALR or LL(k) is usually worth it; one often finds surprising ambiguities.
-John]
grosch@gmdka.uucp (Josef Grosch) (11/22/90)
Michael D Mellinger writes:
[Can someone give me an estimate on how much faster parsing can be made by
writing a recursive-descent parser instead of using Yacc and Lex?]
I did a comparison of generated parsers on a SUN 3 with MC 68020 processor
(excluding scanning):
tool                                                  speed (tokens per second)
Yacc                                                  16,000
Lalr - our LALR(1) parser generator                   35,000
Ell  - our LL(1) recursive descent parser generator   55,000
Pure parsing can be made faster by a factor of 2, 3, or more compared to
Yacc.  However, as pure parsing takes only 5 to 10% of the whole
compilation time, it is not the bottleneck and does not matter too much.
Scanning is much more time critical, as it takes 30 to 50%.  Therefore Lex
cannot be used for high-speed compilers.  Scanner generators like Rex and
flex are up to 5 times faster.
Josef Grosch
---
grosch@gmdka.uucp
grimlok@hubcap.clemson.edu (Mike Percy) (11/24/90)
dave@labtam.labtam.oz.au (David Taylor) writes:
>grimlok@hubcap.clemson.edu (Mike Percy) writes:
[some of my comments WRT RD parsers]
>Recursive descent parser == LL(0)

My references mention only that RD parsers are used with "extended-tree
grammars," which are then defined.  Perhaps I'm using the wrong references,
or have missed something, but I never noticed that RD == LL(0).

>LR(1) is LESS restrictive.

No argument there.

>You can still use a grammar with yacc that has some conflicts as it
>has some simple rules for conflict resolution.  Some of these
>recursive descent parser generators that you mention probably use
>similar rules for conflict resolution but don't bother to tell you
>about their use.

Actually, don't yacc and its ilk use LALR?  LALR can introduce conflicts
that aren't in LR.  As for conflict resolution methods, I'm sure that they
all do similar things.

>Look up some of the compiler theory ... it helps when you're trying to
>design the grammar.

It would also help if the language designers would make life a little
easier...  As for the theory, I do look it up.  RD parsers weren't even
mentioned in compiler class, except by me.  I was told we weren't going to
look at RD parsers.  LL was defined, but not explored like LR and LALR
were.

Mike Percy                    grimlok@hubcap.clemson.edu
ISD, Clemson University       mspercy@clemson.BITNET
(803) 656-3780                mspercy@clemson.clemson.edu
jsp@milton.u.washington.edu (Jeff Prothero) (11/24/90)
grosch@gmdka.uucp (Josef Grosch) writes:
>I did a comparison of generated parsers on a SUN 3 with MC 68020 processor
>(excluding scanning):
>
>tool                                                 speed (tokens per second)
>Yacc                                                 16,000
>Lalr - our LALR(1) parser generator                  35,000
>Ell  - our LL(1) recursive descent parser generator  55,000
>
>Pure parsing can be made faster by a factor of 2, 3, or more compared to
>Yacc.

Note that "LALR(1) parser generator" doesn't have to mean "table driven".
I don't have the reference handy, but someone (I believe MetaWare's Tom
Pennello?) has obtained big speedups by using procedural implementations of
the LALR(1) tables.  Robert Henry has done much the same thing with code
generation tables in CODEGEN (UW technical report), and observed that table
compression and lookup techniques are essentially application-independent,
and that a general tool could be developed to factor, compress, and
implement tables, including generating fast procedural implementations.

Seems to me procedural table implementations bring us full circle: we
started with fast random code, then switched to slower table-driven code as
an aid to portability and code maintainability.  Now that we can generate
the unportable, unmaintainable code automatically from declarative
specifications, the speed consideration is coming to the fore again.  No?

Jeff Prothero   jsp@u.washington.edu
melling@psuvax1.cs.psu.edu (Michael D Mellinger) (11/26/90)
In article <9011221247.AA18998@gmdka.uucp> grosch@gmdka.uucp (Josef Grosch)
writes:
> Pure parsing can be made faster by a factor of 2, 3, or more compared to
> Yacc.  However, as pure parsing takes only 5 to 10% of the whole
> compilation time, it is not the bottleneck and does not matter too much.
> Scanning is much more time critical, as it takes 30 to 50%.  Therefore
> Lex cannot be used for high-speed compilers.  Scanner generators like Rex
> and flex are up to 5 times faster.

I have been playing around with gcc (the NeXT Objective C version) and have
found that by giving a -Q command-line option, gcc will tell you how much
time it spends in each section of compilation.  Without optimization turned
on, it seems to spend most of its time parsing code.  gcc doesn't use Lex,
and it is still much slower than Turbo C or Think C.  Here is the result of
compiling a 149-line program.

I would like to speed GCC up.  Bison is used when compiling GCC, not YACC.
Would switching to another parsing tool give me the speed-up I desire?  A
FAST FREE compiler would sure complement a FREE, slow optimizing compiler.
Besides, gcc is the fastest compiler available for the NeXT.

-Mike

handel> cc -Q -c CardDrawView.m
_1_CardDrawView _2_CardDrawView _3_CardDrawView _4_CardDrawView
time in parse: 5.939384
time in integration: 0.000000
time in jump: 0.000000
time in cse: 0.000000
time in loop: 0.000000
time in flow: 0.188038
time in combine: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 0.246630
time in final: 0.168993
time in varconst: 0.281626
time in symout: 0.000000
time in dump: 0.000000
handel> wc CardDrawView.m
     149     249    2690 CardDrawView.m

[I suspect that there is considerably more time spent passing code through
the assembler and linker than is spent by yacc.  Keep in mind that the yacc
code switches out to action routines that build parse trees and symbol
tables, and such routines will be present no matter how you parse.
-John]
moss@cs.umass.edu (Eliot Moss) (11/26/90)
Folks, what is this LL(0) stuff?  You need at least one token lookahead to
choose the correct production in LL or recursive descent parsing, so LL(k)
makes sense only for k > 0.
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Moss@cs.umass.edu
aycock@cpsc.ucalgary.ca (John Aycock) (11/27/90)
| Reply-To: moss@cs.umass.edu (Eliot Moss)
|
| Folks, what is this LL(0) stuff?  You need at least one token lookahead to
| choose the correct production in LL or recursive descent parsing, so LL(k)
| makes sense only for k > 0.

Not so.  Consider the following grammar:

	S -> A B C
	A -> a
	B -> b
	C -> c

No lookahead is required, yet a parser can predict that the first token
will be an 'a', the second a 'b', and so on.

As an example of where a grammar like this could be applied, I recently
wrote an EBNF->BNF converter for a compiler class that took input files of
the form:

	<file> -> <terminal-section> <nonterminal-section> <start> <rules> .
	<terminal-section> -> TERMINALS <terminal-list> END .
	<nonterminal-section> -> NONTERMINALS <nonterminal-list> END .
	&c...

Admittedly, a rather rigid and restrictive format, but it suited the
purpose nicely.

:ja
--
John D. Aycock   aycock@cpsc.ucalgary.ca   (403) 285-8727
[Justin Graver <graver@ufl.edu> sent a similar message. -John]