[comp.compilers] Recursive Descent Parsers and YACC

melling@psuvax1.cs.psu.edu (Michael D Mellinger) (11/16/90)

Can someone give me an estimate on how much faster parsing can be made by
writing a recursive-descent parser instead of using Yacc and Lex?  Is there
enough of a difference to consider using a RDP in a commercial C compiler?
PC compilers seem to be very fast when compared to workstation compilers such
as GCC.  Is optimization(or lack of) the crucial difference in PC and
workstation compilation speeds?

-Mike
[It is my distinct impression that yacc parsers work fairly quickly.
Particularly on machines with slow procedure calls, they can be faster than
recursive descent.  Lex is slow, but flex uses newer techniques so that it
is usually as fast as a hand-coded lexer.  Unix compilers are slow because
they use a separate assembly pass and do more optimizations.  If you use,
e.g., MS C with all of its optimization options it's no speed demon, either.
-John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

pardo@cs.washington.edu (David Keppel) (11/17/90)

melling@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
>[How much faster is parsing with Recursive Descent instead of YACC?]

See also:

%A Frank G. Pagan
%T Comparative Efficiency of General and Residual Parsers
%J SIGPLAN Notices
%V 25
%N 4
%D April 1990
%P 59-68
%X * Concise and good introduction to partial evaluation
 * Focus on partial evaluation of parsers generating pure code.
 * Hand traslation of Pascal and C.
 * Time (2-10X faster) and size (0.5-1.25 larger).

%A Thomas J. Pennello
%T Very Fast LR Parsing
%J Proceedings of the SIGPLAN 1986 Symposium on Compiler Construction;
SIGPLAN Notices
%V 21
%N 7
%D July 1986
%P 145-151
%X * Partial evaluation of the table interpreter with resepct to each
element of the table.
 * Speedup: on a VAX-like machine, 40,000 to 500,000 lines per minute.
On an 80286, 37,000 to 240,000 lines per minute.
 * FSM converted to assembly language.
 * 2-4X increase in table size.

On a machine with fast procedure call and so forth, I GUESS that these
schemes win on the basis of better register allocation.  I don't know
of any fundamental big-oh differences between YACC-style and recursive
descent.

	;-D oN  ( YACCity-YACC -- Don't Parse Back )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

grimlok@hubcap.clemson.edu (Mike Percy) (11/17/90)

melling@psuvax1.cs.psu.edu (Michael D Mellinger) writes:

>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex?  Is there
>enough of a difference to consider using a RDP in a commercial C compiler?

I would tend to believe that table driven parsers (YACC) are faster in
execution than recursive descent parsers.  The stack and function call
overhead alone can get to be pretty hairy, especially in deeply recursive
parses.

What is faster, in my experience, is the speed in which you can get a parser
running.  Trying to coax a grammer that YACC likes (conflict-free) is
downright a pain, but the recursive descent parser generators I've worked
on/with have much laxer restrictions on the grammar than LR(1).  In theory,
one could have a non-deterministic, backtracking RD parser (seen one or two
done in Prolog), but most generators at least require the grammar to be
deterministic (is this the same as LR(1)?

If you want to get a parser up quickly, use an Early's algorithm-based
parser.  It will run slow, but it runs the first time, giving you time to
massage the grammar for a faster parser type.

-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

hankd@ecn.purdue.edu (Hank Dietz) (11/18/90)

In article <F7p?bk63@cs.psu.edu> melling@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex?  ...

PCCTS (the Purdue Compiler-Construction Tool Set) builds LL(k) parsers
which average about 2-3x as fast as YACC's parsers.  The primary
difference between them is that LL stack management can be implemented
directly with machine instructions on most machines, whereas LR stack
handling requires an interpreter...  hence the 2-3x difference (for
example, it was about 2x for Pascal).

							-hankd
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

henry@zoo.toronto.edu (Henry Spencer) (11/18/90)

In article <F7p?bk63@cs.psu.edu> melling@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
>Can someone give me an estimate on how much faster parsing can be made by
>writing a recursive-descent parser instead of using Yacc and Lex?  Is there
>enough of a difference to consider using a RDP in a commercial C compiler?
>...
>[It is my distinct impression that yacc parsers work fairly quickly.
>Particularly on machines with slow procedure calls, they can be faster than
>recursive descent...

Of note here is the recent EUUG paper from Ken Thompson on the Plan 9 C
compiler.  (This is a small, fast compiler generating quite good code.)
He uses a yacc parser, but says that the compiler spends a lot of time
parsing and he thinks a recursive-descent parser would be substantially
faster, now that the compiler and the language (which has some changes
from orthodox C) have settled down.

He also notes that using the manufacturer's fancy "call" instruction is
usually a bad idea, since a stripped-down one (fortunately both the VAX
and the NS32k series have such) is generally much faster.  His assessment
of relative speeds probably assumes an efficient call sequence.
-- 
Henry Spencer at U of Toronto Zoology
henry@zoo.toronto.edu   utzoo!henry
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

Bruce.Hoult@actrix.co.nz (Bruce Hoult) (11/18/90)

In article <F7p?bk63@cs.psu.edu> melling@psuvax1.cs.psu.edu (Michael D Mellinger) writes:
> Can someone give me an estimate on how much faster parsing can be made by
> writing a recursive-descent parser instead of using Yacc and Lex?  Is there
> enough of a difference to consider using a RDP in a commercial C compiler?

Bjarne Stroustrup is on record as saying that he regrets having used yacc to
generate the parser in CFront (AT&T's C++ compiler), not because of any
difference in parsing speed, but because of the extreme difficulty of
providing meaningful syntax errors with yacc, compared to using RD.

Walter Bright used RD in Zortech C++, and it's no slouch in compile speed.

-- Bruce Hoult
   Bruce.Hoult@actrix.gen.nz
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

dave@labtam.labtam.oz.au (David Taylor) (11/20/90)

grimlok@hubcap.clemson.edu (Mike Percy) writes:

>What is faster, in my experience, is the speed in which you can get a parser
>running.  Trying to coax a grammer that YACC likes (conflict-free) is
>downright a pain, but the recursive descent parser generators I've worked
>on/with have much laxer restrictions on the grammar than LR(1).  In theory,
>one could have a non-deterministic, backtracking RD parser (seen one or two
>done in Prolog), but most generators at least require the grammar to be
>deterministic (is this the same as LR(1)?

Recursive descent parser == LL(0)

LR(1) is LESS restrictive.

You can still use a grammar with yacc that has some conflicts as it
has some simple rules for conflict resolution.  Some of these
recursive descent parser generators that you mention probably use
similar rules for conflict resolution but don't bother to tell you
about their use.

Look up some of the compiler theory ... it helps when you're trying to
design the grammar.
--
David Taylor                             Labtam Information Systems Pty. Ltd.
Work:  (03) 587-1444                     UUCP:  !uunet!munnari!labtam.oz!dave
FAX:   (03) 580-5581                     Internet:          dave@labtam.oz.au
Home:  (03) 857-5660 (occasionally)

-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

dhw@uunet.UU.NET id AA00722; Tue, 20 Nov 90 15:41:42 -0500 (David H. West) (11/21/90)

In article <11678@hubcap.clemson.edu> grimlok@hubcap.clemson.edu (Mike Percy) writes:
>the recursive descent parser generators I've worked on/with  [...]

>If you want to get a parser up quickly, use an Early's algorithm-based
>parser.  

Does anyone know of any of the above kinds of parser freely available in
source?

-David West      dhw@iti.org
[Quite a while ago I used an Earley parser in IMP-10, but I haven't seen
one lately.  Personally, I think the effort spent in pushing a grammar into
LALR or LL(k) is usually worth it, one often finds surprising ambiguities.
-John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

grosch@gmdka.uucp (Josef Grosch) (11/22/90)

Michael D Mellinger writes:
[Can someone give me an estimate on how much faster parsing can be made by
writing a recursive-descent parser instead of using Yacc and Lex?]

I did a comparison of generated parsers on a SUN 3 with MC 68020 processor
(excluding scanning):

tool                                      speed (tokens per second)

Yacc                                                  16,000
Lalr - our LALR(1) parser generator                   35,000
Ell  - our LL(1) recursive descent parser generator   55,000

Pure parsing can be made faster by a factor of 2, 3, or more compared to Yacc.
However, as pure parsing takes only 5 to 10 % of the whole compilation time,
it is not the bottle neck and does not matter too much.
Scanning is much more time critical, as it takes 30 to 50 %.
Therefore Lex can not be used for high speed compilers.
Scanner generators like Rex and flex are up to 5 times faster.

Josef Grosch
---
grosch@gmdka.uucp
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

grimlok@hubcap.clemson.edu (Mike Percy) (11/24/90)

dave@labtam.labtam.oz.au (David Taylor) writes:

>grimlok@hubcap.clemson.edu (Mike Percy) writes:
[some of my comments WRT RD parsers]

>Recursive descent parser == LL(0)
My references mention only that RD parsers are used with "extended-tree
grammars," which are then defined.  Perhaps I'm using the worng
references, or have missed something, but I never noticed that RD ==
LL(0).

>LR(1) is LESS restrictive.
No argument there.  

>You can still use a grammar with yacc that has some conflicts as it
>has some simple rules for conflict resolution.  Some of these
>recursive descent parser generators that you mention probably use
>similar rules for conflict resolution but don't bother to tell you
>about their use.
 
Actually, doesn't yacc and its ilk use LALR?  LALR can introduce
conflicts that aren't in LR.  As for conflict resolution methods, I'm
sure that they all do similar things.
  
>Look up some of the compiler theory ... it helps when you're trying to
>design the grammar.

It would also help if the language designers would make life a little
easier... As for the theory, I do look it up.  RD parsers weren't even
mentioned in compiler class, except by me.  I was told we weren't going
to look at RD parsers.  LL was defined, but not explored like LR and
LALR were.  

Mike Percy                    grimlok@hubcap.clemson.edu
ISD, Clemson University       mspercy@clemson.BITNET
(803)656-3780                 mspercy@clemson.clemson.edu
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

jsp@milton.u.washington.edu (Jeff Prothero) (11/24/90)

grosch@gmdka.uucp (Josef Grosch) writes:
>I did a comparison of generated parsers on a SUN 3 with MC 68020 processor
>(excluding scanning):
>
>tool                                      speed (tokens per second)
>
>Yacc                                                  16,000
>Lalr - our LALR(1) parser generator                   35,000
>Ell  - our LL(1) recursive descent parser generator   55,000
>
>Pure parsing can be made faster by a factor of 2, 3, or more compared to Yacc.

Note that "LALR(1) parser generator" doesn't have to mean "table driven".
Don't have the reference handy, but someone (I believe MetaWare's Tom
Penello?) has obtained big speedups by using procedural implementations of
the LALR(1) tables.  Robert Henry has done much the same thing with code
generation tables in CODEGEN (UW technical report), and observed that table
compression and lookup techniques are essentially application-independent,
and that a general tool could be developed to factor, compress, and
implement tables, including generating fast procedural implementations.

Seems to me procedural table implementations bring us full circle: We
started with fast random code, then switched to slower table-driven code as
an aid to portability and code maintainablility. Now that we can generate
the unportable, unmaintainable code automatically from declarative
specifications, the speed consideration is coming to the fore again. No?

Jeff Prothero
jsp@u.washington.edu
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

melling@psuvax1.cs.psu.edu (Michael D Mellinger) (11/26/90)

In article <9011221247.AA18998@gmdka.uucp> grosch@gmdka.uucp (Josef Grosch) writes:
> Pure parsing can be made faster by a factor of 2, 3, or more compared to
> Yacc.  However, as pure parsing takes only 5 to 10 % of the whole
> compilation time, it is not the bottle neck and does not matter too much.
> Scanning is much more time critical, as it takes 30 to 50 %.  Therefore Lex
> can not be used for high speed compilers.  Scanner generators like Rex and
> flex are up to 5 times faster.

I have been playing around with gcc (the NeXT Objective C version) and have
found that by giving a -Q command-line option gcc will tell you how much
time it spends in each section of compilation.  Without optimization turned
on, it seems to spend most its time parsing code.  gcc doesn't use Lex, and
it is still much slower than Turbo C or Think C.  Here is the result of
compiling a 149 line program.  I would like to speed GCC up.  Bison is used
when compiling GCC and not YACC.  Would switching to another parsing tool
give me the speed-up I desire?  A FAST FREE compiler would sure complement a
FREE, slow optimizing compiler.  Besides gcc is the fastest compiler
available for the NeXT.

-Mike

handel> cc -Q -c CardDrawView.m
 _1_CardDrawView _2_CardDrawView _3_CardDrawView _4_CardDrawView
time in parse: 5.939384
time in integration: 0.000000
time in jump: 0.000000
time in cse: 0.000000
time in loop: 0.000000
time in flow: 0.188038
time in combine: 0.000000
time in local-alloc: 0.000000
time in global-alloc: 0.246630
time in final: 0.168993
time in varconst: 0.281626
time in symout: 0.000000
time in dump: 0.000000

handel> wc CardDrawView.m
     149     249    2690 CardDrawView.m
[I suspect that there is considerably more time spent passing code through
the assembler and linker than is spent by yacc.  Keep in mind that the yacc
code switches out to action routines that build parse trees and symbol tables,
and such routines will be present no matter how you parse. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

moss@cs.umass.edu (Eliot Moss) (11/26/90)

Folks, what is this LL(0) stuff? You need at least one token lookahead to
choose the correct production in LL or recursive descent parsing, so LL(k)
makes sense only for k > 0.
--
		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

aycock@cpsc.ucalgary.ca (John Aycock) (11/27/90)

| Reply-To: moss@cs.umass.edu (Eliot Moss)
|
| Folks, what is this LL(0) stuff? You need at least one token lookahead to
| choose the correct production in LL or recursive descent parsing, so LL(k)
| makes sense only for k > 0.

Not so.  Consider the following grammar:

S -> A B C
A -> a
B -> b
C -> c

No lookahead is required, yet a parser can predict that the first token will
be an 'a', the second a 'b', and so on.  As an example of where a grammar like
this could be applied, I recently wrote an EBNF->BNF converter for a compiler
class that took input files of the form:

<file> -> <terminal-section> <nonterminal-section> <start> <rules> .
<terminal-section> -> TERMINALS <terminal-list> END .
<nonterminal-section> -> NONTERMINALS <nonterminal-list> END .
   &c...

Admittedly, a rather rigid and restrictive format, but it suited the purpose
nicely.
:ja
--
John D. Aycock		aycock@cpsc.ucalgary.ca
(403) 285-8727
[Justin Graver <graver@ufl.edu> send a similar message. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.