[comp.compilers] Looking for parsing references

sankar@Neon.Stanford.EDU (Sriram Sankar) (06/19/91)

Hello!

I'm interested in references/pointers and also your views and opinions
on the following:

1. Tools/algorithms for parsing of general context free grammars

2. LL-parsing tools/algorithms

3. Advantages and disadvantages of LL and LR parsing as compared to each
   other.

I'll post a summary if there is a good response and sufficient interest.

Please mail your response to sankar@cs.stanford.edu even if you think
your contribution is trivial.

Thank you very much.

Sriram.
[The only algorithm I know of for parsing general context-free grammars is
Earley's.  LL and LR(n) algorithms are too weak. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

mauney@eos.ncsu.edu (Dr. Jon Mauney) (06/20/91)

>I'm interested in references/pointers and also your views and opinions
>on the following:
>
>1. Tools/algorithms for parsing of general context free grammars

%A Susan L. Graham
%A Michael A. Harrison
%A Walter L. Ruzzo
%T An improved context-free recognizer
%J TOPLAS
%V 2
%N 3
%P 415-462
%D July 1980
Describes a general parsing algorithm, including a good discussion of the
relationship between their algorithm, Earley's, and the Cocke-Kasami-Younger
algorithm.

>2. LL-parsing tools/algorithms

The standard compiler texts do a pretty good job.
For gory detail on LL, including how to test language-equivalence,
see Aho-Ullman "The Theory of Parsing, Translation, and More Parsing" :-)
volume 2.

The LL(1) parser generator with syntax-error handling described in
Fischer & LeBlanc "Crafting a Compiler [in C]" is available, as is (I believe)
the one described by Holub in his book "Compiler Design in C".

>3. Advantages and disadvantages of LL and LR parsing as compared to each
>   other.

The LL(1) parsing algorithm is simpler, and so perhaps easier to understand
and implement.  LL(1) tables tend to be a little smaller and you should be
able to make the parser loop a hair faster than LR and its popular variants.
But parsing speed is not usually an important factor, and these days the
table size should be irrelevant.
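
To make the "parser loop" concrete, here is a minimal, self-contained
table-driven LL(1) driver in C.  It is a sketch written for this note, not
taken from any of the tools or books mentioned; the toy expression grammar,
token codes and parse table are all invented for the example.

    /* A sketch of the table-driven LL(1) loop described above.
     * Grammar (invented for the example):
     *
     *     E  -> T E'          E' -> + T E' | <empty>
     *     T  -> F T'          T' -> * F T' | <empty>
     *     F  -> ( E ) | id
     */
    #include <stdio.h>

    enum { ID, PLUS, STAR, LPAREN, RPAREN, END,   /* terminals     */
           E, EP, T, TP, F };                     /* nonterminals  */

    #define IS_TERM(s) ((s) < E)

    /* Right-hand sides, stored right-to-left so they can be pushed
     * onto the prediction stack directly; -1 terminates each one.  */
    static const int rhs[9][4] = {
        { -1 },                      /* 0: unused (error entry)       */
        { EP, T, -1 },               /* 1: E  -> T E'                 */
        { EP, T, PLUS, -1 },         /* 2: E' -> + T E'               */
        { -1 },                      /* 3: E' -> <empty>              */
        { TP, F, -1 },               /* 4: T  -> F T'                 */
        { TP, F, STAR, -1 },         /* 5: T' -> * F T'               */
        { -1 },                      /* 6: T' -> <empty>              */
        { RPAREN, E, LPAREN, -1 },   /* 7: F  -> ( E )                */
        { ID, -1 },                  /* 8: F  -> id                   */
    };

    /* table[A - E][lookahead] = production to predict, 0 = error */
    static const int table[5][6] = {
        /*          id  +  *  (  )  $   */
        /* E  */  {  1, 0, 0, 1, 0, 0 },
        /* E' */  {  0, 2, 0, 0, 3, 3 },
        /* T  */  {  4, 0, 0, 4, 0, 0 },
        /* T' */  {  0, 6, 5, 0, 6, 6 },
        /* F  */  {  8, 0, 0, 7, 0, 0 },
    };

    /* Parse a token sequence terminated by END; return 1 on success. */
    static int parse(const int *input)
    {
        int stack[100];
        int top = 0;

        stack[top++] = END;              /* end marker at the bottom   */
        stack[top++] = E;                /* start symbol on top        */

        while (top > 0) {
            int sym = stack[--top];
            if (IS_TERM(sym)) {          /* terminal: must match input */
                if (sym != *input)
                    return 0;
                input++;
            } else {                     /* nonterminal: consult table */
                int p = table[sym - E][*input];
                const int *r;
                if (p == 0)
                    return 0;
                for (r = rhs[p]; *r != -1; r++)
                    stack[top++] = *r;   /* RHS is already reversed    */
            }
        }
        return 1;
    }

    int main(void)
    {
        static const int good[] = { ID, PLUS, ID, STAR, ID, END };
        static const int bad[]  = { ID, PLUS, STAR, ID, END };

        printf("id + id * id : %s\n", parse(good) ? "accepted" : "rejected");
        printf("id + * id    : %s\n", parse(bad)  ? "accepted" : "rejected");
        return 0;
    }

The point is how little machinery the loop needs: a stack of predicted
grammar symbols and one table lookup per nonterminal.  An LR driver is
structurally similar, but its stack holds automaton states rather than
predicted symbols.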

The form of LL(1) grammars is more restrictive: no left-recursion, no common
prefixes.  So it is easier to bash a grammar into LALR form than LL.  But
once you know what LL(1) requires, it really is not difficult to make most
grammars LL(1).  Mostly it is a matter of putting things into an idiomatic
form; and people who program in C should have no problems with
strange-looking idioms.
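
For example (a standard textbook transformation, not specific to any tool),
the left-recursive rule

    E -> E + T | T

becomes

    E  -> T E'
    E' -> + T E' | <empty>

and a common prefix such as

    stmt -> id := expr | id ( args )

is factored into

    stmt -> id rest
    rest -> := expr | ( args )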

Once a grammar is LL, calls to semantic actions can be placed anywhere.  In
LR, actions are normally associated with reductions only.  LR parsers must
restrict the placement of semantic actions, since in some cases it is
not clear which production is being matched, and therefore which action
should be done.  These are exactly the places where LL(1) requires you to
factor out the common prefixes.
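
A hypothetical illustration (the names here are invented, not from any of
the books above): in a recursive-descent LL parser an action is just a call
dropped between two symbols of the right-hand side, while a yacc-style LR
parser would normally run its action code only when the whole production is
reduced.

    void type_name(void);                /* parsing routines (stubs)    */
    void id_list(void);
    void expect(int token);
    void enter_scope(void);              /* semantic actions            */
    void leave_scope(void);

    void decl(void)                      /* decl -> type_name id_list ; */
    {
        type_name();
        enter_scope();                   /* mid-production action: an   */
        id_list();                       /* LL parser already knows it  */
        expect(';');                     /* is inside a decl at this    */
        leave_scope();                   /* point                       */
    }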

Recovery from syntax errors is easier with LL(1), because the information on
the stack is straightforward: a list of the grammar symbols expected to match
the rest of the input.  With LR methods, the stack cleverly encodes
information in states, and you must just as cleverly decode those states in
order to figure out how to repair the error.  A question that periodically arises on the
net is "how do I produce a list of tokens that could be accepted from a given
parse stack in YACC?"  The answer is "very painfully", but for an LL(1)
parser it should be simple.  
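
To back that up, here is a sketch of the computation, reusing the toy grammar
and symbol numbering from the LL(1) driver earlier in this message (the
first[] and nullable[] tables are hand-written for that toy grammar, not
produced by any generator): walk the prediction stack from the top, collect
FIRST of each symbol, and keep going past a nonterminal only if it can
derive the empty string.

    static const char nullable[5] = { 0, 1, 0, 1, 0 };  /* E E' T T' F */

    static const char first[5][6] = {
        /*          id  +  *  (  )  $   */
        /* E  */  {  1, 0, 0, 1, 0, 0 },
        /* E' */  {  0, 1, 0, 0, 0, 0 },
        /* T  */  {  1, 0, 0, 1, 0, 0 },
        /* T' */  {  0, 0, 1, 0, 0, 0 },
        /* F  */  {  1, 0, 0, 1, 0, 0 },
    };

    /* Mark, in expect[], every token the parser could accept given the
     * current prediction stack (stack[0..top-1], topmost at top-1).    */
    static void expected(const int *stack, int top, char expect[6])
    {
        int i, t;

        for (t = 0; t < 6; t++)
            expect[t] = 0;
        for (i = top - 1; i >= 0; i--) {
            int sym = stack[i];
            if (IS_TERM(sym)) {          /* a terminal matches only itself */
                expect[sym] = 1;
                return;
            }
            for (t = 0; t < 6; t++)      /* add FIRST(nonterminal)         */
                if (first[sym - E][t])
                    expect[t] = 1;
            if (!nullable[sym - E])      /* look further down the stack    */
                return;                  /* only if it can derive <empty>  */
        }
    }

With END E' T' on the stack, for instance, this collects '*', '+' and
end-of-input, which is exactly the set the parser could accept there.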
-- 
Jon Mauney,  parsing partisan
Computer Science Dept.
N.C. State University
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.