johnl@ima.UUCP (Compilers mailing list) (12/06/86)
I've always been under the impression that YACC and friends produce
code/tables which are somewhat worse than could be done with a hand-built
parser (ie. usually recursive-descent).
Recently, I've tried to parse the C declaration syntax using recursive-descent
and have found that it's a fairly difficult task. I find a need to delay
parsing decisions until much later in the parse. For example, you don't know
whether you're parsing a function definition until you've seen a left
brace - up to that point, it could be anything from a declaration of a
variable (up until the left paren) to an external function definition.
I'm still reluctant to use a straight LALR table parser for reasons of
(ahem) efficiency. Things like expressions & statements are wonderful
candidates for recursive-descent. Lists of thingies also fall out into
simple code:
thingie-list ::= thingie | thingie ',' thingie-list
turns into:
thingie ();
while (inputChoice (COMMA)) {
thingie ();
}
without the need for interpreting table entries.
Has anybody successfully married LR parsing with recursive-descent
techniques? Are there any other clean tricks/techniques I could use?
thanx,
Paul Tarvydas
Tarvydas-Sanford Controls Inc.
...{utcs|utzoo|...}!utcsri!tarvydas
[Well, actually, I've never seen a compiler where the parser was the
particularly slow part. More typically you get slowness either in character
processing in the lexer or else in slow table searches in the code gen.
If you insist, I see no reason why you couldn't define EXPR as a token in
the yacc parser and, in contexts where an expression is legal, sneak off to
an operator precedence or recursive descent parser. But I really don't see
the point. Yacc certainly has its problems, but excessively slow parsing has
never seemed to be one of them. -John]
--
Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
johnl@ima.UUCP (12/09/86)
LR parsing implies bottom up parsing. The standard trick in LL (recursive descent) parsing is to delay doing something about a token if it could be one of several things. Basically you are rewriting your grammar so that it is LL(1) instead of LL(k), k > 1, by left factoring. For example: statement -> identifier := expression -> identifier ( paramlist ) is not LL(1). But when you factor it out thus: statement -> identifier stmttail stmttail -> := expression -> ( paramlist ) Now one token of lookahead suffices. Ken [This is true, but in real recursive descent compilers that I have seen, it's as likely that you parse these things by kludgery as by adding productions. Most recursive descent compilers are written by hand, so you'd parse the first syntax by eating the identifier, remembering it, eating the next token, and then going ahead with the appropriate construction. But I'd rather let yacc do the work for me. Yacc can parse anything with enough help -- I have written a yacc parser with a moderate number of kludges that correctly parses Fortran 77, a language in which a statement can't be tokenized until you know what kind of statement it is. -John] -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
johnl@ima.UUCP (12/09/86)
Paul Tarvydas writes: >I'm still reluctant to use a straight LALR table parser for reasons of >(ahem) efficiency. Things like expressions & statements are wonderful >candidates for recursive-descent. > ... >Has anybody successfully married LR parsing with recursive-descent >techniques? Are there any other clean tricks/techniques I could use? At the most recent Compiler Construction Conference (SigPlan notices, July 1986), Thomas Pennello presented a technique for "compiling" the parse-tables directly into (assembly) code, which apparently speeds up the parsing process by a factor of 6-10. >[But I really don't see the point. Yacc certainly has its problems, but >excessively slow parsing has never seemed to be one of them. -John] In his conclusions, Pennello also echoes this opinion: "Trying to speed up an LR parser may seem like beating as dead horse, since LR parsers are already reasonably fast." Still, I think the paper is worth reading. Steve Vegdahl Computer Research Lab Tektronix Labs Beaverton, Oregon -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
johnl@ima.UUCP (12/10/86)
In article <282@ima.UUCP> Paul Tarvydas <toronto.edu!tarvydas@csri> writes: >Recently, I've tried to parse the C declaration syntax using recursive-descent >and have found that it's a fairly difficult task. It is difficult regardless of the parsing technique used. Using LR (or LALR), rather than LL, parsing helps slightly. C's declaration syntax is a mess. >I'm still reluctant to use a straight LALR table parser for reasons of >(ahem) efficiency. [...] Lists of thingies also fall out into >simple code: > > thingie (); > while (inputChoice (COMMA)) { > thingie (); > } I doubt this is more efficient than a table driven parser. The cost of the call to "inputChoice" is at least 3 instructions (say 10 bytes or more). A parsing table could encode this in 2 bytes (2 bits for the action (shift/reduce/accept), leaving 14 bits for the symbol). The execution time of the table driven parser should be at least equal to, if not faster than the above code. The parser's code to check that the current symbol equals the value in the table would have similar speed to the code in the "inputChoice" routine, but without the overhead of the procedure call (register saving/restoring, allocating local vars, etc). My experience is that table driven parsers (whether LL or LR) are smaller and faster than recursive descent. They also offer the significant advantage of allowing automatic syntax error correction/recovery. Doing this in a procedural recursive-descent parser requires passing recovery sets to each parsing procedure, as described in Hartmann's book on the Concurrent Pascal compiler, or Pemberton's paper in "Software - Practice & Experience" (around 1981). This slows the parser (even when parsing a syntactically correct program), and increases the stack space requirements. Recursive descent parsers have the advantages of being easy to write by hand, automatic allocation of temporary variables while analysing recursively defined structures (such as statements and expressions), and the ability to pass values down to lower level parsing routines. YACC's method to do the last two is much less clean, although a pre-processor such as Prep (van Katwijk, ACM SIGPLAN Notices, Oct 1983) helps. -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
johnl@ima.UUCP (Compilers mailing list) (12/11/86)
I recently wrote a C interpreter (which of course must parse also). I found the particular problem mentioned to not be that bad. The declaration of a function (or a variable) and the actual definition of a function was handled by the same code. Roughly I had a routine which I called declaration. This called something which picked up the declarative part (like the int i or the char *p(arg)). I put the list of arguments in a place to be found later by `declaration'. Then I would look to see if I found a function and if I was not looking at a semi-colon. If that was the case, I assumed I had a function definition and parsed it. The part I found to be tricky was the code which picked up the `int i' or the `char *p[4](arg)'. It turns out that this is not very recursive. I did this in a loop which continued until I did not see anything I liked. In that look I picked up the *'s, followed by a name, followed by the []'s or ()'s for functions. The ()'s used for grouping was the reason for the loop. If you want more details, I could send you part of the code. However, I would like to end with a statement about yacc. As was pointed out, the problem with yacc is not the speed nor the size. It is rather difficult to write a good grammar for C which yacc will accept and still provides useful screening of illegal input. I have written one which correctly knows about lvalue's. It knows the difference between functions and pointers to functions so that the items inside a struct (for example) are always pointers to functions and not functions. It also has 0 shift/reduce conflicts (as reported out of yacc). All the other grammars I have looked at fail miserably in these reguards. The code produced (tables and driver) is 8K I think. (The reason I did not use it was because of legal problems with ATT which I still have not heard a definate answer on.) -- Perry Smith {convex!ctvax,{ti-csl,infotel}!pollux}!bobkat!pedz -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
johnl@ima.UUCP (Compilers mailing list) (12/17/86)
> ... Doing [automatic error recovery] in a procedural recursive-descent > parser requires passing recovery sets to each parsing procedure... > This slows the parser (even when parsing a syntactically correct program), > and increases the stack space requirements... On the contrary, doing automatic error recovery in recursive descent does not require this, although one particular error-recovery algorithm may. (I would add that if you are implementing the usually-sparse recovery sets as bitmaps and passing them by value, as one might all-too-easily do in Pascal, it's not surprising that the result is slow and bulky.) Building automatic error recovery into a recursive-descent parser is not difficult or inefficient if you think hard and do it right; I speak from experience with a C parser, not exactly a favorable case. While I tend to agree that table-driven parsers are likely to be a lot smaller and possibly somewhat faster, the *practical* differences in error recovery are not significant. (Why was I doing recursive-descent parsing if I think table-driven is better, you ask? Partly because of some of the nice practical conveniences of recursive descent, partly because the particular project was supposed to be highly portable and hence I didn't want to be dependent on a big parser generator that I couldn't export freely. I tinkered with hand-generated tables for a bit, and I think that approach could have been made to work acceptably, but it was taking too long to get all the details right.) Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,decvax,pyramid}!utzoo!henry -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request