johnl@ima.UUCP (09/09/87)
This is a summary of the responses to my recent USENET comp.compilers poll on Yacc. 1. Straw issues: the form of Yacc's input, the limitations of its syntax/semantic interface, the incomprehensibility of the C output, the size of the generated parser, and debugging difficulties. These drew few or no comments. (Frank DeRemer did point out years ago that LR tables, being randomly accessed, can disturb a paging system more than a recursive-descent parser, which has better locality of reference.) 2. Speed of the generated parser: the consensus was that there is no problem, although it certainly is possible to hand-write faster parsers if required. Those interested are referred to Tom Penello's recent TOPLAS paper, for recursive-descent and LR(1) implementation possibilities. 3. Generic objections to LR parsing: Steve Stevenson (steve@hubcap.clemson.edu) contributed an interesting discussion. (a) it is possible (desirable?) to unite the lexical and syntax grammars (b) he personally wants a 'pipe of parsers' (c) you may not want to use the LR algorithm to gen. the parser. Steve has modified his 'yacc' to do some of these things. Occasionally, papers surface with titles like the recent SIGPLAN 'Are LR parsers too powerful?', which I personally found entirely content-free. One poll respondent claimed that LR parsing is a solution looking for a problem, again without real justification. 4. Generic preference for other techniques: lots of people like recursive-descent, as indeed do I, but I don't like living with the result. I haven't lived with a Yacc result yet ... Nobody seemed keen on ad-hoc techniques, and nobody even mentioned McKeeman/Horning MSP, Floyd production systems, or the other antecedents. A long debate ensued on Yacc's error recovery, agreeing that it was poor. I can provide three causes for this: (a) Parsers with LALR(1) (as opposed to LR(1)) tables sometimes reduce too soon, making context-dependent recovery more difficult. (b) System V Yacc treats states where nothing except <error> can be shifted as full LR(1) states, not merging them, preventing (a). BSD Yacc, which I would guess most folks are using, merges them, causing (a). (c) There is no good advice or folklore as to how to use Yacc error rules; the Yacc manual implies that you should be content with a simple-minded panic-mode: stmnt : error ';' This appears to have been followed fairly literally in pcc, which has only 8 error productions. The best suggestions on Yacc error productions I have seen are in 'Introduction to Compiler Construction with Unix' by Schriener & Friedman (Prentice-Hall 1985) (a book of otherwise limited usefulness). Without infringing their copyright, they say that error productions should be associated with each repetitive construct, showing how for the three possible cases (optional sequence, non-optional sequence and separated list). Doing this systematically allows Yacc to recover to almost arbitrarily close restart points. It still won't attempt error *repair*, i.e. insertion or substitution of symbols, but then there is another debate as to whether compilers should attempt to second-guess programmers in this way, or simply catch as many of their errors as possible. See also the April 1987 TOPLAS page 164, for 'A Practical Method for LR and LL Syntax Error Diagnosis and Recovery', a promising-looking scheme involving delayed reductions. There is yet another debate as to whether compilers should say 'syntax error' or things like 'ELSE without IF'. One is uninformative, while the other describes the symptom, not the problem. Perry Smith a.k.a. (Pedz Thing) (pedz@bobkat) wrote "the shift/reduce conflict resolution could be more intelligent. In determining a shift/reduce conflict, I think it needs to look up the parse tree and see what productions call the productions in conflict. Usually, the problems could be resolved by looking the ``parent'' productions." He gave these examples: (a) statement : stuff statement | stuff statement ELSE statement ; which gives an S/R conflict, and (b) statement : stuff thing ; thing : statement | statement ELSE statement ; which doesn't. I found this comment - and the examples - most surprising. LR(1) analysers including Yacc do indeed 'look back up the parse tree', in fact they look all over it. However, in the above (a) is of course its own 'parent' production, as it contains recursion. It is entirely clear that (a) is just plain ambiguous, because of having > 1 recursion. Even in recursive descent you'd have to make ELSE left-associative (by testing for it first) if you coded directly from (a). Respondents also went for Yacc for reasons of formal correctness, productivity, maintenance, ... Incidentally, we went with Yacc. Of course it ain't easy for Cobol, which is neither regular, context-free, nor LR(1). I can discuss some of these problems further: inquiries via e-mail please. Thanks to all who responded, and to the moderator for making it possible. -- Esmond Pitt, Austec International Ltd ...!seismo!munnari!ausmelb!ejp,ejp@ausmelb.oz.au -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-" w" w"
johnl@ima.UUCP (09/14/87)
I was mis-quoted in the Yacc poll and would like to clarify. In the example, a) statement : IF ( blah ) statement | IF ( blah ) stateement ELSE statement b) statement : IF ( blah ) thing thing : statement | statement ELSE statement version a does NOT produce a conflict while version b DOES. (This assumes that ELSE has been declared as the lowest precedence operator and has left associtivity.) This is opposite of that the article said. This indicates that the construct can be resolved intelegently if the conflict from version b is looked upon more as a macro substitution rather than a significant production. I do not know why this does not happen in the first place since all of the information about the precedence, etc is carried along. In terms of pratical problems, version b is often desirable to reduce code size and the amount of repeated code. -- Perry Smith a.k.a. (Pedz Thing) pedz@bobkat or {ti-csl,infotel}!pollux!bobkat!pedz -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
johnl@ima.UUCP (09/21/87)
In article <710@ima.ISC.COM> Pedz Thing <sun!texsun!pollux!bobkat!pedz@ucbvax.Berkeley.EDU> writes: >I was mis-quoted in the Yacc poll and would like to clarify. In the >example, > >a) statement : IF ( blah ) statement | IF ( blah ) stateement ELSE statement >b) statement : IF ( blah ) thing > thing : statement | statement ELSE statement > >version a does NOT produce a conflict while version b DOES. (This >assumes that ELSE has been declared as the lowest precedence operator >and has left associativity.) This is opposite of what the article said. Sorry for misquoting. However I have since checked out Perry's assertions. In fact, both (a) and (b) produce s/r conflicts. (See PS.) This isn't the fault of 'yacc', or LALR, or LR(k) theory. The IF/ELSE construction, when stated as baldly as either of the above, really is ambiguous! Basically, Perry doesn't seem to want formal methods to work formally: >This indicates that the construct can be resolved intelligently if the >conflict from version b is looked upon more as a macro substitution >rather than a significant production. I do not know what this means. In any case how could yacc know which productions are 'significant' to their author and which are 'more ... macro substitutions'? In any case, for those who may prefer not to fiddle with yacc's precedences and associativities, the ambiguity is resolvable with a balanced_if construction as shown in many textbooks. P.S. Just for the record, here is my (a).y: %token IF %token ELSE %left ELSE %token ID %% statement : id '=' id ';' /* trivia for reduction to terminal */ | IF '(' ID ')' statement | IF '(' ID ')' statement ELSE statement ; -- Esmond Pitt, Austec International Ltd ...!seismo!munnari!ausmelb!ejp,ejp@ausmelb.oz.au [As they say in Computing Reviews, this ends the skirmishing, on this topic at least. -John] -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request