cowan@marob.masa.com (John Cowan) (09/05/90)
I'm interested in parsing a language that is not LALR(1) using YACC. The general approach taken in the past (by others) is to handle the non-LALR(1) portions (which are fairly simple) by writing a hard-coded "intermediate stage" between the lexer and the parser. This intermediate stage filters the stream of lexer tokens and uses recursive descent to detect the problem combinations and pack them up into single tokens which are then passed to the parser. It seems to me that it wouldn't be difficult to implement this intermediate stage using YACC as well. In this version, the lexer would pass its tokens to the first-stage YACC grammar, which would pass most of them unchanged. Sequences that needed it would be reduced as above. The resulting stream would then go to the second-stage YACC grammar, which would do most of the parsing work. What I need to know is, how does one write a set of YACC actions that cause the resulting grammar to emulate yylex? I understand the preprocessor tricks that allow two different YACC grammars to co-exist without name collision. What I don't understand is how to make the generated parser of the first stage "return" tokens to its caller. It seems to require generalized co-routines, which of course C does not provide. The operating system is Unix System V. Has anyone tried to solve this sort of problem? Please send comments directly to me, and I will summarize them to the net. I have set followups to "poster". -- cowan@marob.masa.com (aka ...!hombre!marob!cowan) [I suppose you could run it as two processes connected by a pipe, the Unix version of coroutines. -John] -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
mike@relay.EU.net (Michael K. Gschwind) (09/11/90)
In article <26E50E36.34BE@marob.masa.com> you write: >What I don't understand is how to make the generated parser of the first >stage "return" tokens to its caller. It seems to require generalized >co-routines, which of course C does not provide. The operating system is >Unix System V. No, you don't want to do this. Rather, I would queue the tokens, insert magic tokens (which make the grammar LALR(1)) and after the first stage call the second. Sequence works fine as long as there is no interaction. In the above scenario, there is no need for interaction. BTW, in many cases, lex is powerful enough to make a grammar lalr(1) with right context rules. You want to look into these. If these aren't sufficient, you could use start states as well. E.g., if you have to differentiate between function definitions, function declarations and variable dfinitions in C, you could use: %% {identifier}/\([^)]\) return FUNCTION_DEFINITION; {identifier}/\([^)]\); return FUNCTION_DECLARATION; {identifier} return VARIABLE; %% bye, mike Michael K. Gschwind mike@vlsivie.at Institute for VLSI-Design mike@vlsivie.uucp Technical University, Vienna e182202@awituw01.bitnet Voice: (++43).1.58801 8144 Fax: (++43).1.569697 -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.