[comp.theory] CFG-squared grammars: Your challenge of the day.

markh@csd4.csd.uwm.edu (Mark William Hopkins) (09/19/90)

   Most of you should know that a grammar defined with productions whose
right hand sides are allowed to be regular expressions over the set of
terminals and non-terminals is just a fancy Context-Free Grammar.

   Now ... let's define a Context-Free Expression (CFE)...

   CFE's use the alternation ("|"), concatenation ("."), and iteration ("*"),
just like regular expressions.  They have the null expression, and the 'fail'
expression (with null.X = X = X.null, fail.X = fail = X.fail, and
fail| X = X = X | fail).

   But in addition, they can be defined by recursive systems of equations
where each equation is of the form:

   Non-terminal = Regular-expression in Terminals and Non-terminals.

For instance:

   list(a, b, c, d) would be the solution, Z, to the system:

   Z = T (d T)*.   T = c | a Z b.

If a, b, c, and d are context-free expressions, then so is Z (and so is T).

   A CFG-squared grammar is a phrase structure grammar, where each production
is of the form:

		 NonTerminal -> CFE in Terminals and NonTerminals.

Here is your challenge:

[1] Develop an algorithm that converts each CFG-squared grammar into an
    equivalent CFG-grammar.

[2] Define a suitable extension of the LR(k) item-set construction for
    CFG-squared grammars.  You might be interested to know that this
    has been done for CFG-grammars where the right-hand sides are allowed
    to be regular expressions.  Use this definition to come up with an
    enhanced LR(k) parser-generator.

[3] Define LR(k) CFG-squared grammars in analogy to LR(k) CFG grammars.
    Develop a theory, write a book :).

If you actually come up with a parser generator algorithm, try it out on
this sample grammar:

Block -> list(Statement, null, '{', '}').
Statement -> 'v' '=' Expression.
Expression -> list('p', 'i', '(', ')').

You can see where all this is ultimately leading to: a new powerful AND
efficient parser-generator technology, that leads to efficiently
'generatable' concise grammar specifications...