[mod.compilers] extended notations for grammars

johnl@ima.UUCP (Compilers mailing list) (07/13/86)

(Article from Dick Dunn = nbires!rcd)

This article moves off on a tangent from Guido van Rossum's article about
the Acta Informatica paper on ELL(1) notation and my followup to van
Rossum's article.

I think that the main significance of ELL(1) notation (or any other major
notational extension for programming-language grammars) is that it gives you
a form for the grammar which is simultaneously easier to read (and write)
and easier to process.  For example, it's unnatural for a human reader or
writer to turn simple iteration into recursion.  It is similarly a nuisance
for the parser generator to discover the types of recursion which can be
replaced by iteration by a transformation of the grammar.  The solution is
to introduce iteration as a primitive in the grammar notation; don't make
your notation the bottleneck.

The same argument can be applied to other common constructs.  For example,
a common language form is a list with separators.  The RE form of this is
	A (B A)*
where A is the list item and B is the separator.  This is a nuisance if A
is not a single item; it may force you to introduce a nonterminal just to
make it a simple item.  Why not introduce into the grammar notation a form
for "this separated by that"?

Other useful notation:
	- Option--replace  (A|)  (read "A or nothing") with  [A].  This
	  gets rid of the funny "nothing" that you can't see and is
	  common in other notation anyway.
	- Iteration, one or more--replace  A A*  with  A+.  As with the
	  separated list, this eliminates the double appearance of the item
	  in the rule.

At first glance it may seem that having an item appear twice in a rule is
not that big a deal.  However, if you have semantics that have to be
attached to items, things get cluttered much faster--and the number of
entries in the parse tables increases in direct correspondence to the
"clutter" in the rules.

The sense of the notational changes is to make the form of a language defi-
nition more like a programming language (at a higher level).  They allow
a much more readable, compact notation.  With a little care, they don't
complicate the parser generation process any.  In fact, they may simplify
it since the parser generator need not search for "optimizations" to remove
non-essential recursion, fold duplicate states, and the like.
---
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...Simpler is better.
-- 
Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima

Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request