[comp.compilers] Seeking parser generator topics

terry@cs.tamu.edu (Terry Escamilla) (08/25/90)

Keywords: parse, question, yacc

I am interested in designing a parser generator tool using object-oriented
approaches.  For example, one could completely write a parser generator in
Eiffel, regardless of the output language for the parser itself.  The
question, though, is how to come up with an interesting slant to the topic
so that it is sufficient for a dissertation.  Some ideas that I have
include:

- reusable components for creating similar parser generator tools,

- extending the concept of grammar inheritance proposed by
  Aksit et al. at the University of Twente in the Netherlands, or

- making every component of the grammar an object with multiple attributes
  for grammar symbols (contrary to YACC), thus giving a very fine grain of
  reusability (though I don't know what one would do with this).
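The third idea can be sketched concretely. Here is a minimal, hypothetical Python illustration (all class and attribute names are invented) of grammar symbols as first-class objects carrying named attributes, in contrast to yacc's flat token/nonterminal split:

```python
# Hypothetical sketch: grammar symbols as objects with attributes,
# so rules can be inspected, reused, or specialized like any object.

class Symbol:
    """A grammar symbol carrying arbitrary named attributes."""
    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = dict(attributes)

class Terminal(Symbol):
    pass

class Nonterminal(Symbol):
    def __init__(self, name, **attributes):
        super().__init__(name, **attributes)
        self.productions = []

    def produces(self, *rhs):
        # Each production is a tuple of Symbol objects, kept on the
        # left-hand-side nonterminal rather than in a monolithic table.
        self.productions.append(rhs)
        return self

# A toy expression grammar:
num  = Terminal("NUM", value_type="int")
plus = Terminal("PLUS")
expr = Nonterminal("expr", synthesized=["value"])
expr.produces(expr, plus, num).produces(num)
```

Because every symbol is an ordinary object, a tool could subclass `Terminal` or attach new attributes without touching the rest of the grammar, which is one plausible reading of the "fine grain of reusability" above.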

Another possibility is to take an OO approach for compiler specifications.
I am unfamiliar with current research in building a compiler which is
specified by its grammar and a formal semantics.  However, there may be
some potential for OOP in this approach as well.

Thanks for any ideas,

Terry Escamilla
terry@cs.tamu.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

burley@world.std.com (James C Burley) (08/27/90)

I've been thinking about whether using OO techniques for language
specifications would be useful, especially as I try to understand the
Fortran 90 specification.

It seems previous attempts to formally specify languages have produced specs
that are somewhat unreadable by mere mortals, impossible to implement with
contemporary technology, difficult to maintain and enhance, or some
combination of these problems.  Mostly-human-readable specs, on the other
hand, suffer the expected problems: divergent opinions on interpretation,
and no way to quickly produce a working implementation directly from the
spec.  Even their readability becomes questionable when the sheer complexity
of the language turns the low "locality of reference" of related topics into
a significant problem.  (The spec for Fortran 90 should convince anyone of
the validity of that statement!)

But perhaps, using a very simple OO language (or at least one incorporating
inheritance and a simple message- and argument-passing mechanism), the
community at large could once again try to scale that conceptually ideal
mountain: a specification that is both easily read and understood by humans
and trivially implemented by feeding it into a preexisting system.
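To make the idea concrete, here is a small hypothetical sketch (in Python; every class name and constraint string below is invented, not taken from any real spec) of a language-spec fragment written as a class hierarchy, where each construct inherits its parent's constraints and adds its own via a simple message:

```python
# Hedged sketch: spec-as-class-hierarchy with a single "message"
# (constraints) that each construct refines by inheritance.

class Construct:
    def constraints(self):
        # Base message: every construct answers with its constraint list.
        return []

class ExecutableConstruct(Construct):
    def constraints(self):
        return super().constraints() + ["must appear in an execution part"]

class DoConstruct(ExecutableConstruct):
    def constraints(self):
        return super().constraints() + ["loop variable must be scalar integer"]

# Querying one class yields every inherited rule at once, instead of
# chasing cross-references through a paper spec:
print(DoConstruct().constraints())
```

The hoped-for payoff is exactly the "locality of reference" mentioned above: a human reads one class and sees all applicable rules, while a tool can walk the same hierarchy mechanically.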

For Fortran 90, I've sort of attempted to classify the incredible variety of
twisty little objects, mostly alike but all slightly different, using a
variant of multiple inheritance that I made up for the purpose.  After
typing it all in by studying the spec over a period of a few weeks, and
writing a program to read it and check it in various ways for sanity, I've
given up the effort to implement Fortran 90 for now.  But it
seems the approach was working, at least to codify my own understanding of
the spec.  If the resulting text file could be coaxed, via a reasonably
simple program, into producing code that would check constraints and track
object refinements, it would suggest that perhaps the approach could
generally be applied to describe the language in the first place.  When and
if I resume working on Fortran 90 instead of a subset (for GNU Fortran), I
plan to pick up where I left off.

An interesting experiment would be to design a simplified (yet equivalently
powerful) version of, say, Fortran 90 and write a spec AND implementation
using these techniques.  It seems to me that OOP techniques would offer a
higher "locality of reference" and (almost equivalently) more compact
descriptions of entities than traditional procedural or BNF notations.

My random thoughts on the relation of such a class description to a parser
generator tool go in two directions at once: on the one hand, a classic
BNF-with-implementation-decoration (Yacc) description can contain messages
rather than functions in the implementation sections, and the object model
of the language would cooperate by specifying the methods and their
implementations.  On the other hand, even lexer tokens are themselves
objects, so why should they (along with input file, input line, input
character, and so on) not also be classified along with the rest of the
language model?  Offhand, it seems to me the former approach would allow one
to concentrate on a few important questions while producing useful results,
while the latter would elevate one's approach to looking at a plethora of
interesting questions from a higher level, while risking being unable to
implement a fully functioning spec/system within a reasonable timeframe.
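The first direction can be illustrated with a toy sketch (in Python; the mini-framework, the `Expr` class, and the `add` method are all invented for illustration): the grammar rule's action is a message send to a node object, and the object model, not the grammar file, supplies the implementation.

```python
# Hedged sketch: a yacc-style action as a message send rather than a
# call into ad-hoc procedural code.

class Expr:
    def __init__(self, value):
        self.value = value
    # The object model defines the method; the grammar only names it.
    def add(self, other):
        return Expr(self.value + other.value)

# rule: expr -> expr PLUS expr   { $$ = $1.add($3) }
def reduce_plus(lhs, _plus_token, rhs):
    return lhs.add(rhs)

result = reduce_plus(Expr(2), "+", Expr(3))
print(result.value)  # 5
```

Under this split, changing the language's semantics means editing the class hierarchy, while the BNF skeleton stays untouched, which is the division of labor the paragraph above is gesturing at.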

I'd be very interested in anyone's comments on these ideas, pro, con, or
otherwise, since I have plans to pursue these ideas on my own in addition to
sharing them freely with others.  Use email; I'll summarize the responses to
this if there seems to be enough interest.  (In fact, I'd be thrilled if it
turned out somebody has already developed this line of reasoning more fully,
so I can stop reinventing the wheel!)

Terry, please let us know (and me via email) what you decide to do.  It will
be very interesting to see what you come up with!

James Craig Burley, Software Craftsperson    burley@world.std.com
-- 