terry@cs.tamu.edu (Terry Escamilla) (08/25/90)
Keywords: parse, question, yacc

I am interested in designing a parser generator tool using object-oriented approaches. For example, one could write a parser generator entirely in Eiffel, regardless of the output language of the generated parser. The question, though, is how to come up with an interesting slant on the topic so that it is sufficient for a dissertation. Some ideas I have include:

- reusable components for creating similar parser generator tools;
- extending the concept of grammar inheritance proposed by Aksit, et al., from the University of Twente in the Netherlands; or
- making every component of the grammar an object, with multiple attributes per grammar symbol (unlike yacc), thus giving a very fine grain of reusability (though I don't know what one would do with this).

Another possibility is to take an OO approach to compiler specifications. I am unfamiliar with current research on building a compiler that is specified by its grammar and a formal semantics, but there may be some potential for OOP in this approach as well.

Thanks for any ideas,

Terry Escamilla
terry@cs.tamu.edu
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.
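[A sketch of how the second and third ideas above might combine, in Python rather than Eiffel. All class, method, and rule names here are invented for illustration; this is one hypothetical rendering, not a proposal from the poster. Grammar symbols carry arbitrary named attributes instead of yacc's single value, and productions are methods so a derived grammar inherits and overrides rules.]

```python
# Hypothetical sketch: grammar symbols as objects with multiple
# attributes, and grammar inheritance via ordinary class inheritance.

class Symbol:
    """A grammar symbol carrying arbitrary named attributes, in
    contrast to yacc's single $$ value per symbol."""
    def __init__(self, name, **attrs):
        self.name = name
        self.attrs = attrs

class Grammar:
    """Base grammar: each production set is a method named
    rule_<nonterminal>, so a subclass inherits every rule it does
    not override."""
    def rules(self):
        # Collect all rule_* methods into a name -> productions map.
        return {m[len("rule_"):]: getattr(self, m)
                for m in dir(self) if m.startswith("rule_")}

    def rule_expr(self):
        return [["expr", "+", "term"], ["term"]]

    def rule_term(self):
        return [["NUMBER"]]

class ExtendedGrammar(Grammar):
    """Grammar inheritance: reuse the base productions for 'expr'
    unchanged, and extend 'term' with parenthesized expressions."""
    def rule_term(self):
        return super().rule_term() + [["(", "expr", ")"]]

g = ExtendedGrammar()
# g.rules() now maps "expr" to the inherited productions and
# "term" to the extended ones.
```

[The point of the sketch is that the derived grammar states only its delta, which is the fine-grained reusability the third bullet is after.]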
burley@world.std.com (James C Burley) (08/27/90)
I've been thinking about whether using OO techniques for language specifications would be useful, especially as I try to understand the Fortran 90 specification.

It seems previous attempts to formally specify languages have produced specs that are somewhat unreadable by mere mortals, impossible to implement with contemporary technology, difficult to maintain and enhance, or any combination of these problems. Mostly-human-readable specs, on the other hand, may seem readable but suffer the expected problems of divergent opinions on interpretation and the inability to quickly produce a working implementation directly from the spec; even their readability becomes questionable when the sheer complexity of the language turns the low "locality of reference" among related topics into a significant problem. (The spec for Fortran 90 should convince anyone of the validity of that statement!)

But perhaps by using a very simple OO language (or at least one incorporating inheritance and a simple message- and argument-passing mechanism), the community at large can once again try to scale the conceptually ideal mountain: a specification that is both easily read and understood by humans and trivially implemented by feeding it into a preexisting system.

For Fortran 90, I've attempted to classify its incredible variety of twisty little objects, mostly alike but all slightly different, using a variant of multiple inheritance that I made up for the purpose. After typing it all in by studying the spec over a period of a few weeks, and writing a program to read it and check it in various ways for sanity, I've temporarily given up the effort to implement full Fortran 90 for now. But the approach seemed to be working, at least to codify my own understanding of the spec.
If the resulting text file could be coaxed, via a reasonably simple program, into producing code that would check constraints and track object refinements, it would suggest that the approach could be applied more generally to describe the language in the first place. When and if I resume working on full Fortran 90 instead of a subset (for GNU Fortran), I plan to pick up where I left off.

An interesting experiment would be to design a simplified (yet equivalently powerful) version of, say, Fortran 90 and write a spec AND an implementation using these techniques. It seems to me that OOP techniques would offer a higher "locality of reference" and (almost equivalently) more compact descriptions of entities than traditional procedural or BNF notations.

My random thoughts on the relation of such a class description to a parser generator tool go in two directions at once. On the one hand, a classic BNF-with-implementation-decoration (yacc) description could contain messages rather than functions in its implementation sections, with the object model of the language participating by specifying the methods and their implementations. On the other hand, even lexer tokens are themselves objects, so why shouldn't they (along with input file, input line, input character, and so on) be classified along with the rest of the language model?

Offhand, it seems to me the former approach would let one concentrate on a few important questions while producing useful results, while the latter would elevate one's view to a plethora of interesting questions at a higher level, at the risk of being unable to implement a fully functioning spec/system within a reasonable timeframe.

I'd be very interested in anyone's comments on these ideas, pro, con, or otherwise, since I plan to pursue them on my own in addition to sharing them freely with others. Use email; I'll summarize the responses if there seems to be enough interest.
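[A tiny sketch of the first direction above, in Python: a yacc-style rule whose action sends a message to a node object instead of calling a bare function, with tokens as first-class objects (the second direction, in miniature). All class and method names are invented for illustration.]

```python
# Hypothetical sketch: where yacc writes {$$ = $1 + $3;}, here the
# object model supplies an 'evaluate' method and the grammar rule
# merely sends that message.

class Token:
    """A lexer token as an object, not just an integer code."""
    def __init__(self, kind, text):
        self.kind, self.text = kind, text

class Node:
    def evaluate(self):          # the "message" the grammar sends
        raise NotImplementedError

class Num(Node):
    def __init__(self, tok):
        self.tok = tok
    def evaluate(self):
        return int(self.tok.text)

class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def evaluate(self):
        # The implementation lives in the object model, not in the
        # grammar's action code.
        return self.left.evaluate() + self.right.evaluate()

# expr : expr '+' term   { $$ = Add($1, $3) }
tree = Add(Num(Token("NUMBER", "2")), Num(Token("NUMBER", "3")))
result = tree.evaluate()  # the rule's action is just a message send
```

[The trade-off the post identifies shows up even here: keeping only the actions object-oriented is easy to implement, whereas classifying tokens, lines, and files as objects too is conceptually cleaner but a much larger undertaking.]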
(In fact, I'd be thrilled if it turned out somebody has already developed this line of reasoning more fully, so I can stop reinventing the wheel!)

Terry, please let us know (and me via email) what you decide to do. It will be very interesting to see what you come up with!

James Craig Burley, Software Craftsperson
burley@world.std.com