adamk@mit-amt.MEDIA.MIT.EDU (Adam Kao) (01/06/90)
I've been seriously considering extending Forth into a C interpreter. I should be careful to distinguish this from writing a traditional interpreter/compiler, such as Dojun Yoshikami and Ian Green seem to be discussing. A traditional compiler is usually broken into modules something like the following:

    a lexical scanner that tokenizes the input
    a grammar/parser that places the tokens in a syntax tree
    a typechecker that looks at nodes in the syntax tree
    a code generator that looks at nodes in the syntax tree

For details see Aho, Sethi, and Ullman's "Dragon Book" (not its real name). Since Dojun Yoshikami and Ian Green are using these terms, I assume they are using the standard compiler architecture as a starting point.

But Forth's philosophy is that words do their own work, even compiling words. In my opinion this feature is not emphasized nearly enough. It is a feature truly unique to Forth; the only similar feature I know of is the continuation in Scheme. The Forth compiler shows us how to implement traditional control structures with words that execute themselves, using the current state of the stacks to record where they are. There is no monolithic compiler. Compiling words "know" what they do. The traditional compiler distinction of compiler-as-procedures versus source-code-as-data does not exist in Forth.

Therefore to extend Forth into a C interpreter means to use the Forth philosophy while writing a C compiler, defining words like if, {, goto, and so on. Ideally every word would look at the current state (current expression, unprocessed input stream, etc.) and perform whatever correct actions are possible up to that point.

There are some interesting issues that I have been thinking about:

1. The Forth "scanner" is rudimentary. Essentially it does only three things: interpret numbers, use whitespace as a separator, and make up words out of everything else. C has certain features that depend on a traditional scanner, most notably that non-alphanumeric tokens (e.g. +, ->, {) are automatically separated. Thus we must have a traditional scanner.

2. C uses infix notation. The Forth-style infix operators I have seen were all unwieldy or non-intuitive (I would love to be corrected on this). Thus, for expressions at the very least, we will need a traditional parser.

3. Actually it is misleading to try to match C tokens with Forth words. Incomplete expressions generally cannot be parsed. There is a much better match between C statements and Forth words: both specify complete actions and so can be compiled as a unit.

This leads us to some simple design decisions. We can view ; and perhaps { } and , as fundamental separators. Upon reaching one of these, our more sophisticated scanner can eliminate whitespace and form a token list. We may still be able to associate actions with each token in this list, since each token can now look ahead to the end of the statement and is guaranteed a complete expression. Thus we execute each token, with words like + leaving a reminder on some stack. I believe C control structures can be direct copies of Forth control structures, although I have not thought deeply about it.

This is all preliminary speculation; I have not written a single line of code yet. I want to hear ideas and discussion, and I want to know whether anyone has tried this before. The most helpful responses will be those that start "The right way to do <x> is . . . "

Every reader of this group should know what advantages Forth would bring to C. The advantages of C for Forth are somewhat harder to quantify.
Forth users may benefit from access to existing C code. Perhaps C programmers could be "lured" into a Forth environment this way, and then learn about Forth's advantages. And following the Forth tradition, I intend to make this interpreter half as big and twice as fast as any existing C interpreter or compiler! :-)

But after all, the only real justification is that it is an interesting problem. I hope it interests you as well.

Adam
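A very rough sketch of the statement-at-a-time scanning idea in the post above, written here in C rather than Forth since no Forth code for it exists yet: split multi-character operators such as -> without requiring whitespace, and collect tokens up to the next ; so that a complete statement is in hand before anything acts on it. All names and the operator list are invented for illustration; this is a toy, not a real C scanner.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Collect tokens up to the next ';' so a complete statement is
   available before any "word" acts on it.  Purely illustrative:
   handles identifiers, numbers, and a few operators only. */
static const char *two_char_ops[] = { "->", "==", "!=", "<=", ">=", "++", "--", NULL };

int scan_statement(const char *src, char toks[][16], int max)
{
    int n = 0;
    while (*src && n < max) {
        while (isspace((unsigned char)*src)) src++;
        if (*src == '\0') break;
        if (isalnum((unsigned char)*src) || *src == '_') {
            int len = 0;                      /* identifier or number */
            while ((isalnum((unsigned char)*src) || *src == '_') && len < 15)
                toks[n][len++] = *src++;
            toks[n][len] = '\0';
        } else {
            int len = 1;                      /* punctuation: check 2-char ops first */
            for (int i = 0; two_char_ops[i]; i++)
                if (strncmp(src, two_char_ops[i], 2) == 0) { len = 2; break; }
            memcpy(toks[n], src, len);
            toks[n][len] = '\0';
            src += len;
        }
        if (strcmp(toks[n++], ";") == 0) break;   /* statement boundary */
    }
    return n;
}

int main(void)
{
    char toks[32][16];
    int n = scan_statement("count=p->next+1;", toks, 32);
    for (int i = 0; i < n; i++) printf("[%s] ", toks[i]);
    printf("\n");   /* prints: [count] [=] [p] [->] [next] [+] [1] [;] */
    return 0;
}

Once a statement's token list is complete, each token could then be handed to a word-like action that is free to look ahead to the end of the list, which is the point of delaying until the separator.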
wmb@SUN.COM (Mitch Bradley) (01/07/90)
I wrote a C lexical scanner in Forth. It was fairly easy and quite small. Scanning is easy; parsing is a bit harder.

Probably the easiest way to do a C parser in Forth would be to take a yacc grammar for C and run it through yacc to get the state tables. Then duplicate the effect of the (quite small) yacc table-walking machine in Forth.

Then you have to decide what to do for code generation. That's where the real engineering skill comes into play. You have to worry about how to implement things like type promotion and storage allocation and switch statements (decent C compilers automatically generate jump tables if the cases are densely packed, and if-else-if-else ... chains if the cases are sparse). Doing a complete job will take a lot of work, so be prepared for it.

By the way, Tayfun Kocaoglu, who works with me at Sun, did a subset C compiler in Forth for his master's thesis at the University of Florida. Tayfun wrote a scanner and a code generator, and he used yacc for the parser.

Mitch
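To make the dense-versus-sparse decision above concrete, here is a rough, hypothetical way a code generator might choose between a jump table and a compare chain. The "at least half the slots used" threshold is invented for illustration; real compilers apply their own heuristics.

#include <stdio.h>

/* Decide how to compile a switch: if the case values densely fill
   the range [min,max], emit an indexed jump table; otherwise fall
   back to a chain of compares.  The threshold is arbitrary here. */
enum switch_strategy { JUMP_TABLE, COMPARE_CHAIN };

enum switch_strategy pick_strategy(const int *cases, int ncases)
{
    if (ncases == 0) return COMPARE_CHAIN;
    int min = cases[0], max = cases[0];
    for (int i = 1; i < ncases; i++) {
        if (cases[i] < min) min = cases[i];
        if (cases[i] > max) max = cases[i];
    }
    long range = (long)max - (long)min + 1;
    /* dense enough: at least half the table slots would be used */
    return (range <= 2L * ncases) ? JUMP_TABLE : COMPARE_CHAIN;
}

int main(void)
{
    int dense[]  = { 1, 2, 3, 4, 6 };     /* range 6, 5 cases     */
    int sparse[] = { 1, 100, 10000 };     /* range 10000, 3 cases */
    printf("%d %d\n", pick_strategy(dense, 5), pick_strategy(sparse, 3));
    return 0;   /* prints: 0 1  (JUMP_TABLE, COMPARE_CHAIN) */
}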
sabbagh@acf5.NYU.EDU (sabbagh) (01/09/90)
In article <9001070938.AA07304@jade.berkeley.edu> Forth Interest Group International List <FIGI-L%SCFVM.bitnet@jade.berkeley.edu> writes:
>I wrote a C lexical scanner in Forth. It was fairly easy and quite
>small. Scanning is easy; parsing is a bit harder. Probably the
>easiest way to do a C parser in Forth would be to take a yacc grammar
>of C and run it through yacc to get the state tables. Then duplicate
>the effect of the (quite small) yacc table walking machine in Forth.

Why not implement it as recursive descent? You need to left-factor the grammar, but Forth is inherently suitable, since the word definitions could be made to look like the grammar. I wrote a small interpreter this way a while back. The C language is well suited to this approach; the PCC compiler (Steve Johnson?) used it in parsing.

Hadil G. Sabbagh
E-mail: sabbagh@csd27.nyu.edu
Voice:  (212) 998-3125
Snail:  Courant Institute of Math. Sci., 251 Mercer St., New York, NY 10012

186,282 miles per second -- it's not just a good idea, it's the law!
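As an illustration of definitions mirroring the grammar, here is a minimal recursive-descent fragment in C for additive/multiplicative expressions, one function per rule. It is only a sketch: a real C grammar needs left-factoring and many more levels, and all names here are invented.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Grammar sketch, one function per rule:
     expr   : term   (('+' | '-') term)*
     term   : factor (('*' | '/') factor)*
     factor : NUMBER | '(' expr ')'
   Each function consumes input and returns the computed value. */
static const char *p;   /* cursor into the source text */

static long expr(void);

static long factor(void)
{
    while (isspace((unsigned char)*p)) p++;
    if (*p == '(') {
        p++;
        long v = expr();
        if (*p == ')') p++;            /* no error recovery in this sketch */
        return v;
    }
    char *end;
    long v = strtol(p, &end, 10);
    p = end;
    return v;
}

static long term(void)
{
    long v = factor();
    while (isspace((unsigned char)*p)) p++;
    while (*p == '*' || *p == '/') {
        char op = *p++;
        long r = factor();
        v = (op == '*') ? v * r : v / r;
        while (isspace((unsigned char)*p)) p++;
    }
    return v;
}

static long expr(void)
{
    long v = term();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        long r = term();
        v = (op == '+') ? v + r : v - r;
    }
    return v;
}

int main(void)
{
    p = "2 + 3 * (4 - 1)";
    printf("%ld\n", expr());   /* prints 11 */
    return 0;
}

In a Forth setting each of these functions would presumably become a colon definition, which is what makes the grammar-shaped style attractive there.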
wmb@SUN.COM (Mitch Bradley) (01/09/90)
> Why not implement [a C parser] as recursive descent?
No ironclad reason; it just seemed like yacc would be easier, since
a yacc grammar for C is floating around, and yacc parse tables
are compact, easy, and fast to interpret.
Mitch
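For a sense of how small such a table-walking driver can be, here is a toy LR-style walker in C with hand-built tables for a two-rule grammar (E -> E '+' n | n). This is purely illustrative: yacc's real tables for C are far larger and encoded differently.

#include <stdio.h>

/* A toy LR table walker in the spirit of yacc's generated driver,
   for the grammar:   E -> E '+' n  |  n
   Tables are hand-built for this tiny grammar. */
enum { TOK_N, TOK_PLUS, TOK_EOF };          /* terminal symbols   */
enum { SHIFT, REDUCE, ACCEPT, ERROR };      /* kinds of actions   */

struct action { int kind; int arg; };       /* arg: state or rule */

/* action_tab[state][terminal] */
static const struct action action_tab[5][3] = {
    /*   n            +             $       */
    { {SHIFT,1},   {ERROR,0},   {ERROR,0}  },   /* state 0                 */
    { {ERROR,0},   {REDUCE,2},  {REDUCE,2} },   /* state 1: E -> n .       */
    { {ERROR,0},   {SHIFT,3},   {ACCEPT,0} },   /* state 2: E . + n / acc  */
    { {SHIFT,4},   {ERROR,0},   {ERROR,0}  },   /* state 3: E + . n        */
    { {ERROR,0},   {REDUCE,1},  {REDUCE,1} },   /* state 4: E -> E + n .   */
};
static const int rule_len[]  = { 0, 3, 1 };        /* rhs lengths of rules 1, 2 */
static const int goto_on_E[] = { 2, 0, 0, 0, 0 };  /* GOTO[state][E]            */

int parse(const int *tokens)
{
    int stack[64], sp = 0;
    stack[sp] = 0;                           /* start state */
    for (;;) {
        struct action a = action_tab[stack[sp]][*tokens];
        switch (a.kind) {
        case SHIFT:  stack[++sp] = a.arg; tokens++;              break;
        case REDUCE: sp -= rule_len[a.arg];                      /* pop rhs */
                     stack[sp + 1] = goto_on_E[stack[sp]]; sp++; break;
        case ACCEPT: return 1;
        default:     return 0;               /* syntax error */
        }
    }
}

int main(void)
{
    int ok[]  = { TOK_N, TOK_PLUS, TOK_N, TOK_EOF };
    int bad[] = { TOK_PLUS, TOK_N, TOK_EOF };
    printf("%d %d\n", parse(ok), parse(bad));   /* prints: 1 0 */
    return 0;
}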
peter@ficc.uu.net (Peter da Silva) (01/09/90)
> >easiest way to do a C parser in Forth would be to take a yacc grammar
> >of C and run it through yacc to get the state tables. Then duplicate

> Why not implement it as recursive descent?...

I'll second this, from experience. I was involved in a project about 6 years ago that involved a language parser in Forth. I wrote the parser, language-dependent editor, and file system in about 2 months. Then the guy managing the project took exception to the technique I used, which was recursive descent, and spent the next 6 months redoing what I'd done using the yacc/lex method. The project was subsequently canned because it went way over schedule. His biggest problem seemed to be all the time it took him to incorporate a change in the language, since he'd lost the tight Forth test/execute loop...

-- 
 _--_|\  Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \ Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>
\_.--._/ v  "Have you hugged your wolf today?"
  `-_-'
chekmate@athena.mit.edu (Adam Kao) (01/09/90)
I've been thinking more about my original proposal; I need to study the Dragon Book more closely before I can get more specific. However, I do feel that just reimplementing the C yacc table walker in Forth makes for a more traditional parser architecture than I had in mind. I was hoping for the easy extensibility that Peter mentioned. I want to make sure I understand these and other options and the tradeoffs. Maybe I can even think of an option not in the Dragon Book (well, I can dream, can't I?).

On a separate topic, I'm trying to decide which Forth to use. I've played with JForth and I like it a lot, but since this is a Forth extension (not an application) I hesitate to depend on features available from only one vendor on one machine (e.g. JForth's ODE). It might be better to use some standard UNIX Forth (memory protection!), possibly with redistributable extensions like those in Dick Pountain's book (those ARE redistributable, right?). Opinions? Advice?

Finally, I am very interested in the experiences of the people who have implemented these kinds of things, and I'd love to get my hands on any of the code you wrote, if that's possible. Tayfun Kocaoglu's thesis sounds like exactly what I should be reading -- could you put me in touch with him, Mitch?

Thank you for your time,
Adam
fred@ubvax.UB.Com (Fred Noon) (01/10/90)
An advantage of recursive descent over LALR(1) tables is that error diagnostics, error recovery, and doing things "on the fly" can be made very straightforward to implement. Using tables, one has to build error recovery into the grammar, and with LALR that can get a little tricky. I suspect that it is this, rather than the loss of FORTH's user interface, that would be responsible for turning a 2-month project into a 6-month project, particularly if it were someone's first go-round with lex and yacc. (Not being a production FORTHer, I can't speak from experience, however.)

======================================================================
Fredrik Noon (ubvax.ub.com!fred)
5290 Garrison Circle; San Jose, CA 95123; USA
tel (408) 365-7167
or, when toiling away at Ungermann-Bass: (408) 562-5632
======================================================================
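One concrete way the "on the fly" recovery shows up in a hand-written parser is panic mode: report the error, skip ahead to a synchronizing token such as ';', and keep parsing. A hypothetical C sketch, with an invented token stream and a one-rule statement grammar:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Panic-mode recovery in a hand-written recursive-descent parser:
   on a syntax error, report it, skip to a synchronizing token
   (';' here), and carry on with the next statement.  The token
   stream and grammar are invented for illustration. */
static const char *toks[] = { "x", "=", "=", "1", ";", "y", "=", "2", ";", NULL };
static int pos, errors;

static void synchronize(void)
{
    while (toks[pos] && strcmp(toks[pos], ";") != 0) pos++;
    if (toks[pos]) pos++;                 /* consume the ';' itself */
}

/* statement : IDENT '=' NUMBER ';' */
static void statement(void)
{
    const char *name = toks[pos++];       /* identifier */
    if (!toks[pos] || strcmp(toks[pos], "=") != 0 ||
        !toks[pos + 1] || !isdigit((unsigned char)toks[pos + 1][0])) {
        fprintf(stderr, "syntax error near '%s'\n", name);
        errors++;
        synchronize();                    /* recover and keep parsing */
        return;
    }
    printf("assign %s = %s\n", name, toks[pos + 1]);
    pos += 2;
    if (toks[pos] && strcmp(toks[pos], ";") == 0) pos++;
}

int main(void)
{
    while (toks[pos]) statement();
    printf("%d error(s)\n", errors);      /* 1 error, then "y = 2" parses */
    return 0;
}

Doing the equivalent with a table-driven LALR parser means adding error productions to the grammar, which is the trickiness mentioned above.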