ews00461@uxa.cso.uiuc.edu (08/01/89)
I posted this question a while ago, and not even "jax" responded, so I'll try it again...

I've seen Forth systems that generate object code. How is it usually done? Inline code? Lots of jsr (subroutine calls)? Are they simply using the dictionary as a "symbol table"?

I'm interested in any discussion at all on this; if you don't know how it IS done, how would you do it? It seems to me that storing assembler mnemonics for each dictionary word and simply generating inline code would work, but I'm sure there are other algorithms. Also, some flexibility is necessary, I'm sure, because pure inline expansion of all words would yield HUGE code. Any ideas?

Eric W Sink
ews00461@uxa.cso.uiuc.edu
wmb@SUN.COM (Mitch Bradley) (08/02/89)
How people do inline code expansion:

a) First of all, the basic mechanism is to compile a "jsr" or "bsr" or "call" (or whatever) instruction for each Forth word called by a definition. In and of itself, this is usually a lose, because on most machines it takes up more space than threaded code, plus it is likely to be slower because most machines push a return address on the memory stack for each "jsr" and pop it on the return. However, it opens the door to in-line compilation, which can be a big win:

b) Some words are marked with an "in-line" bit. When the compiler encounters those words, instead of compiling the "jsr", it copies the machine code for that word into the new definition. Often these in-line code sequences are only a few bytes long, so the code expansion is minimal. Longer words are usually not worth expanding in-line anyway, since their call/return overhead is amortized over a greater amount of actual work.

c) The compiler can do peephole optimization as it compiles the in-line code. For instance, if one word ends with, e.g., PUSH A0 and the next word begins with, e.g., POP A0, the compiler could just erase both instructions. More sophisticated optimizations are of course possible.

d) The "in-line" words generally have surrogate versions which can be "ticked" and executed in the normal interactive way.

In-line compilation is actually quite easy to implement. I did an in-line version of my system several years ago; the basic work took a couple of evenings. I didn't pursue the effort much further, because I was unwilling to trade off the ability to decompile for extra speed, which I could always get by writing a few code words.

Mitch
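Mechanisms (b) and (c) above can be sketched in a few lines of Python. This is an illustrative toy model only: the "in-line" word bodies and the PUSH/POP mnemonics are invented for the example, not taken from Mitch's system.

```python
# Toy model of in-line expansion with a one-instruction peephole.
# Machine-code bodies of words marked "in-line", as instruction lists.
INLINE_BODIES = {
    "DUP":  ["POP A0", "PUSH A0", "PUSH A0"],
    "DROP": ["POP A0"],
    "+":    ["POP A0", "POP A1", "ADD A0, A1", "PUSH A1"],
}

def compile_word(code, word):
    """Append code for one word, erasing PUSH Rn / POP Rn pairs."""
    if word not in INLINE_BODIES:
        code.append("JSR " + word)       # ordinary subroutine call
        return
    for insn in INLINE_BODIES[word]:
        # If the last emitted instruction pushed the register this one
        # pops, both cancel: the value is already where it is needed.
        if (insn.startswith("POP ") and code
                and code[-1] == "PUSH " + insn[4:]):
            code.pop()
        else:
            code.append(insn)

code = []
for w in ["DUP", "DROP", "+"]:
    compile_word(code, w)
print(code)
```

Running this on the sequence DUP DROP + shows the peephole at work: the code DUP pushes is immediately consumed by DROP and +, so the whole sequence collapses to just the body of + .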
toma@tekgvs.LABS.TEK.COM (Tom Almy) (08/02/89)
In article <114600003@uxa.cso.uiuc.edu> ews00461@uxa.cso.uiuc.edu writes:
>I've seen Forth systems that generate object code. How is usually done ?
>Inline code ? Lots of jsr (subroutine calls) ? Are they simply using
>the dictionary as a "symbol table" ?
>I'm interested in any discussion at all on this, if you don't know how
>it IS done, how would you do it ?

About seven years ago I was wondering why no one ever compiled Forth, so I set about writing a Forth compiler. The first one I did (using MMS Forth for the TRS-80) took about 10 screens of code. I later embellished it into a 120-screen version that is now part of LMI Forth products (Z-80, 8086, and 80386 based, five different versions).

1. A new keyword COMPILE: is used instead of : in the definition. While the definition is basically an unchanged colon def, it compiles into the dictionary like a CODE word. An additional keyword CDOES> is used in place of DOES>, and generates code like ;CODE. Another keyword generated PROCs (standard asm subroutines) which could be efficiently called from other COMPILE: words.

2. About 110 Forth primitives compile directly into machine code. This includes all of the control structure words and arithmetic functions as well as most of the memory referencing words. I have an extension to compile inline 8087 or 80387 code for about a dozen floating point primitives.

3. Variables, constants, and equates (sometimes known as VARs or TO variables) are compiled directly into machine code.

3A. (I forgot to add...) Any unrecognized word causes an inline change to threaded mode. Compilation reverted to native code when a recognized word was parsed.

4. Up to two registers are used to hold top-of-stack values. The compiler keeps track of how many and where. A loop index can be kept in a third register. Additional registers are used as temporaries. A lookahead technique is used to merge primitive operations into single instructions.
For instance, if A, B, and C are integers, A @ B @ + C ! would compile into:

	MOV AX, WORD PTR A
	ADD AX, WORD PTR B
	MOV WORD PTR C, AX

There is also optimization performed on comparison operations followed by conditional operations: A @ B @ > IF generates 3 instructions.

5. Performance improvement runs 2-7x, with code size expansion typically 0%-20%.

I also wrote a batch Forth compiler, CFORTH, for the Z-80 and 8086. This compiler, written in Forth, was based on the Native Code Compiler discussed above. Major features:

1. Colon definitions could be directed to the target (the program being compiled) or the host (the compiler itself), allowing the writing of "macros" which can algorithmically generate data structures. A parameterless macro facility was also provided.

2. A "compile only to resolve references" keyword allowed libraries. The parts of Forth-83 supplied in subroutine form, as well as DOS and other libraries, were supplied in source form and read in with each compile.

3. An assembler was provided.

4. ROMable generation. The Z-80 version allowed RAM and ROM segments. The 8086 version allowed separate CODE, DATA, and STACK segments. For MS-DOS use, the TINY model (.COM file) and SMALL model (.EXE file) are supported, and the stack segment can be separate in either case.

5. A multitasker allowed multiple execution threads (separate stack segments but shared code and data segments). A facility allowed independent task variables (USER VARIABLES in Forthspeak).

6. Compilation time was on a par with Borland TURBO languages for similar programs. Execution speed was slightly better than Turbo-C or MSC, and code size much smaller than any other compiler I've seen. Typically smaller than Forth! (No headers, remember!)

CFORTH is still available, but has never really sold well. I use it all the time because it simply is the fastest, most compact compiler around (btw, the exe file size is about 32k!). I feel that it died because:

1. Lack of promotion -- who has heard of it?
2. Forth programmers have a distaste for batch compilers.
3. Other language users (who like(?) batch compilers) have a distaste for Forth.

Tom Almy
toma@tekgvs.labs.tek.com
Yes, I have a commercial interest in this, but the interest does not extend to my employer.
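Tom's point 4 (top-of-stack values held in registers, with lookahead merging) can be modeled in miniature. The Python sketch below is a guess at the general technique, not the LMI compiler: the token handling, the single-register AX model, and the lack of register spilling are all simplifying assumptions.

```python
# Toy native-code generator: keep the top of stack in AX and use
# one-token lookahead to merge a fetch into a following add.
VARS = {"A", "B", "C"}          # hypothetical integer variables

def gen(tokens):
    out, i = [], 0
    tos_in_ax = False                        # is top of stack in AX?
    while i < len(tokens):
        t = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if t in VARS and nxt == "@":
            if not tos_in_ax:
                out.append("MOV AX, WORD PTR " + t)
                tos_in_ax = True
            elif i + 2 < len(tokens) and tokens[i + 2] == "+":
                out.append("ADD AX, WORD PTR " + t)   # fetch+add merged
                i += 1                                # consume the "+" too
            else:
                raise NotImplementedError("register spill not sketched")
            i += 2
        elif t in VARS and nxt == "!":
            out.append("MOV WORD PTR " + t + ", AX")
            tos_in_ax = False
            i += 2
        else:
            raise NotImplementedError("word not handled: " + t)
    return out

print(gen(["A", "@", "B", "@", "+", "C", "!"]))
```

On Tom's example A @ B @ + C ! this produces exactly the three instructions he shows: the first fetch loads AX, the second fetch merges with + into a single ADD, and ! stores AX back to memory.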
marc@noe.UUCP (Marc de Groot) (08/03/89)
In article <114600003@uxa.cso.uiuc.edu> ews00461@uxa.cso.uiuc.edu writes:
>I've seen Forth systems that generate object code. How is usually done ?
>Inline code ? Lots of jsr (subroutine calls) ? Are they simply using
>the dictionary as a "symbol table" ?

I have seen (alas, only briefly) two interpreters that address this. The first is JForth for the Amiga. It is a subroutine-threaded Forth (i.e. compiled JSR's). It is flexible, allowing object code compilation to be mixed with traditional threaded code pointers. There are some neat parameters you can control (like, to what level the compiler will de-thread the code).

Wil Baden gave a talk at one of the Forth conventions about turning Forth right into 68000 code. I will give an example, but please bear with me: it's been years since I wrote 68000 assembler, and I will probably get the mnemonics wrong. Let's say that register A0 points at the parameter stack. DROP can be coded as

	ADDQ.L	#4, A0

and DUP becomes

	MOVE.L	(A0), -(A0)

I'll leave the rest of the standard primitives as an exercise for the reader (along with 50 or so push-ups :-).

>It seems to me that storing assembler mneumonics for each dictionary
>word and simply generating inline code would work, but I'm sure there
>are other algortihms. Also, some flexibility is necessary I'm sure
>because pure inline expansion of all words would yield HUGE code.

You are right. I once wrote a word called [DECOMPILE]. If you said [DECOMPILE] WORDS inside a colon definition, it would expand the reference to WORDS into in-line code. WORDS (actually it was VLIST) expands to about 2.5K bytes!

More sophisticated is the concept of the optimizing compiler for Forth. Each word capable of compiling code should look at the code already compiled (or some representation thereof) and eliminate unnecessary operations. The optimizing compiler that Chuck Moore put in cmFORTH for the Novix chip is a fascinating example of how to do it, although his coding style is pretty unreadable :-( .
-- 
Marc de Groot (KG6KF)                     These ARE my employer's opinions!
Noe Systems, San Francisco
UUCP: uunet!hoptoad!noe!marc
Internet: marc@kg6kf.AMPR.ORG
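Marc's [DECOMPILE] reduces, in a threaded-code model, to splicing a word's body in place of the reference to it. The Python sketch below is a toy illustration with invented word definitions; a real system would copy compiled code, not word names.

```python
# Toy model of [DECOMPILE]: threaded definitions as lists of word names.
DICTIONARY = {
    "SQUARE": ["DUP", "*"],
    "CUBE":   ["DUP", "SQUARE", "*"],
}

def inline(definition, word):
    """Return `definition` with every call to `word` expanded in-line."""
    out = []
    for w in definition:
        if w == word:
            out.extend(DICTIONARY[word])   # splice the body in
        else:
            out.append(w)
    return out

# : CUBE  DUP SQUARE * ;  with the SQUARE reference expanded in-line:
print(inline(DICTIONARY["CUBE"], "SQUARE"))
```

The splice makes the size hazard Marc mentions obvious: every expansion of a word costs the full length of its body, which is why [DECOMPILE]ing something the size of VLIST balloons a definition by kilobytes.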
jax@well.UUCP (Jack J. Woehr) (08/03/89)
Forth syntax compilers have been around for a while. See JFAR IV/3, page 379, "Compiling Forth for Performance" by Thomas Almy.

Some of Silicon Composers' Forth systems, I believe, have partaken more of the nature of Forth cross-compilers than "live" Forth systems. I may be wrong on this latter.

New Micros Forth for the MC68HC11 is an "almost" compiler, the code bodies being generated in RAM and shifted out to EPROM sans headers.

The advantages that drive people into the arms of Forth are to a certain extent negated by the compiler environment; it's that quick feedback that makes Forth a dream for control projects. But Forth compilers certainly will continue to be written for special applications, such as extremely high performance realtime systems, and I encourage you to pursue this if you are interested.

{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}
{}                                                                        {}
{} jax@well      ." Sysop, Realtime Control and Forth Board"  FIG         {}
{} jax@chariot   ." (303) 278-0364 3/12/2400 8-n-1 24 hrs."   Chapter     {}
{} JAX on GEnie  ." Tell them JAX sent you!"                  Coordinator {}
{}                                                                        {}
{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}
andrew@idacom.UUCP (Andrew Scott) (08/09/89)
In article <114600003@uxa.cso.uiuc.edu>, ews00461@uxa.cso.uiuc.edu writes:
> I've seen Forth systems that generate object code. How is usually done ?
> Inline code ? Lots of jsr (subroutine calls) ? Are they simply using
> the dictionary as a "symbol table" ?

Yes, yes, and sometimes. Or, to be a little less glib, there are many ways of implementing subroutine threaded systems, and you've touched on some of the techniques used.

First of all, a subroutine threaded Forth uses subroutine calls as the threading mechanism. Depending on the underlying architecture, this can be faster than any other kind of threading because the inner interpreter has been eliminated. Literals are handled by inline code that pushes the value to the stack. Conditional code uses the processor's native branch instructions instead of manipulating the inner interpreter's instruction pointer.

The subroutine calls need not take up more space, either. For a 68000 system, you can use the two-byte or four-byte forms of BSR when compiling words that are within 128 or 32K bytes of the current location. You only need to use a JSR instruction when you call a word over 32K away. If the code is sufficiently modular, that shouldn't happen often.

Inline code is usually used as an optimization in these systems. Things like DUP, DROP, +, and EXIT are usually very short words and do not increase the size of the code when inlined.

I recently wrote a subroutine-threaded Forth that added a new class of optimizations to those above. I made the compiler do more than " CFA , " when compiling a word. Instead, sequences of words are found that can be collapsed into short sequences of inline code. For example:

	Forth		68000 code
	-----		----------
	DUP >R		MOVE.L (SP), -(RP)
	LIT +		ADDI.L #N, (SP)		(ADDQ used when possible)
	= 0BRANCH	CMPM.L (SP)+, (SP)+
			BNE.S ??

The latter example illustrates how conditionals can be made even faster. IF, UNTIL, WHILE etc. all compile the primitive 0BRANCH (or ?BRANCH in other Forths).
A relational operator such as = can be "folded into" the branch, eliminating the need to push a true/false value to the stack only to be popped off by 0BRANCH and tested.

Using a list of about 75 sequence "rules", the compiler produced code for a particular application that ran about three times as fast as indirect threaded Forth and was 12% smaller. The optimizer was not that large, either: it was written in about 250 lines of assembler (for compilation speed), and the rules file was about 200 lines of Forth. No "inline" flags were required either. The only hook was to replace the " CFA , " in the guts of INTERPRET with a new word, which I called (COMPILE).

(BTW: I'm glad to see that activity on comp.lang.forth has picked up recently!)
-- 
Andrew Scott
andrew@idacom - or - {att, watmath, ubc-cs}!alberta!idacom!andrew
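Andrew's rule-driven (COMPILE) hook can also be modeled in miniature. In the Python sketch below, the two rules are taken from his table, but the greedy matching strategy and the JSR fallback are assumptions; his actual 75-rule optimizer was written in assembler and Forth.

```python
# Toy model of a sequence-rule optimizer hooked in at compile time.
# Each rule maps a tuple of Forth words to a short machine-code sequence.
RULES = {
    ("DUP", ">R"):    ["MOVE.L (SP), -(RP)"],
    ("=", "0BRANCH"): ["CMPM.L (SP)+, (SP)+", "BNE.S ??"],
}

def compile_program(words):
    compiled = []        # list of (source-word-tuple, code-list) pairs
    for w in words:
        # Try to fold this word into the previously compiled sequence,
        # as = folds into 0BRANCH in the example above.
        if compiled and (compiled[-1][0] + (w,)) in RULES:
            seq = compiled[-1][0] + (w,)
            compiled[-1] = (seq, RULES[seq])
        elif (w,) in RULES:
            compiled.append(((w,), RULES[(w,)]))
        else:
            compiled.append(((w,), ["JSR " + w]))   # plain threading
    return [insn for _, code in compiled for insn in code]

print(compile_program(["DUP", ">R", "=", "0BRANCH"]))
```

The key property is the one Andrew notes: no per-word "inline" flags are needed, because the optimizer looks only at the sequence of words being compiled, replacing what it has already emitted when a longer rule matches.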