[comp.lang.forth] Forth Compilation

ews00461@uxa.cso.uiuc.edu (08/01/89)

I posted this question a while ago, and not even "jax" responded, so
I'll try it again...

I've seen Forth systems that generate object code.  How is it usually done?
Inline code?  Lots of JSRs (subroutine calls)?  Are they simply using
the dictionary as a "symbol table"?

I'm interested in any discussion at all on this; if you don't know how
it IS done, how would you do it?

It seems to me that storing assembler mnemonics for each dictionary
word and simply generating inline code would work, but I'm sure there
are other algorithms.  Also, I'm sure some flexibility is necessary,
because pure inline expansion of all words would yield HUGE code.

Any ideas ?

Eric W Sink
ews00461@uxa.cso.uiuc.edu

wmb@SUN.COM (Mitch Bradley) (08/02/89)

How people do inline code expansion:

a) First of all, the basic mechanism is to compile a "jsr" or "bsr" or
   "call" (or whatever) instruction for each Forth word called by a
   definition.

   In and of itself, this is usually a lose, because on most machines
   it takes up more space than threaded code, plus it is likely to be
   slower because most machines push a return address on the memory
   stack for each "jsr" and pop it on the return.

   However, it opens the door to in-line compilation, which can be a
   big win:

b) Some words are marked with an "in-line" bit.  When the compiler
   encounters those words, instead of compiling the "jsr", it copies
   the machine code for that word into the new definition.  Often these
   in-line code sequences are only a few bytes long, so the code expansion
   is minimal.  Longer words are usually not worth expanding in-line
   anyway, since their call/return overhead is amortized over a greater
   amount of actual work.

c) The compiler can do peephole optimization as it compiles the in-line
   code.  For instance, if one word ends with, e.g. PUSH A0  and the
   next word begins with, e.g. POP A0, the compiler could just erase
   both instructions.  More sophisticated optimizations are of course
   possible.  (A rough sketch of b) and c) follows this list.)

d) The "in-line" words generally have surrogate versions which can be
   "ticked" and executed in the normal interactive way.

In-line compilation is actually quite easy to implement.  I did an
in-line version of my system several years ago.  The basic work took
a couple of evenings.  I didn't pursue the effort much further, because
I was unwilling to trade off the ability to decompile for extra speed,
which I could always get by writing a few code words.

Mitch

toma@tekgvs.LABS.TEK.COM (Tom Almy) (08/02/89)

In article <114600003@uxa.cso.uiuc.edu> ews00461@uxa.cso.uiuc.edu writes:

>I've seen Forth systems that generate object code.  How is it usually done?
>Inline code?  Lots of JSRs (subroutine calls)?  Are they simply using
>the dictionary as a "symbol table"?

>I'm interested in any discussion at all on this; if you don't know how
>it IS done, how would you do it?

About seven years ago I was wondering why no one ever compiled Forth, so
I set about writing a Forth compiler.  The first one I did (using MMS Forth
for the TRS-80) took about 10 screens of code.  I later embellished it into
a 120 screen version that is now part of LMI Forth products (Z-80, 8086, and
80386 based, five different versions).

1.	A new keyword COMPILE: is used instead of : in the definition.  While
	the definition is basically an unchanged colon def, it compiles into
	the dictionary like a CODE word.  An additional keyword CDOES> is
	used in place of DOES>, and generates code like ;CODE.  Another
	keyword generates PROCs (standard asm subroutines) which can be
	efficiently called from other COMPILE: words.  (A usage sketch
	follows this list.)

2.	About 110 Forth primitives compile directly into machine code.  This
	includes all of the control structure words and arithmetic functions
	as well as most of the memory referencing words.  I have an extension
	to compile inline 8087 or 80387 code for about a dozen floating point
	primitives.

3.	Variables, constants and equates (sometimes known as VARs or TO 
	variables) are compiled directly into machine code.

3A.	(I forgot to add...) any unrecognized word causes an inline switch
	to threaded mode.  Compilation reverts to native code as soon as a
	recognized word is parsed.


4.	Up to two registers are used to hold top of stack values.  The compiler
	keeps track of how many and where.  A loop index can be kept in a
	third register.  Additional registers are used as temporaries.
	A lookahead technique is used to merge primitive operations into single
	instructions.  For instance, if A, B, and C are integer variables, A @ B @ + C !
	would compile into:
		MOV	AX, WORD PTR A
		ADD	AX, WORD PTR B
		MOV	WORD PTR C, AX
	There is also optimization performed on comparison operations followed
	by conditional operations.  A @ B @ > IF  generates 3 instructions.

5.	Performance improvement runs 2-7x, with code size expansion 0% - 20%
	typically.
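
As a usage sketch for point 1 (the definition itself is made up, but the
style is as described): a COMPILE: definition reads like any other colon
definition, and as long as it sticks to the recognized primitives it turns
into straight machine code.  Assuming 16-bit cells and n > 0:

	COMPILE: SUM-ARRAY  ( addr n -- sum )
	    0 SWAP 0 DO         \ keep a running total under the address
	        OVER I 2* + @ + \ fetch the Ith cell and add it in
	    LOOP
	    SWAP DROP ;         \ discard the address, leave the sum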

I also wrote a batch Forth compiler CFORTH, for Z-80 and 8086.  This compiler,
written in Forth, was based on the Native Code Compiler, discussed above.
Major features:

1.	Colon definitions could be directed to the target (the program being
	compiled) or the host (the compiler itself) allowing the writing of
	"macros" which can algorithmically generate data structures.  A
	parameterless macro facility was also provided.

2.	A "compile only to resolve references" keyword allowed libraries.
	The parts of Forth-83 supplied in subroutine form, as well as DOS
	and other libraries, were supplied in source form and read in with
	each compile.

3.	An assembler was provided.  

4.	ROMable generation.  Z-80 version allowed RAM and ROM segments.
	8086 version allowed separate CODE DATA and STACK segments.  For
	MS/DOS use, TINY model (.COM file) and SMALL model (.EXE file) are
	supported, and the stack segment can be separate in either case.

5.	A multitasker allowed multiple execution threads (separate stack
	segments but shared code and data segments).  A facility allowed
	independent task variables (USER VARIABLES in Forthspeak).

6.	Compilation time was on par with Borland TURBO languages for 
	similar programs.  Execution speed was slightly better than Turbo-C
	or MSC, and code size much smaller than any other compiler I've
	seen.  Typically smaller than Forth!  (No headers, remember!).

CFORTH is still available, but has never really sold well.  I use it all
the time because it simply is the fastest, most compact compiler around (btw,
the exe file size is about 32k!).  I feel that it died because:

1.	Lack of promotion -- who has heard about it?

2.	Forth programmers have a distaste for batch compilers.

3.	Users of other languages (who like(?) batch compilers) have a distaste
	for Forth.


Tom Almy
toma@tekgvs.labs.tek.com
Yes, I have a commercial interest in this, though the interest does not
extend to my employer.

marc@noe.UUCP (Marc de Groot) (08/03/89)

In article <114600003@uxa.cso.uiuc.edu> ews00461@uxa.cso.uiuc.edu writes:
>I've seen Forth systems that generate object code.  How is it usually done?
>Inline code?  Lots of JSRs (subroutine calls)?  Are they simply using
>the dictionary as a "symbol table"?

I have seen (alas, only briefly) two interpreters that address this.  The
first is JForth for the Amiga.  It is a subroutine-threaded Forth (i.e.
compiled JSRs).  It is flexible, allowing object code compilation to
be mixed with traditional threaded code pointers.  There are some neat
parameters you can control (like, to what level the compiler will
de-thread the code).

Wil Baden gave a talk at one of the Forth conventions about turning
Forth right into 68000 code.  I will give an example, but please bear with
me: it's been years since I wrote 68000 assembler.  I will probably get the
mnemonics wrong.

Let's say that register A0 points at the parameter stack.  DROP can be
coded as
	addq.l	#4,A0

DUP becomes
	move.l	(A0),-(A0)

I'll leave the rest of the standard primitives as an exercise for the
reader (along with 50 or so push-ups :-).

>It seems to me that storing assembler mnemonics for each dictionary
>word and simply generating inline code would work, but I'm sure there
>are other algorithms.  Also, I'm sure some flexibility is necessary,
>because pure inline expansion of all words would yield HUGE code.

You are right.  I once wrote a word called [DECOMPILE] .  If you
said [DECOMPILE] WORDS inside a colon definition, it would expand
the reference to WORDS into in-line code.  WORDS (actually it was
VLIST) expands to about 2.5k bytes!
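
For the curious, here is a rough one-level sketch of the idea -- not the
original code.  It assumes an indirect-threaded Forth whose colon bodies are
flat lists of CFAs terminated by the CFA of EXIT; a full version would also
have to deal with nested definitions, in-line literals, strings, and branch
offsets.

	: [DECOMPILE]  ( -- )   \ usage:  : FOO ... [DECOMPILE] VLIST ... ;
	    ' >BODY                         \ threaded code of the next word
	    BEGIN  DUP @  DUP ['] EXIT <>   \ fetch a CFA, stop at EXIT
	    WHILE  ,  CELL+                 \ copy it into the current definition
	    REPEAT  2DROP ;  IMMEDIATE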

More sophisticated is the concept of the optimizing compiler for Forth.
Each word capable of compiling code should look at the code already compiled
(or some representation thereof) and eliminate unnecessary operations.
The optimizing compiler that Chuck Moore put in cmFORTH for the Novix
chip is a fascinating example of how to do it, although his coding
style is pretty unreadable :-( .
-- 
Marc de Groot (KG6KF)                   These ARE my employer's opinions!
Noe Systems, San Francisco
UUCP: uunet!hoptoad!noe!marc
Internet: marc@kg6kf.AMPR.ORG

jax@well.UUCP (Jack J. Woehr) (08/03/89)

	Forth syntax compilers have been around for a while. See JFAR
IV/3 page 379 "Compiling Forth for Performance" by Thomas Almy.

	Some of Silicon Composers' Forth systems, I believe, have partaken
more of the nature of Forth cross-compilers than "live" Forth systems.
I may be wrong about this.

	New Micros Forth for the MC68HC11 is an "almost" compiler, the
code bodies being generated in RAM and shifted out to EPROM sans headers.

	The advantages that drive people into the arms of Forth are to a 
certain extent negated by the compiler environment. It's that quick
feedback that makes Forth a dream for control projects.

	But Forth compilers certainly will continue to be written for
special applications, such as extremely high-performance realtime
systems, and I encourage you to pursue this if you are interested.

{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}
{}                                                                        {}
{} jax@well     ." Sysop, Realtime Control and Forth Board"      FIG      {}
{} jax@chariot  ." (303) 278-0364 3/12/2400 8-n-1 24 hrs."     Chapter    {}
{} JAX on GEnie       ." Tell them JAX sent you!"             Coordinator {}
{}                                                                        {}
{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}

andrew@idacom.UUCP (Andrew Scott) (08/09/89)

In article <114600003@uxa.cso.uiuc.edu>, ews00461@uxa.cso.uiuc.edu writes:
> 
> I've seen Forth systems that generate object code.  How is it usually done?
> Inline code?  Lots of JSRs (subroutine calls)?  Are they simply using
> the dictionary as a "symbol table"?

Yes, yes, and sometimes.  Or, to be a little less glib, there are many ways
of implementing subroutine threaded systems and you've touched on some of the
techniques used.

First of all, a subroutine threaded Forth uses subroutine calls as the threading
mechanism.  Depending on the underlying architecture, this can be faster than
any other kind of threading because the inner interpreter has been eliminated.
Literals are handled by inline code that pushes the value to the stack.
Conditional code uses the processor's native branch instructions instead of
manipulating the inner interpreter's instruction pointer.

The subroutine calls need not take up more space, either.  For a 68000 system,
you can use the two-byte or four-byte forms of BSR when compiling words that
are within 128 or 32K of the current location.  You only need to use a JSR
instruction when you call a word over 32K away.  If the code is sufficiently
modular, it shouldn't happen that often.

Inline code is usually used as an optimization in these systems.  Things like
DUP, DROP, +, and EXIT are usually very short words and do not increase the
size of code when inlined.

I recently wrote a subroutine-threaded Forth that added a new class of
optimizations to those above.  I made the compiler do more than " CFA , " when
compiling a word.  Instead, sequences of words are found that can be collapsed
into short sequences of inline code.  For example:

	Forth			68000 code
	-----			----------
	DUP >R			MOV.L   (SP), -(RP)

	LIT +			ADDI.L  #N, (SP)     (ADDQ used when possible)

	= 0BRANCH		CMPM.L  (SP)+, (SP)+
				BNE.S   ??

The latter example illustrates how conditionals can be made even faster.  IF,
UNTIL, WHILE, etc. all compile the primitive 0BRANCH (or ?BRANCH in other
Forths).  A relational operator such as = can be "folded into" the branch,
eliminating the need to push a true/false value onto the stack only to have it
popped off and tested by 0BRANCH.

Using a list of about 75 sequence "rules", the compiler produced code for a
particular application that ran about three times as fast as indirect threaded
Forth and was 12% smaller.

The optimizer was not that large, either.  It was written in about 250 lines
of assembler (for compilation speed).  The rules file was about 200 lines of
Forth.  No "inline" flags were required either.  The only hook was to replace
the  CFA ,  in the guts of INTERPRET with a new word, which I called (COMPILE).
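
The real thing is written in assembler, but in Forth terms the idea is
roughly the following.  This is a minimal two-word version with stand-in
names (PAIR-RULE and COMPILE-CALL); the real rules and bookkeeping are more
involved:

	VARIABLE PREV           \ CFA of the last word compiled, 0 if none

	: (COMPILE)  ( cfa -- )         \ used in place of  CFA ,
	    PREV @ OVER PAIR-RULE       \ xt of a rule for (prev, cfa), or 0
	    ?DUP IF
	        NIP EXECUTE             \ rule backs HERE up, emits merged code
	        0 PREV !
	    ELSE
	        DUP COMPILE-CALL        \ compile an ordinary BSR/JSR
	        PREV !
	    THEN ;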


(BTW: I'm glad to see that activity on comp.lang.forth has picked up recently!)
-- 
Andrew Scott			andrew@idacom
			- or -	{att, watmath, ubc-cs}!alberta!idacom!andrew