[comp.lang.forth] Forth optimization

andrew@idacom.uucp (Andrew Scott) (08/30/90)

Doug Philips writes:
> (I was quite surprised when you actually showed up on the net.  Another
> lurker out in the open!)

I've been around for quite some time.  I can usually be found in
rec.sport.hockey.   :-)

> I do have a question about your system.  For example:
>
> IF ( code-series-1 ) ELSE ( code-series-2 ) THEN ( code-series-3 )
>
> would it be reasonable, possible, sane, to be able to push copies of
> code-series-3 into each branch and see if the optimizer can do better that
> way?

Hmm.  Good point.  The code generated by:

( n \ condition ) IF  5  ELSE  9  THEN +

would be less optimal than

( n \ condition ) IF  5 +  ELSE  9 +  THEN

because the sequence LIT + combines into a single instruction.


I think it would be more trouble than the results warrant to handle this
situation.  I would have to save all the compilation tokens in the entire IF
construct before the decision could be made.  Currently, the optimizer is
quite small because I only have to queue up 4 tokens at most (the longest
sequence in the optimization rules set is 4 words).

There are many more optimizations that my compiler could tackle, but I don't
want to end up with a mammoth system.  I get decent results from the sequence
optimizations I perform now.
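To make the queue idea concrete, here is a small sketch (in Python, purely
illustrative -- the rule names are made up, and this is not Andrew's actual
compiler): tokens go into a queue, the tail of the queue is collapsed
whenever it matches a rule, and the queue never needs to hold more tokens
than the longest rule in the set.

```python
# Illustrative sketch of a rule-driven sequence optimizer.  Tokens are
# queued; the tail of the queue is collapsed when it matches a rule.
RULES = {
    ("LIT", "+"): ("LIT+",),      # literal followed by + -> add-immediate
    ("DUP", "DROP"): (),          # net no-op
    ("SWAP", "SWAP"): (),         # net no-op
}
# Longest pattern in this toy rule set is 2; in Andrew's set it is 4.
MAX_RULE_LEN = max(len(k) for k in RULES)

def optimize(tokens):
    queue = []
    for tok in tokens:
        queue.append(tok)
        reduced = True
        while reduced:                      # collapse longest match first
            reduced = False
            for n in range(min(len(queue), MAX_RULE_LEN), 1, -1):
                tail = tuple(queue[-n:])
                if tail in RULES:
                    queue[-n:] = RULES[tail]
                    reduced = True
                    break
        while len(queue) >= MAX_RULE_LEN:   # emit tokens past the window
            yield queue.pop(0)
    yield from queue
```

With these rules, the sequence LIT + collapses to a single LIT+ token,
which is exactly why `5 +` inside the branch beats `+` after the THEN.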


I was going to experiment with an optimizer that worked at the assembly
language level.  Each Forth primitive (like DUP) could be coded in "machine"
primitives (like POP and PUSH).  Redundant POPs and PUSHes could be eliminated
and values could remain in registers over a long span of code.  Movement
to the stack in memory would only take place when necessary.  This approach
might yield big speed improvements on non-Forth processors.
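A minimal sketch of that idea (in Python; the expansions and register names
are invented for illustration): each Forth word expands into virtual POP
and PUSH operations, and a single pass cancels a PUSH of a register that
is immediately POPped back into the same register, so the value stays in
the register instead of moving through the stack in memory.

```python
# Illustrative sketch.  Each Forth primitive expands into virtual
# "machine" operations; a pass removes PUSH r immediately followed by
# POP r, so the value never touches the memory stack.
EXPANSIONS = {
    "DUP":  [("POP", "r0"), ("PUSH", "r0"), ("PUSH", "r0")],
    "DROP": [("POP", "r0")],
    "+":    [("POP", "r0"), ("POP", "r1"),
             ("ADD", "r1", "r0"), ("PUSH", "r1")],
}

def expand(words):
    return [op for w in words for op in EXPANSIONS[w]]

def cancel_push_pop(ops):
    out = []
    for op in ops:
        if out and op[0] == "POP" and out[-1] == ("PUSH", op[1]):
            out.pop()                # PUSH r / POP r pair: both vanish
        else:
            out.append(op)
    return out
```

Even this one rule shrinks DUP DROP from four stack operations down to
two, and DUP + from seven down to five.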
-- 
Andrew Scott	| mail:		andrew@idacom.uucp
		| - or -	..!uunet!ubc-cs!alberta!idacom!andrew

dwp@willett.pgh.pa.us (Doug Philips) (09/06/90)

In <1990Aug29.171704.15174@idacom.uucp>, andrew@idacom.uucp (Andrew Scott) writes:
>             I would have to save all the compilation tokens in the entire IF
> construct before the decision could be made.  Currently, the optimizer is
> quite small because I only have to queue up 4 tokens at most (the longest
> sequence in the optimization rules set is 4 words).

And you could always do the code motion by hand, or recode it in assembler
if it were really critical.

> There are many more optimizations that my compiler could tackle, but I don't
> want to end up with a mammoth system.  I get decent results from the sequence
> optimizations I perform now.

The first 90% takes 90% of the effort; the last 10% takes the other 90%.

> I was going to experiment with an optimizer that worked at the assembly
> language level.  Each Forth primitive (like DUP) could be coded in "machine"
> primitives (like POP and PUSH).  Redundant POPs and PUSHes could be eliminated
> and values could remain in registers over a long span of code.  Movement
> to the stack in memory would only take place when necessary.  This approach
> might yield big speed improvements on non-Forth processors.

Do you mean that you'll have another virtual machine with PUSH, POP, etc.,
or do you mean the real machine-level instructions?  I think you could
probably do a good machine code peephole optimizer that way.  Of course,
if you do automatic inlining (instead of having a special word :: ) you
would want to delay "optimization" until the end.  Maybe with a multiple
cfa scheme you could have both the optimized and unoptimized code,
depending on what you need to use at the time.

-Doug

---
Preferred: ( dwp@willett.pgh.pa.us  OR  ...!{sei,pitt}!willett!dwp )
Daily: ...!{uunet,nfsun}!willett!dwp  [last resort: dwp@vega.fac.cs.cmu.edu]

andrew@idacom.uucp (Andrew Scott) (09/08/90)

Doug Philips writes:
>>I was going to experiment with an optimizer that worked at the assembly
>>language level.  Each Forth primitive (like DUP) could be coded in "machine"
>>primitives (like POP and PUSH).  Redundant POPs and PUSHes could be eliminated
>>and values could remain in registers over a long span of code.  Movement
>>to the stack in memory would only take place when necessary.  This approach
>>might yield big speed improvements on non-Forth processors.

> Do you mean that you'll have another virtual machine with PUSH POP etc.
> or do you mean the real machine level instructions.  I think you could
> probably do a good machine code peephole optimizer that way.

Currently, my optimizer makes the following translation:

	Forth sequences --> native code instructions

The idea was to create a multi-stage optimizer/translator:

	Forth word --> sequence of virtual primitives
	virtual primitive reduction (peephole optimization)
	sequence of virtual primitives --> native code instructions


This idea looks attractive for creating native code target compilers for
several CPUs.  The tough part would be in coming up with the set of atomic
primitives that could be combined into appropriate instructions for a wide
variety of CPU instruction sets.  The target compiler engine could be the
same for many CPUs; only the translation rules would differ.
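Purely as a sketch of the three stages (in Python; the virtual primitive
names and the 68000 translation rules here are invented for illustration,
not taken from any real target compiler): Forth words expand into a shared
set of virtual primitives, a machine-independent peephole pass reduces
them, and a per-CPU rule table maps the survivors to native mnemonics.

```python
# Illustrative three-stage translator: Forth word -> virtual primitives
# -> peephole reduction -> native instructions.  Only the last table
# would change from one CPU to the next.
VIRTUAL = {
    "DUP": ["TOS>R", "R>TOS", "R>TOS"],   # via a scratch register R
    "+":   ["TOS>R", "ADD-R"],
}
PEEPHOLE = {("R>TOS", "TOS>R"): []}       # push R then pop R: no-op

NATIVE_68K = {                            # one CPU's translation rules
    "TOS>R": "MOVE.L (SP)+,D0",
    "R>TOS": "MOVE.L D0,-(SP)",
    "ADD-R": "ADD.L D0,(SP)",
}

def compile_word(words, native_rules):
    prims = [p for w in words for p in VIRTUAL[w]]
    i = 0
    while i < len(prims) - 1:             # machine-independent peephole
        pair = (prims[i], prims[i + 1])
        if pair in PEEPHOLE:
            prims[i:i + 2] = PEEPHOLE[pair]
            i = max(i - 1, 0)
        else:
            i += 1
    return [native_rules[p] for p in prims]
```

Retargeting to another CPU would mean writing a new table like NATIVE_68K;
the expansion and reduction stages stay the same.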
-- 
Andrew Scott	| mail:		andrew@idacom.uucp
		| - or -	..!uunet!ubc-cs!alberta!idacom!andrew

stephen@estragon.uchicago.edu (Stephen P Spackman) (09/11/90)

It's possible you (meaning, people who are following this thread) want
to read:

Debaere & van Campenhout
_Interpretation and Instruction Path Coprocessing_
MIT Press 1990.
ISBN 0-262-04107-3
Price? I have no idea.

It's got some odd ideas in it, including some speedup techniques that
work WITHOUT going to a native compiler (and some other stuff about
doing the same thing in hardware).

There's a review of it in the latest SIGPLAN Notices (I wrote the
review, that's how I know). Very mixed feelings I have about this
book, but you might get off on it.

Just FYI.

stephen p spackman  stephen@estragon.uchicago.edu  312.702.3982

dwp@willett.pgh.pa.us (Doug Philips) (09/12/90)

In <1990Sep7.231911.20882@idacom.uucp>, Andrew Scott writes:
> Currently, my optimizer makes the following translation:
> 
> 	Forth sequences --> native code instructions
> 
> The idea was to create a multi-stage optimizer/translator:
> 
> 	Forth word --> sequence of virtual primitives
> 	virtual primitive reduction (peephole optimization)
> 	sequence of virtual primitives --> native code instructions
> 
> 
> This idea looks attractive for creating native code target compilers for
> several CPUs.  The tough part would be in coming up with the set of atomic
> primitives that could be combined into appropriate instructions for a wide
> variety of CPU instruction sets.  The target compiler engine could be the
> same for many CPUs; only the translation rules would differ.

So to paraphrase, you target compile into a "high-level assembly" and then
from there have machine-dependent optimizing assemblers?

I wonder if eForth would be worth a look?  Assuming that it is a renamed
version of pigForth, see pages 19-22, "More on Forth Engines," Volume 12,
from which I quote without permission:

    Here is a list of important features we should strife (sic) to
    include in this Universal pigForth model:

    	High level word set, built on the kernel words.  The high level
	definitions must be portable to all target CPU's, including
	8, 16 and 32 bit machines.

	To enforce portability, words are tokenized to a single set of
	tokens.  The high level word set will be portable as a binary
	image to all CPU's.

	Tokenizing the Forth system also makes it easy to optimize an
	implementation after it is ported.

It seems to me like there are some goals in common.  (The eForth.ZIP
version doesn't contain the implementation manual.)

-Doug
---
Preferred:  dwp@willett.pgh.pa.us    Daily:  {uunet,nfsun}!willett!dwp