andrew@idacom.uucp (Andrew Scott) (08/30/90)
Doug Philips writes:
> (I was quite surprised when you actually showed up on the net.  Another
> lurker out in the open!)

I've been around for quite some time.  I can usually be found in
rec.sport.hockey.  :-)

> I do have a question about your system.  For example:
>
>	IF ( code-series-1 ) ELSE ( code-series-2 ) THEN ( code-series-3 )
>
> would it be reasonable, possible, sane, to be able to push copies of
> code-series-3 into each branch and see if the optimizer can do better that
> way?

Hmm.  Good point.  The code generated by:

	( n \ condition )  IF  5  ELSE  9  THEN  +

would be less optimal than

	( n \ condition )  IF  5 +  ELSE  9 +  THEN

because the sequence LIT + combines into a single instruction.

I think it would be more trouble than the results warrant to handle this
situation.  I would have to save all the compilation tokens in the entire
IF construct before the decision could be made.  Currently, the optimizer
is quite small because I only have to queue up 4 tokens at most (the
longest sequence in the optimization rules set is 4 words).

There are many more optimizations that my compiler could tackle, but I
don't want to end up with a mammoth system.  I get decent results from
the sequence optimizations I perform now.

I was going to experiment with an optimizer that worked at the assembly
language level.  Each Forth primitive (like DUP) could be coded in
"machine" primitives (like POP and PUSH).  Redundant POPs and PUSHes
could be eliminated and values could remain in registers over a long span
of code.  Movement to the stack in memory would only take place when
necessary.  This approach might yield big speed improvements on non-Forth
processors.
-- 
Andrew Scott  | mail:  andrew@idacom.uucp
              |  - or -  ..!uunet!ubc-cs!alberta!idacom!andrew
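The queued-token scheme Andrew describes can be sketched in a few lines.
This is a hypothetical illustration in Python, not his actual compiler:
tokens pass through a short queue (at most 4 deep, matching the longest
rule), and known sequences such as LIT + collapse into one combined
instruction.  The rule table and token names here are invented.

```python
# Illustrative peephole pass over compilation tokens.  Rules map a
# short token sequence to its fused replacement; an empty replacement
# deletes a net no-op.  All names are placeholders, not real compiler
# output.
RULES = [
    (("LIT", "+"), ("LIT+",)),   # literal then add -> fused add-immediate
    (("DUP", "DROP"), ()),       # net no-op
    (("SWAP", "SWAP"), ()),      # net no-op
]
MAX_QUEUE = 4  # longest pattern in the rule set

def optimize(tokens):
    queue = []
    out = []
    for tok in tokens:
        queue.append(tok)
        # Rewrite the tail of the queue until no rule applies.
        changed = True
        while changed:
            changed = False
            for pattern, replacement in RULES:
                n = len(pattern)
                if tuple(queue[-n:]) == pattern:
                    queue[-n:] = list(replacement)
                    changed = True
                    break
        # Flush tokens that can no longer be part of any match.
        while len(queue) > MAX_QUEUE - 1:
            out.append(queue.pop(0))
    out.extend(queue)
    return out
```

With this shape, the IF/ELSE case in the post is exactly the hard one:
the 5 and the + are separated by a branch, so they never sit adjacent in
the queue, and the fusion is missed unless the + is pushed into each arm.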
dwp@willett.pgh.pa.us (Doug Philips) (09/06/90)
In <1990Aug29.171704.15174@idacom.uucp>, andrew@idacom.uucp (Andrew Scott) writes:
> I would have to save all the compilation tokens in the entire IF
> construct before the decision could be made.  Currently, the optimizer is
> quite small because I only have to queue up 4 tokens at most (the longest
> sequence in the optimization rules set is 4 words).

And you could always do the code motion by hand, or recode in assembler
if it were really critical.

> There are many more optimizations that my compiler could tackle, but I don't
> want to end up with a mammoth system.  I get decent results from the sequence
> optimizations I perform now.

The first 90% takes 90%, the last 10% takes the other 90%.

> I was going to experiment with an optimizer that worked at the assembly
> language level.  Each Forth primitive (like DUP) could be coded in "machine"
> primitives (like POP and PUSH).  Redundant POPs and PUSHes could be eliminated
> and values could remain in registers over a long span of code.  Movement
> to the stack in memory would only take place when necessary.  This approach
> might yield big speed improvements on non-Forth processors.

Do you mean that you'll have another virtual machine with PUSH, POP, etc.,
or do you mean the real machine-level instructions?  I think you could
probably do a good machine code peephole optimizer that way.  Of course,
if you do automatic inlining (instead of having a special word :: ) you
would want to delay "optimization" until the end.  Maybe with a multiple
cfa scheme you could have both the optimized and unoptimized code,
depending on what you need to use at the time.

-Doug
---
Preferred:  ( dwp@willett.pgh.pa.us  OR  ...!{sei,pitt}!willett!dwp )
Daily:      ...!{uunet,nfsun}!willett!dwp
[last resort: dwp@vega.fac.cs.cmu.edu]
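One way to read the multiple-cfa suggestion is a dictionary entry that
keeps both forms of a word's body: the unoptimized token list (useful
for inlining into callers before a final optimization pass) and a lazily
built optimized form.  This Python sketch is my own illustration of the
idea, not anything from the thread; the `peephole` stand-in and all
names are invented.

```python
# Hypothetical "two code fields per word" scheme: callers pick the
# unoptimized body (for inlining) or the optimized one (for execution).
class Word:
    def __init__(self, name, tokens):
        self.name = name
        self.unoptimized = tokens   # kept verbatim for inlining
        self.optimized = None       # built on first use

    def code(self, want_optimized=True):
        if want_optimized:
            if self.optimized is None:
                self.optimized = peephole(self.unoptimized)
            return self.optimized
        return self.unoptimized

def peephole(tokens):
    # Stand-in for a real optimizer: delete adjacent DUP DROP pairs.
    out = []
    for tok in tokens:
        if out and (out[-1], tok) == ("DUP", "DROP"):
            out.pop()
        else:
            out.append(tok)
    return out
```

This matches Doug's point about delaying "optimization": inlining works
on the pristine token lists, and the peephole pass runs only once the
final instruction stream is known.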
andrew@idacom.uucp (Andrew Scott) (09/08/90)
Doug Philips writes:
>>I was going to experiment with an optimizer that worked at the assembly
>>language level.  Each Forth primitive (like DUP) could be coded in "machine"
>>primitives (like POP and PUSH).  Redundant POPs and PUSHes could be eliminated
>>and values could remain in registers over a long span of code.  Movement
>>to the stack in memory would only take place when necessary.  This approach
>>might yield big speed improvements on non-Forth processors.
>
> Do you mean that you'll have another virtual machine with PUSH POP etc.
> or do you mean the real machine level instructions.  I think you could
> probably do a good machine code peephole optimizer that way.

Currently, my optimizer makes the following translation:

	Forth sequences  -->  native code instructions

The idea was to create a multi-stage optimizer/translator:

	Forth word  -->  sequence of virtual primitives
	virtual primitive reduction (peephole optimization)
	sequence of virtual primitives  -->  native code instructions

This idea looks attractive for creating native code target compilers for
several CPUs.  The tough part would be in coming up with the set of atomic
primitives that could be combined into appropriate instructions for a wide
variety of CPU instruction sets.  The target compiler engine could be the
same for many CPUs; only the translation rules would differ.
-- 
Andrew Scott  | mail:  andrew@idacom.uucp
              |  - or -  ..!uunet!ubc-cs!alberta!idacom!andrew
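The first two stages above can be sketched concretely.  In this Python
illustration (the primitive set, register names, and expansions are
invented, not Andrew's actual tables), each Forth primitive expands into
virtual PUSH/POP operations, and a reduction pass deletes a PUSH of a
register immediately followed by a POP into the same register:

```python
# Stage 1: Forth word -> sequence of virtual primitives.
# Stage 2: virtual primitive reduction (peephole optimization).
# All expansions are illustrative; a real table would cover the full
# primitive set and track stack effects more carefully.
EXPANSIONS = {
    "DUP":  ["POP r0", "PUSH r0", "PUSH r0"],
    "DROP": ["POP r0"],
    "+":    ["POP r0", "POP r1", "ADD r0,r1", "PUSH r0"],
}

def expand(word_tokens):
    ops = []
    for tok in word_tokens:
        ops.extend(EXPANSIONS[tok])
    return ops

def reduce_pushpop(ops):
    # PUSH rX immediately followed by POP rX moves nothing: the value
    # is already in the register, so both operations can be dropped.
    out = []
    for op in ops:
        if out and out[-1].startswith("PUSH ") and op.startswith("POP ") \
                and out[-1].split()[1] == op.split()[1]:
            out.pop()
        else:
            out.append(op)
    return out
```

For example, DUP + expands to 7 virtual operations, and the reduction
pass shrinks that to 5; a stronger pass (tracking which values already
live in registers across longer spans) would get closer to the "values
remain in registers" goal in the quoted post.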
stephen@estragon.uchicago.edu (Stephen P Spackman) (09/11/90)
It's possible you (meaning, people who are following this thread) want
to read:

	Debaere & van Campenhout
	_Interpretation and Instruction Path Coprocessing_
	MIT Press 1990.  ISBN 0-262-04107-3

Price?  I have no idea.  It's got some odd ideas in it, including some
speedup techniques that work WITHOUT going to a native compiler (and
some other stuff about doing the same thing in hardware).  There's a
review of it in the latest SIGPLAN Notices (I wrote the review, that's
how I know).  Very mixed feelings I have about this book, but you might
get off on it.

Just FYI.

stephen p spackman    stephen@estragon.uchicago.edu    312.702.3982
dwp@willett.pgh.pa.us (Doug Philips) (09/12/90)
In <1990Sep7.231911.20882@idacom.uucp>, Andrew Scott writes:
> Currently, my optimizer makes the following translation:
>
>	Forth sequences  -->  native code instructions
>
> The idea was to create a multi-stage optimizer/translator:
>
>	Forth word  -->  sequence of virtual primitives
>	virtual primitive reduction (peephole optimization)
>	sequence of virtual primitives  -->  native code instructions
>
> This idea looks attractive for creating native code target compilers for
> several CPUs.  The tough part would be in coming up with the set of atomic
> primitives that could be combined into appropriate instructions for a wide
> variety of CPU instruction sets.  The target compiler engine could be the
> same for many CPUs; only the translation rules would differ.

So to paraphrase, you target compile into a "high-level assembly" and
then from there have machine-dependent optimizing assemblers?

I wonder if eForth would be worth a look?  Assuming that it is a renamed
version of pigForth, see pages 19-22, "More on Forth Engines," Volume 12,
from which I quote without permission:

	Here is a list of important features we should strife (sic) to
	include in this Universal pigForth model:

	High level word set, built on the kernel words.  The high level
	definitions must be portable to all target CPU's, including 8,
	16 and 32 bit machines.  To enforce portability, words are
	tokenized to a single set of tokens.  The high level word set
	will be portable as a binary image to all CPU's.

	Tokenizing the Forth system also makes it easy to optimize an
	implementation after it is ported.

It seems to me like there are some goals in common.  (The eForth.ZIP
version doesn't contain the implementation manual.)

-Doug
---
Preferred:  dwp@willett.pgh.pa.us
Daily:      {uunet,nfsun}!willett!dwp
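The common goal, restated: the portable part of the system is a single
CPU-independent token stream, and only a per-CPU rule table changes when
retargeting.  A minimal Python sketch of that split (the 68000 mnemonic
is one real instruction used for flavor; everything else, including the
JSR fallback convention, is an invented placeholder):

```python
# One portable token set, many per-CPU translation tables.  Tokens with
# no native rule fall back to a call to the named primitive; the rule
# contents here are illustrative only.
RULES_68000 = {
    "DUP": "MOVE.W (A7),-(A7)",   # duplicate top of a stack kept on A7
}

def translate(tokens, rules):
    return [rules.get(t, "JSR " + t) for t in tokens]
```

The same `translate` engine serves every CPU; porting means writing a
new rules dictionary, which is the "engine could be the same for many
CPUs; only the translation rules would differ" point from Andrew's post.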