[comp.lang.forth] Threading

Mitch.Bradley@ENG.SUN.COM (04/20/91)
> Forth code is usually compiled as threaded code, but you can quite
> easily convert it to subroutine threaded and even pure machine code.

On most processors, subroutine threaded code without in-line machine
code expansion is SLOWER than direct threaded code.  This is because
typical programs thread from code word to code word 8 times more frequently
than they nest and unnest colon definitions.  The "jsr/rts" pair usually
has to push a return address on a stack and pop it again, whereas a
typical direct-threaded "NEXT" routine, compiled in-line at the end of
each code word, keeps "IP" in a register.
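
To make the argument above concrete, here is a toy model (not a real Forth) that only counts return-stack pushes under the two schemes. The function names, the nested-list program representation, and the sample colon definition are all assumptions made up for illustration; the 8-to-1 shape of the sample simply mirrors the ratio quoted above, and real cycle costs are ignored.

```python
# Toy model: strings stand for code words (primitives), nested lists
# stand for colon definitions.  Only return-stack pushes are counted.

def pushes_direct_threaded(body):
    """Direct threading: NEXT advances IP in a register, so only
    nesting into a colon definition pushes anything."""
    pushes = 0
    for word in body:
        if isinstance(word, list):          # colon definition: nest
            pushes += 1 + pushes_direct_threaded(word)
        # code word: NEXT only, no return-stack traffic
    return pushes

def pushes_subroutine_threaded(body):
    """Subroutine threading without in-lining: every word, code word
    or colon definition, is reached by a call that pushes an address."""
    pushes = 0
    for word in body:
        pushes += 1                         # the jsr to this word
        if isinstance(word, list):
            pushes += pushes_subroutine_threaded(word)
    return pushes

# Hypothetical colon definition made of eight code words.
colon_def = ["dup", "*", "swap", "dup", "*", "+", "swap", "drop"]
program = [colon_def]

direct = pushes_direct_threaded(program)
subroutine = pushes_subroutine_threaded(program)
```

In this model the direct-threaded version pushes once (for the single nest) and the subroutine-threaded version pushes nine times (one per call), which is the overhead that in-line expansion is meant to remove.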

However, subroutine threading opens the door to in-line machine code
expansion.  The tradeoffs in a nutshell:

        * If you don't plan to use in-line expansion of code words,
          don't use subroutine threading.

        * If you really must have the ultimate speed, then use subroutine
          threading with in-line code expansion and peephole optimization.
          (Be honest about this; most applications bottleneck on I/O, and
          most compute-bound applications spend nearly all their time in
          a very few inner loops.  It is often cost-effective to use
          threaded code for most of the application and hand-code a few
          critical words).

        * Threaded code is easier to debug.  It is possible to decompile
          in-line expanded code, but not easy, especially if peephole
          optimization has been performed.
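
As a rough sketch of what in-line expansion plus peephole optimization means, the following models code words as lists of pseudo-instructions. The instruction names, the registers "a" and "b", and the bodies given for DUP and + are hypothetical, invented for an imaginary machine; a real compiler would work on actual machine code and would know many more patterns than the single push/pop rule shown here.

```python
# Hypothetical pseudo-instructions for an imaginary register machine;
# none of these names come from a real assembler.
CODE_WORDS = {
    "dup": [("pop", "a"), ("push", "a"), ("push", "a")],
    "+":   [("pop", "a"), ("pop", "b"), ("add", "a", "b"), ("push", "a")],
}

def expand_inline(words):
    """Replace each call to a code word with a copy of its body."""
    code = []
    for w in words:
        code.extend(CODE_WORDS[w])
    return code

def peephole(code):
    """Cancel a push that is immediately followed by a pop.
    If the registers differ, a register move replaces the pair."""
    out = []
    for ins in code:
        if ins[0] == "pop" and out and out[-1][0] == "push":
            src = out.pop()[1]              # drop the matching push
            if src != ins[1]:
                out.append(("mov", ins[1], src))
        else:
            out.append(ins)
    return out

expanded = expand_inline(["dup", "+"])      # body of a word like ": 2* dup + ;"
optimized = peephole(expanded)
```

Here the seven expanded instructions shrink to four, and the boundary between DUP and + disappears from the result. That is also why decompiling optimized code is hard: the instruction stream no longer maps back to individual words.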

Mitch Bradley, wmb@Eng.Sun.COM