[comp.lang.forth] FORTH and memory

ns@cat.cmu.edu (Nicholas Spies) (07/22/88)

The argument that FORTH is less desirable because large amounts of memory
are now common, is one-sided. Sure, FORTH is no longer the _only_ way to
make toy, home computers do useful work without resorting to assembler, but
the fact that Megs of memory are now available to be squandered doesn't
make it virtuous to do so...

Case in point: working with HyperCard on the Mac (which requires 750K) sometimes
boggles my mind, in that it uses 15 times the memory of my old
MMSFORTH system on a TRS-80 Model I (48K RAM). On the Trash-80, the FORTH
was the operating system (on the Mac that is another 256K of ROM, plus 100+K
of system file). I regularly had in RAM a tracing Z-80 disassembler, an editor
with a ring buffer (good for outlining), and other stuff, with about 30K left
over. Compare this with the peculiar manner in which HyperCard stacks balloon
to several hundred K over their nominal size on disk and need periodic
compacting; without it they waste gobs of space--more than the total space
that I had with the 80, even with 3 floppies.

Granted, I'm getting far more (graphics, sophisticated glitz) now than
before, but I sometimes wonder how much more I am missing because the Mac
OS wasn't written in FORTH. I suspect a lot.

Consider also that, with networking becoming a much more important aspect
of computing, program bulk again becomes a real problem, for a bogged-down
network is often perceived as worse than no network at all. In this
case, the performance of threaded vs. unthreaded code must include some
thought of overall system throughput, not just the time it takes a single
CPU to munch through code. How much better it might be to install on each
machine on a network a threaded dictionary of routines, whose execution
time might be slower, but which could be invoked by the few characters of
a FORTH word rather than heroic block moves of object code.
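
A rough illustration of that trade-off, not modelled on any particular Forth system: assume every machine already holds the same dictionary, so only the text of a request has to cross the network. The dictionary contents and the `invoke` helper are invented for this sketch, which is written in Python purely for brevity.

```python
# Hypothetical dictionary preinstalled on every machine on the network.
# The entries are invented for illustration.
DICTIONARY = {
    "DUP": lambda s: s.append(s[-1]),
    "*": lambda s: s.append(s.pop() * s.pop()),
    "+": lambda s: s.append(s.pop() + s.pop()),
}

def invoke(message, stack=None):
    """Run a short text request against the local dictionary."""
    stack = [] if stack is None else stack
    for token in message.split():
        if token in DICTIONARY:
            DICTIONARY[token](stack)   # routine is already resident locally
        else:
            stack.append(int(token))   # anything else is taken as a number
    return stack

request = "7 DUP * 1 +"    # this text is all that crosses the network
result = invoke(request)
print(len(request), "bytes sent, result:", result)
```

The request is a handful of bytes however large the routines behind the words are; the cost has moved to keeping the dictionary resident on every machine.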

Summary: Small _still_ is beautiful. :-)

-- 
Nicholas Spies			ns@cat.cmu.edu.arpa
Center for Design of Educational Computing
Carnegie Mellon University

orr@cs.glasgow.ac.uk (Fraser Orr) (07/26/88)

In article <2353@pt.cs.cmu.edu> ns@cat.cmu.edu (Nicholas Spies) writes:
>The argument that FORTH is less desirable because large amounts of memory
>are now common, is one-sided. Sure, FORTH is no longer the _only_ way to
>make toy, home computers do useful work without resorting to assembler, but
>the fact that Megs of memory are now available to be squandered doesn't
>make it virtuous to do so...
>

This is most certainly true, but the point I was trying to make is that
memory is much cheaper than programmer time. I agree that memory should be
used as sparingly as possible (one point you didn't mention is that on systems
with virtual memory overuse of memory can cause performance problems), but it
is my contention that memory usage and (as I mentioned before) speed considerations,
should only be the concern of the compiler not the programmer. 
If this is so then the programmer can concentrate on space- and
time-efficient algorithms rather than detailed implementation problems.
(Surely everyone has heard the story of the programmer who was very proud of
his sorting routine written in assembler, having spent ages cutting one
microsecond off the inner loop here, and 23 microseconds off the loop preamble.
What he forgot to mention was that it was a bubble sort, because n.log(n) sorts
were too hard to code in assembler :-)
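
To put rough numbers on the anecdote, here is a small Python sketch comparing order-of-magnitude step counts. The functions ignore constant factors and the choice of n = 10,000 is arbitrary; both are assumptions made for illustration.

```python
import math

def bubble_steps(n):
    """Order-of-magnitude step count for an n^2 sort such as bubble sort."""
    return n * n

def nlogn_steps(n):
    """Order-of-magnitude step count for an n.log(n) sort (log base 2)."""
    return n * math.log2(n)

n = 10_000
ratio = bubble_steps(n) / nlogn_steps(n)
print(f"n={n}: n^2={bubble_steps(n)}, n.log(n)={nlogn_steps(n):.0f}, ratio={ratio:.0f}x")
```

At this size the gap is several hundred-fold, which no amount of microsecond shaving in the inner loop can recover.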

>Consider also that, with networking becoming a much more important aspect
>of computing, program bulk again becomes a real problem, for a bogged-down
>network is often perceived as worse than no network at all. In this
>case, the performance of threaded vs. unthreaded code must include some
>thought of overall system throughput, not just the time it takes a single
>CPU to munch through code. How much better it might be to install on each
>machine on a network a threaded dictionary of routines, whose execution
>time might be slower, but which could be invoked by the few characters of
>a FORTH word rather than heroic block moves of object code.
>

     This is certainly true. One of the things I always liked about
Forth was the way common functions were shared amongst all the `programs'
on the system. I think, though, that this idea could be moved to high-level
languages. Indeed, in UNIX the system calls in the kernel are shared amongst
all programs.
There is no reason why this could not be extended so that you had a
`super kernel' with stuff like stdio, the maths library etc. in it. You
would then have the advantage mentioned above by having every machine on the
network preload both the kernel and the `super kernel' and lock them
in memory.
This of course could be extended to other operating systems and languages.

==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)
UseNet: {uk}!cs.glasgow.ac.uk!orr       JANET: orr@uk.ac.glasgow.cs
ARPANet(preferred xAtlantic): orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk

mea@kolvi.hut.fi (Matti Aarnio) (07/31/88)

In article <1530@crete.cs.glasgow.ac.uk> orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk (Fraser Orr) writes:
>In article <2353@pt.cs.cmu.edu> ns@cat.cmu.edu (Nicholas Spies) writes:
>>The argument that FORTH is less desirable because large amounts of memory
>>are now common, is one-sided. Sure, FORTH is no longer the _only_ way to
>>make toy, home computers do useful work without resorting to assembler, but
>>the fact that Megs of memory are now available to be squandered doesn't
>>make it virtuous to do so...

  I believe you haven't heard about single-chip processors and the
limits on the amount of memory they have available.

  Presently I have a project (hobby :-) where I plan to use a 68HC11 without
expensive mask programming (I need only 3 units, not 10 000).
The alternatives I seem to have are to program in assembler (free RAM is 128
BYTES, EEPROM 3-5 kB depending on version, mask-programmable ROM left unused)
or to grab a preprogrammed FORTH core for it and add my application to the
EEPROM area (still that 3-5 kB!).

>(Surely everyone has heard the story of the programmer who was very proud of
>his sorting routine written in assembler, having spent ages cutting one
>microsecond off the inner loop here, and 23 microseconds off the loop preamble.
>What he forgot to mention was that it was a bubble sort, because n.log(n)
>sorts were too hard to code in assembler :-)

 Shell-Metzner (sp?) (aka Shell) sort isn't too complex.
Once I coded Shell to replace bubble in one assembler's symbol-table output
routines -- using assembler.  That may be bad with IBM/370, but 6502 was nice.
(I was compiling FIG-FORTHs for Apple-2 back then :-) -- it had a funny memory
 layout due to the location of the computer's high-resolution graphics memory,
 but it did work nicely.)
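
For reference, a minimal Shell sort sketch in Python. This is not the 6502 routine described above; the gap-halving sequence is Shell's original one and is an assumption here, and with it the sort is not n.log(n) in the worst case.

```python
def shell_sort(items):
    """Return a sorted copy of items using Shell sort with gap halving."""
    data = list(items)
    gap = len(data) // 2
    while gap > 0:
        # Gapped insertion sort: each pass sorts elements that are gap apart.
        for i in range(gap, len(data)):
            value = data[i]
            j = i
            while j >= gap and data[j - gap] > value:
                data[j] = data[j - gap]
                j -= gap
            data[j] = value
        gap //= 2
    return data

print(shell_sort([5, 2, 9, 1, 5, 6]))
```

The whole algorithm is three nested loops and one temporary, which is why it is manageable even in assembler.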

    /Matti Aarnio
University of Turku, Wihuri physical laboratory, SF-20500 Turku, Finland
 UUCP: mea@kolvi.hut.fi BITNET: fys-ma@fintuvm.bitnet
(UUCP maybe: mea@fintuvm.utu.fi )
Present toys: IBM-3033 with UNIX V.0, Amiga 2000, some PCs

jbn@glacier.STANFORD.EDU (John B. Nagle) (08/01/88)

      I have used the New Micros Forth on the M68HC11.  It is painfully
slow.  10000 0 DO LOOP takes about one full second.  Interpreters are
slow, there's no getting around it.


					John Nagle

ns@cat.cmu.edu (Nicholas Spies) (08/02/88)

Mach2 on a Mac SE does 10000 0 do loop in 1/30 sec (2 ticks)--30 times faster.
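
The arithmetic behind that comparison, as a small Python sketch. It assumes one Macintosh tick is 1/60 second (consistent with "2 ticks" being 1/30 sec) and takes both reported timings at face value; a two-tick measurement is coarse, so the per-iteration figure is approximate.

```python
ITERATIONS = 10_000
TICK_SECONDS = 1 / 60               # assumed length of one Macintosh tick

hc11_seconds = 1.0                  # New Micros Forth on the 68HC11, as reported
mach2_seconds = 2 * TICK_SECONDS    # Mach2 on a Mac SE, as reported

def microseconds_per_iteration(total_seconds, iterations=ITERATIONS):
    """Average cost of one trip round the empty DO ... LOOP."""
    return total_seconds / iterations * 1e6

hc11_us = microseconds_per_iteration(hc11_seconds)
mach2_us = microseconds_per_iteration(mach2_seconds)
speedup = hc11_seconds / mach2_seconds

print(f"68HC11: {hc11_us:.1f} us/iteration")
print(f"Mach2:  {mach2_us:.1f} us/iteration")
print(f"ratio:  {speedup:.0f}x")
```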

I think it's hard to generalize about interpreters, because there is more than
one way they are implemented. For instance, some Forths store the top couple
of items of the parameter stack in registers, instead of memory. Some (like
Mach2) have a separate stack for DO..LOOP use (instead of using the 68000
stack). Some Forths compile (in the Forth sense) a pointer to a routine to
allow the inner (or address) interpreter to thread Forth words, while other
Forths such as Mach2 compile Forth words to be executed as 68000 subroutines,
which are executed by JSR.

Several Forths permit colon definitions to be compiled as in-line code,
avoiding all threading during execution (but usually using more space). The
ones I know about are HSForth and PCForth (both for PCs) and Mach2.

One of the things that makes Forth so interesting is that it offers the
programmer knowledge and control over the details of compilation, and how
routines are factored into in-line code vs. subroutine calls, etc. As such,
it is an ideal vehicle for learning how a symbolic representation of a
"program" (source code) ends up being executable (object code). Forth is far
better than Pascal (for instance) for appreciating the kinds of things that
really happen when you write a program in any language...
-- 
Nicholas Spies			ns@cat.cmu.edu.arpa
Center for Design of Educational Computing
Carnegie Mellon University

jax@well.UUCP (Jack J. Woehr) (08/02/88)

In article <17597@glacier.STANFORD.EDU> jbn@glacier.UUCP (John B. Nagle) writes:
>
>      I have used the New Micros Forth on the M68HC11.  It is painfully
>slow.  10000 0 DO LOOP takes about one full second.  Interpreters are
>slow, there's no getting around it.
>
>

	Fancy meeting you here, John! Anyway, good buddy, what does the
interpreter have to do with the execution speed of 10000 0 DO LOOP unless
you are saying that serial communication with the 68HaCk takes a full
second? The interpreter is not invoked in a DO ... LOOP in Forth.

	Jack "I wanted to buy 64 Novixen but one Super8 was cheaper" Woehr
	jax@well jax@chariot JAX on GEnie
	: SMARTER ( ---) 10000 0 DO READ-TING :-) LOOP ;

orr@cs.glasgow.ac.uk (Fraser Orr) (08/02/88)

In article <132@kolvi.hut.fi> mea@kolvi.UUCP (Matti Aarnio) writes:
>
>  I believe you haven't heard about single-chip processors and the
>limits on the amount of memory they have available.

I agree, in some very limited applications languages like Forth are
most appropriate. That is not to say, though, that they are right for
all (or even a lot of) applications.

[Anecdote about assembler crazed programmer deleted]
>
> Shell-Metzner (sp?) (aka Shell) sort isn't too complex.
>Once I coded Shell to replace bubble in one assembler's symbol-table output
>routines -- using assembler.  That may be bad with IBM/370, but 6502 was nice.

Sorry, but you completely missed the point. Shell might be as easy to code
as bubble (it isn't, and it also isn't n.log(n)); what I was saying is that
coding in a high-level language does not, of necessity, produce less efficient
programs. The purpose of an HLL is to allow the programmer to concentrate
on the PROBLEM and not the PROGRAM. A good programmer is one who writes
algorithms, not code.

==Fraser Orr ( Dept C.S., Univ. Glasgow, Glasgow, G12 8QQ, UK)
UseNet: {uk}!cs.glasgow.ac.uk!orr       JANET: orr@uk.ac.glasgow.cs
ARPANet(preferred xAtlantic): orr%cs.glasgow.ac.uk@nss.cs.ucl.ac.uk

toma@tekgvs.GVS.TEK.COM (Tom Almy) (08/02/88)

In article <2529@pt.cs.cmu.edu> ns@cat.cmu.edu (Nicholas Spies) writes:
>I think it's hard to generalize about interpreters, because there is more than
>one way they are implemented. [...]

>Several Forths permit colon definitions to be compiled as in-line code,
>avoiding all threading during execution (but usually using more space). The
>ones I know about are HSForth and PCForth (both for PCs) and Mach2.

Well, since my Native Code Compiler (in PCForth) was mentioned, I'll comment.
Normally PCForth (and the newer UR/Forth, also by Laboratory Microsystems)
compile into threaded (the former indirect, the latter direct) code.  The
NCC allows compiling individual colon definitions into machine code.  The
only change usually required is to change ":" to "COMPILE:".  Of course,
all this is fully interactive; you can type in a COMPILE: definition from
the keyboard.  Space used ranges from slightly more (on average) for PCForth
to considerably less for UR/Forth 386, whose threaded code uses 32 bit
pointers.  With UR/Forth, which has segments, there is considerably less use
of the data segment, and considerably more use of the code segment.  This
is typically advantageous since most applications run out of data segment
space (which holds data and threads) well before code space (machine code).

The benchmark ": BENCH 10000000 0 DO LOOP ; " (YES, TEN MILLION) running
under UR/Forth 386 on a 20 MHz zero-wait-state box got the following results:

                        Time            Space
Threaded Code           14.9 seconds    44 bytes
NCC                      5.6 seconds    12 bytes
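
Reduced to per-iteration figures, the table works out as in the following Python sketch. It assumes the clock is exactly 20 MHz and that the whole measured time is spent in the loop.

```python
ITERATIONS = 10_000_000
CLOCK_HZ = 20_000_000          # 20 MHz, assumed exact

# (seconds, bytes) as reported in the table above
RESULTS = {"threaded": (14.9, 44), "ncc": (5.6, 12)}

def cycles_per_iteration(seconds):
    """Clock cycles spent on each trip round the empty loop."""
    return seconds * CLOCK_HZ / ITERATIONS

threaded_cycles = cycles_per_iteration(RESULTS["threaded"][0])
ncc_cycles = cycles_per_iteration(RESULTS["ncc"][0])
speedup = RESULTS["threaded"][0] / RESULTS["ncc"][0]
space_ratio = RESULTS["threaded"][1] / RESULTS["ncc"][1]

print(f"threaded: {threaded_cycles:.1f} cycles/iteration")
print(f"NCC:      {ncc_cycles:.1f} cycles/iteration")
print(f"NCC is {speedup:.2f}x faster and {space_ratio:.2f}x smaller")
```

That is roughly 30 cycles per iteration threaded against roughly 11 for native code, so here the native version wins on both time and space.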

By the way, I have also written a batch FORTH compiler, which works like
"traditional" languages -- 1. edit, 2. compile, 3. debug, 4. goto step 1,
which compiles and links to an executable program in a single pass.  This
compiler has proven exceedingly unpopular because Forth programmers consider
traditional compilers anathema, while traditional compiler users consider
Forth anathema!

Tom Almy
toma@tekgvs.TEK.COM

Disclaimer -- The above is not connected in any way with my employer,
Tektronix.  I do get royalties from the Native Code Compiler and the CFORTH
Forth compiler.

gandreas@umn-d-ub.D.UMN.EDU (Glenn Andreas) (08/03/88)

In article <6700@well.UUCP> jax@well.UUCP (Jack J. Woehr) writes:
>In article <17597@glacier.STANFORD.EDU> jbn@glacier.UUCP (John B. Nagle) writes:
>>      I have used the New Micros Forth on the M68HC11.  It is painfully
>>slow.  10000 0 DO LOOP takes about one full second.  Interpreters are
>>slow, there's no getting around it.

>	Fancy meeting you here, John! Anyway, good buddy, what does the
>interpreter have to do with the execution speed of 10000 0 DO LOOP unless
>you are saying that serial communication with the 68HaCk takes a full
>second? The interpreter is not invoked in a DO ... LOOP in Forth.
	 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>	Jack "I wanted to buy 64 Novixen but one Super8 was cheaper" Woehr

Usually it is, however.  Remember that FORTH has TWO interpreters, the inner
and the outer.  The outer interpreter reads the input, creates the words
that are defined, and threads them together.  The inner interpreter actually
executes these threads.  Some FORTHs, however, create the equivalent of a
bunch of assembly instructions saying JSR a JSR b JSR c, etc...  This all
depends on what method of threading is done (direct, indirect, token,
etc...) and will vary from FORTH to FORTH.
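
A toy model of the outer interpreter's job, written in Python rather than Forth. Everything here is a simplification made for illustration: the `compiling` flag stands in for a real system's compile/execute state, threads are lists of names instead of addresses, and the nested `run` helper is only a stand-in for the inner interpreter.

```python
def outer_interpreter(source):
    """Read whitespace-separated tokens; compile or execute each one."""
    stack = []
    primitives = {
        "DUP": lambda: stack.append(stack[-1]),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
    }
    colon_defs = {}            # name -> thread (list of word names/literals)
    compiling = False          # toy stand-in for the compile/execute mode
    current_name, current_thread = None, None
    tokens = iter(source.split())

    def run(word):
        # Stand-in for the inner interpreter: walk a thread, or do the work.
        if word in colon_defs:
            for item in colon_defs[word]:
                run(item)
        elif word in primitives:
            primitives[word]()
        else:
            stack.append(int(word))

    for token in tokens:
        if token == ":":                       # start a new definition
            current_name, current_thread = next(tokens), []
            compiling = True
        elif token == ";":                     # finish it and store the thread
            colon_defs[current_name] = current_thread
            compiling = False
        elif compiling:
            current_thread.append(token)       # thread the word, don't run it
        else:
            run(token)                         # execute immediately
    return stack

print(outer_interpreter(": SQUARE DUP * ; 5 SQUARE"))
```

Once `SQUARE` has been threaded, the token loop plays no part in running it; only `run` walks the stored thread.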

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
= "When I was young, all I wanted was to be  | - gandreas@ub.d.umn.edu -    =
=  ruler of the universe.  Now that isn't    |   Glenn Andreas              =
=  enough" - Alex P. Keaton                  |                              =
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

ns@cat.cmu.edu (Nicholas Spies) (08/03/88)

"The interpreter is not invoked in a DO...LOOP in Forth" is half true. The
Outer Interpreter (invoked by INTERPRET in some Forths) compiles or executes
Forth words, depending on whether it is in compile or execute mode (indicated
by the system variable STATE), and is not involved with DO...LOOP. However,
"compiled" Forth code is executed by an Inner (or Address) Interpreter, which
follows the threaded Forth code (a list of pointers), eventually unthreading
it down to the machine-code primitives that perform the useful work of the
program. Forth execution overhead (the work involved in unthreading the list
of pointers) varies with the implementation and with how many levels of colon
definitions are nested inside one another.
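
A toy model of an inner interpreter in Python shows how nesting adds to that overhead. The word names, the use of name lists in place of address lists, and the dispatch counter are all assumptions made for this sketch; real implementations differ in detail.

```python
PRIMITIVES = {                        # stand-ins for machine-code primitives
    "DUP": lambda s: s.append(s[-1]),
    "*": lambda s: s.append(s.pop() * s.pop()),
}
COLON_DEFS = {                        # threads: lists of word names
    "SQUARE": ["DUP", "*"],           # : SQUARE DUP * ;
    "FOURTH": ["SQUARE", "SQUARE"],   # : FOURTH SQUARE SQUARE ;
}

def execute(word, stack):
    """Run one word; return how many words the inner interpreter fetched."""
    return_stack = []                 # saved (thread, position) pairs
    thread, ip = [word], 0
    dispatches = 0
    while True:
        if ip == len(thread):         # end of thread: unnest
            if not return_stack:
                return dispatches
            thread, ip = return_stack.pop()
            continue
        name = thread[ip]
        ip += 1
        dispatches += 1               # one fetch-and-dispatch per word
        if name in COLON_DEFS:        # nest into a colon definition
            return_stack.append((thread, ip))
            thread, ip = COLON_DEFS[name], 0
        else:
            PRIMITIVES[name](stack)   # a primitive does the useful work

flat = [3]
flat_cost = sum(execute(w, flat) for w in ["DUP", "*", "DUP", "*"])
nested = [3]
nested_cost = execute("FOURTH", nested)
print(flat, flat_cost, nested, nested_cost)
```

Both runs leave the same result on the stack, but the nested version dispatches seven words to do the work of four primitives; the extra three are pure threading overhead.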

However, as I mentioned in an earlier post, some Forth systems do have ways to
compile colon definitions as in-line machine code to avoid unthreading
overhead for those words during execution. 

All of this applies to "virtual Forth machines", not the Novix, etc.

-- 
Nicholas Spies			ns@cat.cmu.edu.arpa
Center for Design of Educational Computing
Carnegie Mellon University

jax@well.UUCP (Jack J. Woehr) (08/03/88)

In article <2529@pt.cs.cmu.edu> ns@cat.cmu.edu (Nicholas Spies) writes:
>Mach2 on a Mac SE does 10000 0 do loop in 1/30 sec (2 ticks)--30 times faster.
>
>I think it's hard to generalize about interpreters, because there is more than
>one way they are implemented. For instance, some Forths store the top couple
>of items of the parameter stack in registers, instead of memory. Some (like
>Mach2) have a separate stack for DO..LOOP use (instead of using the 68000
>stack). Some Forths compile (in the Forth sense) a pointer to a routine to
>allow the inner (or address) interpreter to thread Forth words, while other
>Forths such as Mach2 compile Forth words to be executed as 68000 subroutines,
>which are executed by JSR.
>
>Several Forths permit colon definitions to be compiled as in-line code,
>avoiding all threading during execution (but usually using more space). The
>ones I know about are HSForth and PCForth (both for PCs) and Mach2.

	To which we might add, direct threading, as in some schemes
proposed in the seventies for the PDP-11 &c.; And that JForth for
the Amiga is another JSR-Threaded Forth; and that other schemes include
making ENTER ( NEST) EXIT ( UNNEST) and NEXT microcode ( Zilog Super8)
and the inner-interpreterless Novix and Johns Hopkins chips, which set a
bit in instruction ( or bits, in the case of JH) to indicate code|jump.

+_+_+_+_+_+==============================================================

jack woehr   "Pronounced as in `Armed Conflict'."
jax@well     "Host of the WELL Forth Conference."
jax@chariot  "I live in Colorado."
JAX on GEnie "Would you buy a Single Board Computer from this man?"

jax@well.UUCP (Jack J. Woehr) (08/04/88)

In article <2558@pt.cs.cmu.edu> ns@cat.cmu.edu (Nicholas Spies) writes:
>"The interpreter is not invoked in a DO...LOOP in Forth" is half true; the
>Outer Interpreter (invoked by INTERPRET in some Forths) compiles or executes
>Forth words if in compile or execute mode (indicated by the system variable
>STATE) and is not involved with DO...LOOP. However, "compiled" Forth code
>is executed by an Inner (or Address) Interpreter, which follows the threaded
>Forth code (a list of pointers), eventually unthreading it down to the
>machine-code primitives that perform the useful work of the program.
> ...
>However, as I mentioned in an earlier post, some Forth systems do have ways to
>compile colon definitions as in-line machine code to avoid unthreading
>overhead for those words during execution. 
>
>All of this applies to "virtual Forth machines", not the Novix, etc.
>

	Right, Nick! Had hoped to provoke a scold from the original
poster, at which point was to wax heretical. No sense wasting it, here
goes:

	Moore always says, "Forth is to me a concept more than a language"
and loves to shock his faithful with remarks like "I was wrong to use
screen files; all you need is a good decompiler and you can throw away
the source" ( a la Ureli, JFAR IV,1 p231 ff)

	SOOO ... with chips like the Novix and to a lesser extent the
Super8 ... ( is NOTHING sacred?) bye-bye inner interpreter ...

***
jax@well  		." If G-d had written Genesis in FORTH ..."
jax@chariot		." She could have rested on the *6th* day!"
JAX on GEnie