[comp.lang.forth] TIL design topics

elg@usl.UUCP (Eric Lee Green) (11/14/86)

Velcome, forthfans.

I am currently designing a small threaded language in which to program
my fairly new Commodore 128. I don't particularly like FORTH, it is
excruciatingly slow on a 6502, and the command names are just aweful
(comeon now, "!"? "."? ":"? Gimme a break!). But I do like the general
FORTH idea. So, I'm looking for comments on these implementation
decisions:

subroutine-threaded: Instead of threads being two-byte machine
addresses, have a jsr <address> machine instruction instead. Immediate
advantage is speed (NEXT is very slow on a 6502), at a cost of making
the program 3/2 larger. The largest FORTH I've ever seen for the 6502
was only 25K. My C-128 has 128K of RAM. Space is hardly a problem.

I noticed an additional goodie about subroutine threading -- you can
put assembly language code in the same procedure as regular code, and
you can call regular TIL words from within your assembly language
subroutines. This is one step further toward the goal of being able to
write your program completely in the TIL, then come back and re-write
critical routines and critical parts of routines in assembly language
to get the required speed. 

Reversing the links: In most FORTH systems, the "compiler" or
"interpreter" (threader?) starts searching for words starting at the
last word defines, and runs down the link till reaching the bottom or
finding the word. The most-used words, like "@" and "!", though, are
defined close to the bottom of the text stack.  So why not start at
the bottom, and go upwards? In Superforth-64, it sometimes takes 10
seconds to compile a block after reading it in from disk (which is
tediously slow in itself, with the notorious 1541 drive). Probably due
to the fact that it has some 250-300 words in its vocabulary.

But, what side effects will that have? Surely, there is SOME reason
that FORTH systems usually have the links going the wrong way!

Those are the major implementation questions. Now for a quick rundown
of some of the other features I'm considering:

Dynamic string handling a' la' BASIC, along with a seperate string
stack. I have the garbage collector designed, and it is very time
efficient, at the cost of being a bit space-inefficient (a string
variable consisists of a length, address, and link -- 5 bytes). Since
the application I have in mind is very string-intensive, this was a
very important feature to have.

And a simple module structure, somewhat like Forth "vocabularies",
	except with better syntax.
And a method of doing overlays, easily.
And array declaration and access words.... 

And a file-oriented system. Screens are the pits. The designer of
whatever hard drive etc. that I'm using has already gone to a bunch of
trouble making a compatible DOS driver, it's really a drag to have to
write a FORTH driver for that drive myself when the DOS driver is
sitting there... I have a SFD-1001 disk drive that I cannot use with
Superforth-64 simply because I do not have time or inclination to go
in and critically mangle the screens code. Not to mention that having
to cram all of your code into 16 lines of 64 characters tends to make
you omit comments and otherwise use poor style...
-- 

      Eric Green {akgua,ut-sally}!usl!elg, elg%usl.CSNET
        (Snail Mail P.O. Box 92191, Lafayette, LA 70509)

" In the beginning the Universe was created. This has made a lot of
 people very angry and been widely regarded as a bad move."

jin@hropus.UUCP (Jerry Natowitz) (11/17/86)

> Reversing the links: In most FORTH systems, the "compiler" or
> "interpreter" (threader?) starts searching for words starting at the
> last word defines, and runs down the link till reaching the bottom or
> finding the word. The most-used words, like "@" and "!", though, are
> defined close to the bottom of the text stack.  So why not start at
> the bottom, and go upwards? In Superforth-64, it sometimes takes 10
> seconds to compile a block after reading it in from disk (which is
> tediously slow in itself, with the notorious 1541 drive). Probably due
> to the fact that it has some 250-300 words in its vocabulary.
> 
> But, what side effects will that have? Surely, there is SOME reason
> that FORTH systems usually have the links going the wrong way!

One very good reason:  FORTH's extensibility permits you to redefine
a word but continue to use it's old definition in code previously
compiled.  People will probably argue about the correctness of using
this feature (I've used it and been burned by it, my vlist command
stars redefined words to warn of the condition) but it is there for
the using.

BTW: I don't program in FORTH anymore, writing a DBS in it was the most
excrusiating experience of my professional career.
-- 
     Jerry Natowitz (HASA - J division)
     Bell Labs - HR 2A-214
     201-615-5178 (no CORNET yet)
     ihnp4!houxm!hropus!jin (official)
     ihnp4!opus!jin         (better)

willner@cfa.harvard.edu (Steve Willner) (11/17/86)

In article <1001@usl.UUCP>, elg@usl.UUCP (Eric Lee Green) writes:
> 
> subroutine-threaded: Instead of threads being two-byte machine
> addresses, have a jsr <address> machine instruction instead. Immediate
> advantage is speed (NEXT is very slow on a 6502), at a cost of making
> the program 3/2 larger. The largest FORTH I've ever seen for the 6502
> was only 25K. My C-128 has 128K of RAM. Space is hardly a problem.
> 
I don't see any reason not to do this if you have the space, but I
can't imagine that the speed gain will be significant.  (But I'm not
familiar with the 6502; maybe it doesn't have decent indirect
addressing??) 

> Reversing the links: In most FORTH systems, the "compiler" or
> "interpreter" (threader?) starts searching for words starting at the
> last word defines, and runs down the link till reaching the bottom or
> finding the word. The most-used words, like "@" and "!", though, are
> defined close to the bottom of the text stack.  So why not start at
> the bottom, and go upwards? 
Because then if you redefine a word, you get the old definition and
not the new one.  Better to compile once and _save the compiled
program_; then you don't really care how long it takes to compile.
Even if you are developing programs and making frequent changes, you 
can make most of them directly in the compiled code rather than in
the source code provided you have a decent debugger.  Then when
everything's working, go back and change the source.  I can post a
rudimentary debugger if anyone wants to see it.
 
> Dynamic string handling a' la' BASIC, along with a seperate string
> stack. 
Sounds good to me.
> 
> And a simple module structure, somewhat like Forth "vocabularies",
> 	except with better syntax.
Sounds good too. Would you post the syntax that you like better?
> And a method of doing overlays, easily.
Definitely good.  I trust you have a good file system to manage the
overlay files.
> And array declaration and access words.... 
Sure.
> 
> And a file-oriented system. Screens are the pits. 
I couldn't agree more.  You should have a good screen editor too.
> Not to mention that having
> to cram all of your code into 16 lines of 64 characters tends to make
> you omit comments and otherwise use poor style...
It's not the organization into blocks that's the problem, it's the
lack of higher organization (i.e. files) and the lack of total space.
Our FORTH implementation accepts source files of up to 500 blocks
each, and one source file can load others using a filename-lookup
syntax.  I like to put a comment in line zero giving the contents of
the block; then I can create an index by printing line zero of each
block.  With plenty of total space, there is no reason to fill a
block, so organization does not suffer, and there is plenty of room
for comments.  My source files typically range from 10 to 30 blocks
each, and a complete program may contain 40 to 50 source files.
Normally, one provides a single file that does nothing but load all
the other files in the appropriate order.

Two more things to consider:
If you plan to do any floating point calculations, put the floating
point numbers on a separate stack.  This sounds unnecessary, but
having tried both kinds of implementations, I assure you that it
makes a huge difference.  A stack 15 elements deep should be plenty.
I can post more notes on implementation if there is sufficient interest.

Use more than the usual "3 characters plus length" to identify words.
I am informed (but have not verified) that "5 characters plus length"
is sufficient.  Making sure that, e.g. STATE and START are not the
same will eliminate many, many problems.
Steve Willner              Phone 617-495-7123        Bitnet: willner@cfa1
60 Garden St.              FTS:      830-7123         UUCP:   willner@cfa
Cambridge, MA 02138 USA  Telex:  921428 satellite cam

ccplumb@watnot.UUCP (Colin Plumb) (11/17/86)

In article <1001@usl.UUCP> elg@usl.UUCP (Eric Lee Green) writes:
>Velcome, forthfans.
>
>I am currently designing a small threaded language in which to program
>my fairly new Commodore 128. I don't particularly like FORTH, it is
>excruciatingly slow on a 6502, and the command names are just aweful
>(comeon now, "!"? "."? ":"? Gimme a break!).

Well, you can redefine them if you're not happy - but I find they're quite
legible after a little bit.

>                                              But I do like the general
>FORTH idea. So, I'm looking for comments on these implementation
>decisions:
>
>subroutine-threaded: Instead of threads being two-byte machine
>addresses, have a jsr <address> machine instruction instead. Immediate
>advantage is speed (NEXT is very slow on a 6502),

Yes, trying to handle lots of indirection using 16-bit pointers on a
processor without 16-bit registers is the pits.  (La Brea tar pits, that is :-)
Of course, you could always use the Z-80 in the 128.   (1/2 :-)

>                                                  at a cost of making
>the program 3/2 larger. The largest FORTH I've ever seen for the 6502
>was only 25K. My C-128 has 128K of RAM. Space is hardly a problem.

Just don't forget that spreading a dictionary across segments is tricky.
you might want to copy a bit from Commodore and put only arrays and strings
in bank 1, with the rest in bank 0, or some similar static allocation scheme.

>
>I noticed an additional goodie about subroutine threading -- you can
>put assembly language code in the same procedure as regular code, and
>you can call regular TIL words from within your assembly language
>subroutines. This is one step further toward the goal of being able to
>write your program completely in the TIL, then come back and re-write
>critical routines and critical parts of routines in assembly language
>to get the required speed. 

Actually, you can already do this in indirect-threaded Forth, with the
limitation that a given word must be either all Forth or all assembler.
It is, in fact, one of the great strengths of Forth - the calling procedure
doesn't change a bit from Forth words to assembler primitives.

>
>Reversing the links: In most FORTH systems, the "compiler" or
>"interpreter" (threader?) starts searching for words starting at the
>last word defines, and runs down the link till reaching the bottom or
>finding the word. The most-used words, like "@" and "!", though, are
>defined close to the bottom of the text stack.  So why not start at
>the bottom, and go upwards? In Superforth-64, it sometimes takes 10
>seconds to compile a block after reading it in from disk (which is
>tediously slow in itself, with the notorious 1541 drive). Probably due
>to the fact that it has some 250-300 words in its vocabulary.
>
>But, what side effects will that have? Surely, there is SOME reason
>that FORTH systems usually have the links going the wrong way!

The main problem you'll encounter is that your application words will get
lost.  For example, if you first load some utilities, they may use the word
FOO for some purpose.  You, the user, might not even know about it, since
it's an internal word.  Then you try to compile your program, which also
uses FOO, and you'll end up with a non-working application, since all
references to FOO were caught by the utility's definition.

One way to speed up dictionary searching is to split up each linked list into,
say, 4 or 8, and hash the word to choose which list to use. XORing the first
letter of the word with its length, and masking out the appropriate number of
bits is reasonably efficient and fast.

>
>Those are the major implementation questions. Now for a quick rundown
>of some of the other features I'm considering:
>
>Dynamic string handling a' la' BASIC, along with a seperate string
>stack. I have the garbage collector designed, and it is very time
>efficient, at the cost of being a bit space-inefficient (a string
>variable consisists of a length, address, and link -- 5 bytes). Since
>the application I have in mind is very string-intensive, this was a
>very important feature to have.

Sounds good - you might want to post details for other interested parties.

>
>And a simple module structure, somewhat like Forth "vocabularies",
>	except with better syntax.

I don't mind the standard Forth vocabulary words if you use a vocabulary
stack.   But if you've got something better, please post.

>And a method of doing overlays, easily.

Good.  This issue really ought to be addressed in a Forth Standard.

>And array declaration and access words.... 

I'm not so sure these are necessary in a general-purpose language.  There are
so many variants on arrays (bounds-checking, element size, number of
dimensions, address or value returned, etc.) that it's hard to come up with
efficient general words, and usually so few are used in any given application
that it's no chore to roll your own (start with  CREATE ARRAY  100 ALLOT
: ARRAY  2* ARRAY + ;  and add features as desired).  But you did mention that
you had a specific application in mind, so my comments may be inapplicable.

>
>And a file-oriented system. Screens are the pits. The designer of
>whatever hard drive etc. that I'm using has already gone to a bunch of
>trouble making a compatible DOS driver, it's really a drag to have to
>write a FORTH driver for that drive myself when the DOS driver is
>sitting there... I have a SFD-1001 disk drive that I cannot use with
>Superforth-64 simply because I do not have time or inclination to go
>in and critically mangle the screens code. Not to mention that having
>to cram all of your code into 16 lines of 64 characters tends to make
>you omit comments and otherwise use poor style...

Yes, screens are pretty bad compared to standard text files if you've got
a driver already written.  A blank line put in for readability should *not*
take up 64 bytes.  Again, please post any  bright ideas you have on the
subject.

>-- 
>
>      Eric Green {akgua,ut-sally}!usl!elg, elg%usl.CSNET
>        (Snail Mail P.O. Box 92191, Lafayette, LA 70509)
>
>" In the beginning the Universe was created. This has made a lot of
> people very angry and been widely regarded as a bad move."

	-Colin Plumb (ccplumb@watnot.UUCP)

Zippy says:
My CODE of ETHICS is vacationing at famed SCHROON LAKE in upstate New York!!

braner@batcomputer.tn.cornell.edu (braner) (11/17/86)

[]

In a 6502 Forth I've used ("microSpeed" on the Apple II) a compiled
Forth word is callable from AL, even though it is NOT subroutine threaded.
A pure Forth word starts with "JSR interpret" which jumps to the routine
that (using the address on the stack as a pointer) looks at the (16-bit)
tokens coming after the JSR and does a JSR to each in turn.  The last token
(actually address) in the word is the address of "NEXT" which knows how to 
backtrack this recursion.  AL words end with a JMP to a different version
of NEXT.  In that system there is no way to mix AL and Forth easily in one
word, but it could be easily extended to have an address the AL could JSR
to to switch to Forth for the rest of the word, and a token that would
switch to AL from Forth.

My measurments do show, however, that with the 6502 a JSR-threaded method
IS faster.  I have since moved to the 68000, on which a 16-bit displacement
as a token should be a lot faster, but alas I'm not using Forth right now.

How about modifying Forth by splitting the data stack into two: one for
data (16 bits wide) and one for addresses (32 bits wide on a 68000).
Then you can get rid of a lot of the "noise" words such as '@', if you
construct data structures right.  Also a lot of the extra SWAPs and such
needed to manipulate data and addresses on one stack are eliminated.
For speed, I suggest using very short stacks and keeping them in registers!
(e.g. data stack 4 deep, address stack 2 deep - easily done on the 68000,
leaving lotsa registers free even after you use two as stack pointers
and another as return-stack pointer).  Admittedly, that is NOT Forth anymore!
You would have to use explicit variables a lot more (due to the short
stacks).  But it would be more readable and a lot faster!  Forth was designed
for the hardware of the Seventies, and there is no reason to stick to all
of its details forever.  In this sense I agree that "screens" are obsolete,
especially when you are writing a language for a machine which already has
a traditional OS complete with "files".  In that case there is no penalty
for using what's already there (sometimes in ROM!) and there is an advantage
in making Forth's text files compatible with other uses of the machine.

- Moshe Braner,   Cornell University

steve@jplgodo.UUCP (Steve Schlaifer x43171 301/167) (11/17/86)

In article <1001@usl.UUCP>, elg@usl.UUCP (Eric Lee Green) writes:
> [....]
> So, I'm looking for comments on these implementation decisions:
> 
> subroutine-threaded: Instead of threads being two-byte machine
> addresses, have a jsr <address> machine instruction instead. Immediate
> advantage is speed (NEXT is very slow on a 6502), at a cost of making
> the program 3/2 larger. The largest FORTH I've ever seen for the 6502
> was only 25K. My C-128 has 128K of RAM. Space is hardly a problem.

I have implemented Forth on both the Apple //e and the Apple //GS and
used subroutine threading in both cases.  It makes the code run a *lot*
faster when you don't have such goodies as auto-increment indexed indirect
addressing modes.  It does mean that some code which was written
for indirect threaded implementations may have to be reworked somewhat.
In particular, words like ',' are sometimes used to compile a 16 bit value
and sometimes an address of a word to be run.  In the first case, you will
probably want to continue to compile the 16-bit value in, but in the 
second you will need to compile the 24 bit JSR.  I built a word called 'J,'
which compiles a JSR to the top of stack value and left ',' to just compile
the 16 bit value off of the top of stack.

> I noticed an additional goodie about subroutine threading -- you can
> put assembly language code in the same procedure as regular code, and
> you can call regular TIL words from within your assembly language
> subroutines. 
> [....]

This is one of the handier features of subroutine threading.  It allows
sequences of high level code to be optimized so that

	R> DROP		( drop the top value from the return stack )

can become the machine code

	PLA PLA

if you use the 6502 stack for the return stack.

> Reversing the links: In most FORTH systems, the "compiler" or
> "interpreter" (threader?) starts searching for words starting at the
> last word defines, and runs down the link till reaching the bottom or
> finding the word. The most-used words, like "@" and "!", though, are
> defined close to the bottom of the text stack.  So why not start at
> the bottom, and go upwards? In Superforth-64, it sometimes takes 10
> seconds to compile a block after reading it in from disk (which is
> tediously slow in itself, with the notorious 1541 drive). Probably due
> to the fact that it has some 250-300 words in its vocabulary.
> 
> But, what side effects will that have? Surely, there is SOME reason
> that FORTH systems usually have the links going the wrong way!

If you run the links from first to last rather than the normal last to first,
then you will not be able to redefine existing words.  A better way to speed
the dictionary search is either to hash the names or use fixed length names.
One scheme for the latter is to keep the first three and the last character
of a name.  I don't like using short names but know of implementations where
it is done that way (it also can save memory).  You could also redefine
the most commonly used words so that they appear near the end of the
dictionary and would be found quicker but at some cost in memory.  I.e. after
the system is booted up, make new definitions for '@', '!', etc. and save the
resulting dictionary for future use.

> [....]
> And a file-oriented system. Screens are the pits. 
> [....]

My Apple //e implementation used screens.  Screens are simple, easy to
implement and easy to understand.  My main objection to them on the Apple
was that the trailing blanks on each line took up space on the disk and
the //e disks are very small (143K bytes).  My //GS implementation uses
ProDOS files and the supplied operating system for access to the disk for
the reasons you give..I don't want to build the device drivers myself,
better use of the disk space, longer files.
-- 

...smeagol\			Steve Schlaifer
......wlbr->!jplgodo!steve	Advance Projects Group, Jet Propulsion Labs
....logico/			4800 Oak Grove Drive, M/S 301/165
				Pasadena, California, 91109
					+1 818 354 3171

howellg@idec.stc.co.uk (Gareth Howell) (11/18/86)

In article <1001@usl.UUCP> elg@usl.UUCP (Eric Lee Green) writes:
>Velcome, forthfans.
>
>I am currently designing a small threaded language in which to program
>my fairly new Commodore 128. I don't particularly like FORTH, it is
>excruciatingly slow on a 6502, and the command names are just aweful
> ....
>
>Reversing the links: In most FORTH systems, the "cgmpiler" or
>"interpreter" (threader?) starts searching for words starting at the
>last word defines, and runs down the link till reaching the bottom or
>finding the word. The most-used words, like "@" and "!", though, are
>defined close to the bottom of the text stack.  So why not start at
>the bottom, and go upwards? In Superforth-64, it sometimes takes 10
>seconds to compile a block after reading it in from disk (which is
>tediously slow in itself, with the notorious 1541 drive). Probably due
>to the fact that it has some 250-300 words in its vocabulary.
>
>But, what side effects will that have? Surely, there is SOME reason
>that FORTH systems usually have the links going the wrong way!
>
The major drawback that I can see is that the re-definition of a word
will not cause that new definition to be used in future invocations;
the old one will always be used.

howellg@idec.stc.co.uk (Gareth Howell) (11/20/86)

In article <1001@usl.UUCP> elg@usl.UUCP (Eric Lee Green) writes:
>
>I am currently designing a small threaded language in which to program
>my fairly new Commodore 128. I don't particularly like FORTH, it is
>excruciatingly slow on a 6502, and the command names are just aweful
>(comeon now, "!"? "."? ":"? Gimme a break!). But I do like the general
>FORTH idea. So, I'm looking for comments on these implementation
>decisions:
>
>subroutine-threaded: Instead of threads being two-byte machine
>addresses, have a jsr <address> machine instruction instead. Immediate
>advantage is speed (NEXT is very slow on a 6502), at a cost of making
>the program 3/2 larger. The largest FORTH I've ever seen for the 6502
>was only 25K. My C-128 has 128K of RAM. Space is hardly a problem.
>
Have you considered swapping your 6502 for a Western Digital 65802
processor.  This is a pin for pin replacement that has an expanded
register and instruction set; including instructions that speed up
words like NEXT.  Have a look at "Implementing FORTH on a New
Processor: FORTH for the 65816 and 65802" by John Bowling of Starlight
FORTH Systems.  It was presented at the 1984 Rochester FORTH
conference.

I don't have any personal experience of this mod. 'though it is on my
hit list for my BBC micro, but the author claims a doubling in
execution speed and a reduction of nearly a third in code space
occupancy without ny other hardware mods.
	gareth

sentinel@killer.UUCP (11/23/86)

In article <754@argon.idec.stc.co.uk>, howellg@idec.stc.co.uk (Gareth Howell) writes:
> Have you considered swapping your 6502 for a Western Digital 65802
> processor.  [etc]

    I would like to point out that this mod is most likely impossible on a
128.  The processor in the C-128 is actually an 8502, which is yet another
custom Commodore derivative.  The 8502 is apparently a 2 Mhz version of the
6510, which was used in the 64.
    Therein lies the problem.  The 6510 (and the 8502) has an 8-bit I/O
port built in, which are located at addresses $0000-$0001.  And needless to
say, the C-128 and C-64 use this port (for hardware configuration and tape
I/O).
    If there was a 65802 with the proper I/O port on board, it should be
possible to replace the 8502 with it, but otherwise this modification
is not feasible.

    By the way, if this has been done, you may ignore me.
----
Rob Tillotson
...ihnp4!killer!sentinel

ggw@ethos.UUCP (Gregory Woodbury) (11/26/86)

In article <754@argon.idec.stc.co.uk> howellg@idec.stc.co.uk (Gareth Howell) writes:
>In article <1001@usl.UUCP> elg@usl.UUCP (Eric Lee Green) writes:
}>
}>I am currently designing a small threaded language in which to program
}>my fairly new Commodore 128. I don't particularly like FORTH, it is
}>excruciatingly slow on a 6502, and the command names are just aweful
}>
>Have you considered swapping your 6502 for a Western Digital 65802
>processor.  This is a pin for pin replacement that has an expanded
>register and instruction set; including instructions that speed up
>words like NEXT.  Have a look at "Implementing FORTH on a New
>Processor: FORTH for the 65816 and 65802" by John Bowling of Starlight
>FORTH Systems.  It was presented at the 1984 Rochester FORTH
>conference.

The only problem is that the C-128 CPU is an 8502 which is a 6510 work-alike
that has additional lines for external control devices and co-processors.
I have been looking around for details of the 8502, but can't locate data sheets
for it.  Additionally, on the C-128, you should consider using the MMUs
ability to relocate the stack and page 0 to alternate locations, this frees
additional space for the implementations of stack based operations.

------------------------------------------
Gregory G. Woodbury				The usual disclaimers apply
CEO, Research Triangle C-64/128 User's Group
{duke|mcnc|rti-sel}!ethos!ggw                 The line eater is a boojum snark!
-- 
------------------------------------------
Gregory G. Woodbury				The usual disclaimers apply
CEO, Research Triangle C-64/128 User's Group
{duke|mcnc|rti-sel}!ethos!ggw                 The line eater is a boojum snark!