ZMLEB@SCFVM.BITNET (Lee Brotzman) (07/02/88)
The following message was submitted to BITNET's Forth Interest Group
International List (FIGIL). It is pretty long so beware.
-- Lee Brotzman (FIGIL Moderator)
--------------------
Date: Mon, 20 Jun 88 18:50:37 +0200
From: Andre PIRARD <A-PIRARD@BLIULG11>
Subject: Forth in an operating system
In this note, I shall try to distinguish between the Forth
"language" and a Forth "implementation". It is regrettable, from
the standpoint of portability, that the Forth standard does not
define the user interface to some vital functions, in particular
access to the host operating system functions. But I'll show how
easily it can be done and how one implementation has done it for a
wide variety of different systems.
I have often read questions on this list asking why one should
choose Forth instead of another language. Explaining why is very
difficult indeed. What is called the "Forth language" is such a
broad concept that the language itself (even the syntax) can be
extended into something quite different, and the functions
(words) implemented can lead to many different programming
environments, each oriented to a specific purpose. Choosing a
"traditional" language most often means accepting a compiler for
what it can do and how it does it. Accepting the Forth philosophy
means choosing to have a system do what you want, the way you
like it.
But while the Forth philosophy is a dream for language-internals
specialists and system programmers, many programmers have neither
the time nor the taste for that. For them, choosing Forth means
choosing an implementation that does the work or, better, finding
ready-made code that almost does what they want. And what they
want is probably to read one of their host system files, use their
communication lines, get the time of day and the like.
Well, a good Forth implementation can be a very comfortable
development system, make programming amazingly easy and let one
get the most out of the Forth language's versatility. But the
Forth standard has goals that make Forth less attractive to some
users, specifically: 1) to keep the nucleus (required word set) as
small as possible, so that it is usable for applications where
storage size is critical; 2) to leave as much freedom as possible
to the imagination of implementors and users, and to allow
experimentation.
What the standard allows to be called a Forth implementation can
be very minimal and very portable, but awkward and of little use
in an operating system environment. On the other hand, some
vendors have built on the Forth base to provide the same functions
that are available in other languages. Unfortunately,
compatibility between them is almost nonexistent, except between
versions of the same implementation on different machines. So it
all depends on what one expects from compatibility. Pure Forth
will always be transportable. At the other extreme,
machine-dependent specifics unavoidably have to be adapted, but
they should be confined to a well-documented separate source file.
In between, however, many functions are common to all operating
systems and should be a standard feature of an operating system
oriented Forth.
Now that an ANSI standard is being worked on, it should be
realized that most programs are written for an operating system
environment. An operating system is a bridge between applications
that communicate through files. Regarding Forth as its own
operating system isolates it from the huge world of other
applications and restricts its acceptance to very specific uses.
In addition to a minimal required word set, the Forth standard
should define additional layers describing how to implement
different orientations of the language. An operating system
interface is a major one.
For several years, I've been working with Comforth, a Forth
implementation available for systems as varied as MSDOS, CPM/86,
CPM/80, Amiga, Atari ST, Apple-Dos, Commodore 64, Commodore 128
and Sinclair QL, with a common file access system on all of them.
A common file system opens the way to many other facilities,
because their source code can be strictly identical on every
system, like those Comforth provides on all of the above systems:
- Forth source code can be stored in named, variable-length,
sequential ASCII files of the host system. These take much less
storage than block screens, are much easier to update (no screen
splitting problems) and to manage (no screen numbers to maintain
or allocate; files are given expressive names).
- A full-screen source file editor is available as a relocatable
overlay, so that one can edit source files from within the Forth
development system itself. This not only allows editing source
files without leaving the development system, but also allows
requesting compilation from within the editor. If a compilation
error is detected, the editor can be reentered with the cursor
positioned at the point of the error. The full-screen management
words used by the editor are easily customized for any hardware,
and are also available to applications.
- The source files can be interpreted by FLOAD, which is recursive,
so that application files can be organized hierarchically.
- The host system files can also be maintained from within the
development system (DIR ERA DEL REN FTYPE FCOPY FPRINT DOS).
- The blocks system itself (seldom used, but necessary to comply
with the standard) is common to all systems because it uses the
file access primitives.
- Utilities can be precompiled and stored in overlay files that
can be loaded very quickly at any address in the development
system and discarded when no longer needed. Overlay files are used
for such things as assemblers, a smart decompiler and a breakpoint
facility (setting a breakpoint anywhere in an already compiled
word and specifying the action word to be executed, by default a
special interpreter that displays the stack and where execution
stopped, and interprets any data display/correction commands;
each press of the Return key advances execution by one step).
- The nucleus can be augmented with any desirable extension (for
example the overlay files) to tailor a development system that is
saved as a new module by the FORTHGEN utility.
- When an application has been written and tested with the
development system, a post-compiled module can be generated with
the command: "COMGEN source-file main-word module-file map-file".
COMGEN loads the application, selects only the words it uses and
relocates them into an executable file of minimal size. The names
of words and the dictionary structure normally disappear, but
names for a selected set of words can be preserved so they can be
used for interpretation or keyword scanning.
Comforth-83 provides a lot of other features (floating
point, TEMPORARY/HEADERLESS, chaining, input line editing and
recall, etc.) which are not my point here and would take too long
to explain anyway.
The rest of this text will focus on techniques that I feel
should be part of an operating system oriented Forth.
Local storage management
------------------------
The Forth standard defines how to allocate storage, but
only global storage in the dictionary. Suppose a word receives a
filename (address and size) from its caller and has to use it to
call a host system function, but the particular system call needs
a string terminated with a null character. Or imagine that a word
reading a file needs a file control block. These are just two
examples of the many situations where temporary storage is needed.
The words implementing the basis of local storage are very simple
indeed, but they should be integrated into the Forth system,
because a pointer (AP, the aggregate stack pointer) needs to be
initialized or reset. Local storage is essentially LIFO storage,
and a stack is quick to maintain. An additional heap is even
better, but it does not replace LIFO storage, because it is slower
and its blocks cannot be freed automatically as easily as a
stack's.
xx USER AP \ running pointer to the top of aggregate stack
xx USER A0 \ initialized to the bottom of the aggregate stack
: AP@ \ -- addr \ address of top of aggregate stack
AP @ ;
: AP! \ addr -- \ initialize aggregate stack
1- EVEN AP ! ;
: RESERVE \ size -- addr \ allocate 'size' bytes of LIFO storage
EVEN ?ROOM AP -! AP@ ;
: FREE \ size -- \ release 'size' bytes of LIFO storage
EVEN AP +! ;
Comments: How simple. My only objection is to the name A0, which
may occasionally clash with what is intended to be a hexadecimal
constant, even though writing constants that start with anything
other than a digit is bad practice. Examples of use appear below.
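The words above rely on Comforth's USER variables, EVEN and ?ROOM. As a hedged illustration, here is a minimal sketch of the same aggregate stack in ANS Forth (the later standard), which should run on a system such as Gforth. Assumptions not in the original: a VARIABLE stands in for the USER variable AP, the stack lives in a fixed, hypothetical 1024-byte area called ASTACK, ALIGNED replaces EVEN, ?ROOM THROWs -3 (stack overflow) when the area is full, and FREE is renamed AFREE because ANS Forth later gave the name FREE to its memory-allocation word set.

```forth
\ Sketch only: VARIABLE replaces the USER variable AP, a fixed ALLOTted
\ area replaces the system-managed one, and ALIGNED replaces EVEN.
1024 CONSTANT ASIZE              \ hypothetical size of the aggregate area
CREATE ASTACK ASIZE ALLOT        \ the area; the stack grows down from its end
VARIABLE AP                      \ running pointer to the top of the stack

: AP@ ( -- addr )  AP @ ;
: A-RESET ( -- )  ASTACK ASIZE + AP ! ;  \ empty stack: top = end of area
: ?ROOM ( size -- size )         \ THROW -3 (stack overflow) if no room left
   DUP AP@ ASTACK - U> IF -3 THROW THEN ;
: RESERVE ( size -- addr )  ALIGNED ?ROOM NEGATE AP +! AP@ ;
: AFREE ( size -- )  ALIGNED AP +! ;     \ the article's FREE, renamed
A-RESET
```

As in the TEST example that follows, every RESERVE must be matched by an AFREE of the same size, so that AP returns exactly to where it was.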
The file system
---------------
A file can be viewed as a string of numbered bytes kept on
named external storage, a host system file.
Access to the storage of a particular host system file is made
available through a file-control-block, a special data element
defined by the word 'FILE' or located at an address in
dynamically reserved storage. Once defined, such a word or
address can be used to OPEN a file referenced by a string
containing the same name as that used by the host system. The
size of the file-control-block is given by the system-dependent
constant FCBSIZE. A file-control-block cannot be moved away from
the address where it was opened.
Examples:
FILE MYFILE
" B:TEST.ASM" MYFILE OPENI
The first line defines the static file-control-block MYFILE
to be used for file access. The second line connects that
file-control-block to the file named B:TEST.ASM by the host
system. Once connected, the word MYFILE can be used to access the
open file until it is closed.
: TEST
FCBSIZE RESERVE >R
" SAMPLE.DATA,TB" R@ OPENI
...
R> CLOSE ...
FCBSIZE FREE ;
The first line reserves storage from the aggregate stack for
use as a file-control-block (system-dependent size FCBSIZE) and
saves the block address on the return stack for later file
references. This method relieves the dictionary of voluminous
static allocations.
The words OPENI, OPENO and OPENU are given a character
string and a file-control-block address, and connect the latter
to a file. The contents of the string are host system specific and
follow the host system naming conventions. They convey such
things as the filename, the drive identification, a directory
path, the file attributes, access options, etc.
OPENI is used to access a file for input only. For
successful execution, the file must exist.
OPENO is used to access a file for output only. If the file
exists, it is erased; a new (empty) file is created.
OPENU is used for both reading and writing. If the file does
not exist, it is created.
Open functions return a value on the stack. If the operation
is successful, this value is zero and file access can start.
Otherwise, the value can be a host specific return code or simply
TRUE, and the file-control-block cannot be used for any file
operation other than another open attempt.
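The open and close words can be approximated on top of the File word set that the ANS standard later defined. This is a sketch, not Comforth's implementation: it assumes a file-control-block is a single cell holding the ANS fileid (so FCBSIZE is one cell), uses BIN for byte-exact access, and returns the ANS ior as the code, which is zero on success as described above.

```forth
\ Sketch: OPENI/OPENO/OPENU/CLOSE over the ANS Forth File word set.
\ Assumption: a file-control-block is one cell holding the ANS fileid.
1 CELLS CONSTANT FCBSIZE
: FILE ( "name" -- )  CREATE FCBSIZE ALLOT ;    \ name: ( -- fcba )

: OPENI ( fna fnl fcba -- code )   \ input only; the file must already exist
   >R R/O BIN OPEN-FILE SWAP R> ! ;
: OPENO ( fna fnl fcba -- code )   \ output only; an existing file is emptied
   >R W/O BIN CREATE-FILE SWAP R> ! ;
: OPENU ( fna fnl fcba -- code )   \ read/write; created if it does not exist
   >R 2DUP R/W BIN OPEN-FILE       ( fna fnl fileid ior )
   IF DROP R/W BIN CREATE-FILE     \ open failed: create the file instead
   ELSE NIP NIP 0 THEN
   SWAP R> ! ;
: CLOSE ( fcba -- code )  @ CLOSE-FILE ;
```

With these definitions the article's `FILE MYFILE` and `" B:TEST.ASM" MYFILE OPENI` pattern carries over, except that ANS Forth spells the string literal `S"`.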
The string of bytes making up a file is numbered from zero up
to the capacity of the host system. The value of a byte is defined
by writing: later reading the file at the same byte number returns
the value last written. Reading any unwritten byte either returns
an undefined value or produces an I/O error, depending on the
particular system and the byte address. The word INDATA returns a
flag indicating, before read operations, whether the end of
allocated external storage has been reached. Allocated storage
does not, however, mean that the current byte is defined.
The file can be read or written sequentially by repeatedly
using the words GET or PUT; each GET or PUT advances to the next
byte. The position in the file can be obtained or changed at any
time with NOTE and POINT, making direct access possible.
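A hedged sketch of GET, PUT, NOTE and POINT over the ANS File word set. Assumptions beyond the text: the first cell of the file-control-block holds an ANS fileid, GET returns -1 at end of file (the article leaves end-of-file detection to INDATA and INFILE), I/O errors are reported with THROW in place of ABORT, and BYTEBUF is a hypothetical one-byte transfer buffer.

```forth
\ Sketch: byte access and direct access over the ANS Forth File word set.
\ Assumptions: the first cell of a file-control-block holds an ANS fileid,
\ GET returns -1 at end of file, and I/O errors THROW (the ABORT case).
CREATE BYTEBUF 1 CHARS ALLOT       \ one-byte transfer buffer

: GET ( fcba -- char )             \ read the next byte, or -1 at end of file
   @ BYTEBUF 1 ROT READ-FILE THROW
   IF BYTEBUF C@ ELSE -1 THEN ;
: PUT ( char fcba -- )             \ write one byte at the current position
   @ SWAP BYTEBUF C! BYTEBUF 1 ROT WRITE-FILE THROW ;
: NOTE ( fcba -- ud )  @ FILE-POSITION THROW ;    \ current byte number
: POINT ( ud fcba -- )  @ REPOSITION-FILE THROW ; \ move to byte number ud
```

Because POINT takes a double-cell byte number, rewinding a file is written `0 0 MYFILE POINT`.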
Byte access is not efficient. To speed up file processing,
string operations are provided. WRITE writes a sequence of bytes
of given address and length. READ similarly retrieves a given
amount of data to a specified address and returns the amount
actually read. If that amount differs from the amount requested
and no I/O error occurred, the end of file has been reached. This
allows fast bulk reading, for example when copying files.
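The bulk READ and WRITE words, together with the end-of-file rule just stated (a short count with no error means end of file), are enough for a copy loop. A sketch under stated assumptions: the first cell of a file-control-block holds an ANS fileid, errors THROW instead of calling ABORT, and CHUNK, COPYBUF and COPY-FCB are hypothetical names, not Comforth's FCOPY.

```forth
\ Sketch: READ/WRITE bulk transfer and a copy loop over ANS file words.
\ Assumption: the first cell of a file-control-block holds an ANS fileid.
: READ ( addr len1 fcba -- len2 )  @ READ-FILE THROW ;
: WRITE ( addr len fcba -- )       @ WRITE-FILE THROW ;

4096 CONSTANT CHUNK                \ hypothetical transfer size
CREATE COPYBUF CHUNK ALLOT

: COPY-FCB ( fcba-in fcba-out -- ) \ stop at the first short READ (end of file)
   BEGIN
      OVER COPYBUF CHUNK ROT READ             ( in out len2 )
      DUP IF COPYBUF OVER 3 PICK WRITE THEN   \ write what was read
      CHUNK <
   UNTIL 2DROP ;
```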
A particular type of sequential file contains a sequence of
text lines and is called an ASCII file. Different systems use
different conventions to mark the end of ASCII records and files.
Comforth defines a system independent interface to process them.
When writing an ASCII file, the word PUTEOR will place a
record separator, the word PUTEOF will place an end-of-file
marker.
When reading an ASCII file, the end of file can be tested
with INFILE. INFILE returns a true value if the current byte is
not the end-of-file marker and the external storage is allocated.
GETLINE reads the file up to the next end of record, given a
maximum record length, and returns the amount actually read and a
flag indicating whether the end of record was reached. GETLINE's
input is a buffer-string and a file-control-block; its output is a
data-string and a flag. Example:
filename fcb OPENI OPEN?
BEGIN fcb INFILE WHILE
buffer-address buffer-size fcb GETLINE EXCESS? TYPE CR
REPEAT
fcb CLOSE CLOSED?
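For comparison, the same loop can be written with READ-LINE from the later ANS Forth File word set, whose returned flag plays the role of INFILE. This is a sketch, not a GETLINE implementation: TYPE-FILE and LINEBUF are hypothetical names, errors THROW, and a line longer than MAXLL is split across two reads instead of being rejected as EXCESS? would do.

```forth
\ The GETLINE loop restated with the ANS Forth File word set.
255 CONSTANT MAXLL
CREATE LINEBUF MAXLL 2 + CHARS ALLOT   \ READ-LINE may store 2 terminator chars

: TYPE-FILE ( fna fnl -- )
   R/O OPEN-FILE THROW >R              \ keep the fileid on the return stack
   BEGIN
      LINEBUF MAXLL R@ READ-LINE THROW ( len flag )
   WHILE
      LINEBUF SWAP TYPE CR             \ show one record
   REPEAT
   DROP R> CLOSE-FILE THROW ;
```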
Any file read or write error (e.g. an I/O error, or a shortage
of external storage or directory space) causes an error message to
be issued and ABORT to be executed. When such a brutal disruption
of program flow is not desirable, MUTEIOER can delay the error
notification until a test made by invoking IOERR. No data should
be used before testing with IOERR, and IOERR must be used after
each data entity (even if it is not used) to clear the condition
before the next file operation. If no error occurred, IOERR
returns zero, else a TRUE flag or a system specific code.
When an I/O error is pending before or occurs during close,
CLOSE returns either a true flag or a system specific code.
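One way to picture MUTEIOER and IOERR is a per-file flag that decides whether an error THROWs at once or is parked until IOERR collects it. The sketch below assumes a hypothetical three-cell file-control-block (fileid, muted flag, pending code), models ABORT with THROW, and shows only WRITE using the mechanism; it is not how Comforth lays out its blocks.

```forth
\ Sketch of delayed I/O error notification. Hypothetical FCB layout:
\ cell 0 = ANS fileid, cell 1 = "muted" flag, cell 2 = pending error code.
: FCB-FID ( fcba -- addr ) ;
: FCB-MUTE ( fcba -- addr )  CELL+ ;
: FCB-ERR ( fcba -- addr )  2 CELLS + ;

: MUTEIOER ( fcba -- )  TRUE SWAP FCB-MUTE ! ;
: ?IOERR ( ior fcba -- )           \ THROW (ABORT) at once unless muted
   OVER 0= IF 2DROP EXIT THEN
   DUP FCB-MUTE @ IF FCB-ERR ! ELSE DROP THROW THEN ;
: IOERR ( fcba -- code )           \ return and clear the pending condition
   FCB-ERR DUP @ 0 ROT ! ;
: WRITE ( addr len fcba -- )       \ one operation using the mechanism
   >R R@ FCB-FID @ WRITE-FILE R> ?IOERR ;
```

Clearing the code inside IOERR is what makes the "test after each data entity" rule work: a second IOERR returns zero until a new error occurs.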
Some host systems do not check filename syntax at open
time. This can lead to creating files whose names contain invalid
characters, making them impossible to manage (e.g. to delete)
with the normal host system commands. On such systems, OPEN may
either return a special return code or make system-like
modifications to the filename (e.g. converting it to uppercase).
When feasible, GETLINE as well as the other sequential I/O
primitives are implemented for character devices (example: RS232)
that the host operating system can open as files, following the
host's own rules for access to such files, for their filenames and
for preparatory procedures, such as configuring a baud rate before
use or specifying it in the filename. On character devices, the
file system must pre-read a character so that INFILE can signal
the end of file before the next record is read.
The words above are defined in a glossary whose title lines are
reproduced here only to show their stack behaviour.
FILE -- (definition)
-- fcba (defined word execution)
FCBSIZE -- size
OPENI fna fnl fcba -- code
OPENO fna fnl fcba -- code
OPENU fna fnl fcba -- code
OPEN? flag --
CLOSE fcba -- code
CLOSED? flag --
GET fcba -- char
READ addr len1 fcba -- len2
GETLINE addr len1 fcba -- addr len2 flag
EXCESS? flag --
PUT char fcba --
WRITE addr len fcba --
PUTEOR fcba --
PUTEOF fcba --
INDATA fcba -- flag
INFILE fcba -- flag
POINT ud fcba --
NOTE fcba -- ud
MUTEIOER fcba --
IOERR fcba -- code
DELETE fna fnl -- code
RENAME fna1 fnl1 fna2 fnl2 -- code
Comments: I find this system suitable for most programming
needs. Some issues are not covered, like file sharing options and
record locking, but these can be added as special words to be
used before and after OPEN, some having a null action in systems
where the facility is not available.
The file control block concept is suitable to all systems. Some
systems use handles that can be stored in a file control block.
The opposite (managing system control blocks when using Forth
handles) is more difficult to achieve.
Some file access systems use the concept of a "currently
accessed file" that must be changed to access another file. This
is not adequate, because it presumes that some words do not use
file I/O, which is certainly not the case if their execution is
traced to a file.
It would be better if GETLINE were called READLINE and if READ
followed the same buffer-in, string-out convention by also
returning the address and count of the data.
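Under that suggestion READ would hand back a string ready for TYPE or any other string word. A one-definition sketch, assuming as before that the first cell of the file-control-block holds an ANS fileid and that errors THROW:

```forth
\ Suggested convention: buffer in, string (address and count read) out.
\ Assumption: the first cell of the file-control-block holds an ANS fileid.
: READ ( addr len1 fcba -- addr len2 )
   >R OVER SWAP R> @ READ-FILE THROW ;
```

With it, `PAD 80 MYFILE READ TYPE` displays whatever was read, and a count smaller than 80 still signals the end of file.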
Error recovery
--------------
Error recovery is an essential feature for strong
programming. Let us start with some definitions.
An error is a detected unusual condition preventing further
execution of a procedure. It can result from any event, such as
hardware errors, data validation or user input. The procedure
detecting the error may either ABORT the process or alert its
caller by returning a condition code.
Continued execution of a process requires that it not be
ABORTed, but then return codes must be passed and tested at each
level of call. The short programming units usually found in Forth
could easily more than double in complexity and size if written
that way. On the other hand, even if ABORTing a process is
acceptable, for example in interactive mode, some procedures may
need to receive control to restore some system state they have
modified. For example, a procedure that opened a file should close
it; otherwise, after a series of ABORTs, the system may be left
with too many open files to proceed normally.
The solution to both problems is found in ABORT recovery or,
in the usual terminology, (ON)ERROR recovery. A procedure protecting
its execution with ONERROR receives control back despite ABORT.
It may either perform some cleanup of its own activity and ripple
the abort condition to its caller or provide continued execution.
Forth error recovery is just another concept of a control
structure, much like an IF-THEN-ELSE. It saves control data in a
variable ERRP (error recovery recording pointer, which again must
be zeroed on system initialization and restart) and on the return
stack. It takes care of cleaning the stacks so that a recovery
path is executed with a predictable stack depth.
ONERROR :C
DURING :C
NOERROR :C
Used in a colon definition in the form:
ONERROR recovery-words DURING protected-words NOERROR
When ONERROR is executed, the addresses of the previous
ONERROR environment and of the recovery words, together with the
data, return and aggregate stack levels, are pushed on the return
stack and make up the new ONERROR environment, whose address is
stored in ERRP.
Then control is given to the words after DURING, the protected
words.
When NOERROR is reached, the ONERROR environment is
discarded and ERRP restored to its previous value.
If ABORT is invoked during the execution of the protected
words, the ONERROR environment is similarly discarded. The 3
stacks (levels) are restored to the same depth as when ONERROR
was executed, and the recovery-words receive control. When DURING
is reached, the words after NOERROR are executed.
The ONERROR recovery is a powerful means of keeping a program
sequence from losing control when it has altered some critical
system state and must restore it. The only implemented ways to
short-circuit recovery are QUIT, WARM and COLD, which should be
used only in extreme cases during program development.
These words are fully structured and can be nested within
themselves or other structures. The two paths should have
identical data stack behavior, as for an IF ELSE THEN structure.
During the protected section, values are pushed onto the return
stack, and it is implicitly subject to the same rules concerning
the return stack as the DO loop (no EXIT or access to other
return stack values).
>ABORT -- addr :U
ABORT
If an ONERROR recovery environment is active, restore the stack
levels and the previous environment and give control to the
recovery section (see ONERROR). Else, execute the word whose cfa
is in the user variable >ABORT, normally QUIT in a development
system and SYSTEM in an application module.
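The later ANS standard adopted a different mechanism for the same need: CATCH and THROW, where ABORT performs -1 THROW. Because CATCH takes an execution token, the protected words must be factored into their own definition, but the stack behaviour matches the description above: on an error, the data stack is restored to its depth at the CATCH. A sketch, not Comforth's ONERROR; NESTING, PROTECTED, RUN and RUN-OR-RIPPLE are hypothetical names.

```forth
\ ONERROR recovery DURING protected NOERROR, approximated with CATCH.
VARIABLE NESTING                  \ stands in for some critical system state

: PROTECTED ( n -- )              \ the words between DURING and NOERROR
   1 NESTING +!                   \ alter the state...
   0< IF -1 THROW THEN ;          \ ...then "ABORT" (-1 THROW) if n < 0

: RUN ( n -- code )               \ 0 if no error, else the THROW code
   ['] PROTECTED CATCH            ( 0 | x code ) \ depth restored on error
   DUP IF NIP THEN                \ drop the undefined cell left for n
   -1 NESTING +! ;                \ cleanup, always executed

: RUN-OR-RIPPLE ( n -- )          \ clean up, then pass the error on
   RUN THROW ;
```

RUN's last line corresponds to cleanup placed after NOERROR; RUN-OR-RIPPLE adds the final step used by FLOADER below, passing the error on to the caller once the state is restored.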
ONERROR Implementation
----------------------
Compilation part:
: ONERROR \ compile (ONERROR) followed by an offset to DURING
COMPILE (ONERROR) >MARK 6 ; IMMEDIATE
: DURING \ fills the above offset and compile BRANCH and offset
4 - [COMPILE] ELSE 4 + ; IMMEDIATE
: NOERROR \ compile (NOERROR) and fill the above offset
COMPILE (NOERROR) 4 - [COMPILE] THEN ; IMMEDIATE
Execution part:
xx USER ERRP \ Error recovery recording pointer
: (ONERROR) ( Establish error environment )
R> DUP 2+ >R ( Push error return, after inline offset )
AP@ >R SP@ 2+ >R ( Checkpoint stack pointers )
ERRP @ >R RP@ ERRP ! ( replaced by our own )
DUP @ + >R ; ( Return after DURING )
: (NOERROR) ( Restore previous ONERROR environment )
R> R> ERRP ! ( Restore previous environment pointer )
RP@ 6 + RP! >R ; ( Drop stack pointers and return )
: ABORT \ modified to support ONERROR
ERRP @ ?DUP \ nonzero if error recovery is active
IF RP! R> ERRP ! R> SP! R> AP! EXIT THEN ( Error retry )
>ABORT PERFORM ;
Programming examples
--------------------
Here are some examples copied from the Comforth system itself.
255 CONSTANT MAXLL \ maximum line length for editor or FLOADing
xx USER LFCB \ current FLOAD file control block
: FLOADER \ fna fnl -- \ load host file with filename string
LFCB @ >R \ nest FLOAD, save file
FCBSIZE MAXLL + RESERVE DUP >R \ FCB and buffer space
OPENI OPEN? R> LFCB ! \ OPEN host file
ONERROR TRUE \ indicate error to exit
DURING
BEGIN LFCB @ INFILE WHILE \ test for end of file
LFCB @ FCBSIZE + MAXLL \ addr len of local buffer
LFCB @ GETLINE EXCESS? \ read next line to buffer
EVALUATE \ preserve input stream and interpret string read
REPEAT
FALSE \ indicate no error occurred
NOERROR \ following is cleanup, always executed
LFCB @ CLOSE \ close our file
R> LFCB ! \ restore nested FLOAD file control block
FCBSIZE MAXLL + FREE \ free our local data space
CLOSED? \ ABORT if our CLOSE failed
ABORT" " ; \ or ripple ABORT after cleanup
: FLOAD \ execute FLOADER with filename from input stream
"TOKEN FLOADER ;
\ Note: : TOKEN ( -- addr size ) BL WORD COUNT ;
\ : "TOKEN ( -- addr size ) ... ;
\ similar, but the input stream token can be enclosed in quotes,
\ allowing for blanks in filename.
: EDIT \ [<filename>] -- edit, load if requested, reedit if error
BEGIN
EDITOR \ invoke editor
IF \ it requests interpreting the file buffer
HERE >R \ checkpoint dictionary depth to forget on error
ONERROR
U0 ->RESET \ restore standard user variables, e. g. compiling off
R@ DP ! \ restore dictionary depth as before SLOAD
DISPOSE \ forget temporary words, including above restored DP
>SOURCE PERFORM L>IN @ + \ buffer address of error, hopefully
DUP A0 @ DTOP UWITHIN \ in edit buffer, not EVALUATEd
IF MAXLL BL SKIP DROP \ nice alignment to word
CURSOR.TO.CHARACTER THEN \ and place editor cursor there
1 WORD DROP \ flush input stream so no edit filename
CR ." Re-edit ? " REPLY ASCII Y <> \ ask user if he wants reedit
DURING
SLOAD \ interpret file buffer
CR ." Remember to SAVE "
TRUE \ no error, do not loop
NOERROR
R> DROP
ELSE TRUE THEN \ no debug request
UNTIL ;
: STOP \ action word used at breakpoint to suspend execution
SUSPEND HLD @ >R STATE @ >R SPAN @ >R \ save 8 program environment words
[COMPILE] [ \ make sure we are in interpretation mode
HERE HISHERE 80 CMOVE \ and save the area we destroy around PAD
BEGIN \ special interpreter loop
HOME.PFA @ CR ." AT " DUP BODY> >NAME NAME TYPE \ tell patched word
CALL.POINT SWAP - ." +" U. \ and offset within
CALL.POINT @ DUP HINGE.CFA =
IF DROP SAVED.CALL @ THEN >NAME NAME TYPE \ tell referenced word
." : " .S CR \ and stack contents
MYTIB DUP 80 EXPECT \ obtain user command line
SPAN @ DUP
IF NEWSTREAM \ set input stream to non-null line
ONERROR
CR ." Oops!!!" \ keep from losing test environment on user error
DURING
INTERPRET \ user input
NOERROR
ELSE 2DROP TRUE EXIT.TYPE DO.STEP B! THEN \ null, set stepping flag
EXIT.TYPE @ UNTIL \ loop until CONT, STEP or FREERUN commands executed
HISHERE HERE 80 CMOVE \ restore everything saved and exit to breakpoint
R> SPAN ! R> STATE ! R> HLD ! RESUME ; \ manager to resume at patch
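FLOADER's pattern (do the work, always clean up, then ripple the error) carries over to CATCH and THROW. The sketch below is an approximation, not Comforth's code: ALLOCATE and FREE from the ANS Memory word set replace the aggregate stack, READ-LINE replaces GETLINE, the fileid and buffer live on the return stack instead of in LFCB, and LOAD-LINES is a hypothetical helper. Like the original, it interprets one line at a time with EVALUATE and can be nested.

```forth
\ FLOADER's cleanup pattern with ANS CATCH/THROW instead of ONERROR.
255 CONSTANT MAXLL                 \ maximum line length, as above

: LOAD-LINES ( buf fileid -- )     \ the protected words
   2>R                             \ keep buf and fileid off the data stack
   BEGIN
      2R@ MAXLL SWAP READ-LINE THROW   ( len flag )
   WHILE
      2R@ DROP SWAP EVALUATE       \ interpret the line just read
   REPEAT
   DROP 2R> 2DROP ;

: FLOADER ( fna fnl -- )           \ load a host text file, nestable
   R/O OPEN-FILE THROW >R          \ R: fileid
   MAXLL 2 + CHARS ALLOCATE THROW >R   \ R: fileid buf (one per nesting level)
   2R@ SWAP ['] LOAD-LINES CATCH   ( 0 | x x code )
   DUP IF NIP NIP THEN             \ drop the placeholders CATCH restored
   R> FREE THROW                   \ cleanup, always executed
   R> CLOSE-FILE THROW
   THROW ;                         \ ripple the original error, if any
```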
P. S.
Comforth is the work of the Southern Belgium Forth Chapter, to
which I belong. I am writing on their behalf, because they wish to
share some key ideas of our favourite tool with the Forth
community. I also wanted to convey our enthusiasm for Forth.
I hope my limited English has done it justice.
Andre.
ZMLEB@SCFVM.BITNET (Lee Brotzman) (07/02/88)
--------------------
Date: Thursday, 30 June 1988 0910-EST
From: DAVID@PENNDRLS
Subject: Re: FORTH in an operating system
Thanks for a wonderful article, Andre! The ONERROR concept is a
real gem. It's one of those things that, once explained, elicits
the reaction 'Of course! That should have been obvious!' But it
wasn't. I've been banging my head against the problems ONERROR
solves for some little time, and am grateful to have a solution.
The aggregate stack is also something I will probably adopt.
Is Comforth available in some fashion? If so, and if the price
isn't too high, I'd be interested in obtaining a copy. It sounds
like good work.
Your point about FORTH being handicapped by the perception that it
is an operating system unto itself, and therefore does not get
consistently integrated with the host operating system, is very
well taken. I would certainly like to see the standards committee
define a superset of the kernel that includes the syntax of words
for accessing operating system functions. Even a small set like
your FILE set, and simple things like TIME and DATE, would go a
long way toward making FORTH programs more transportable. How
about standards for optional lexicons analogous to the standard
sets of library routines for C?
One minor operating-system-integration issue that I wonder whether
the standards committees have ever addressed is the host character
set. Operating as I do primarily on IBM equipment, I balked at
having the word ASCII generate the EBCDIC code for a character. I
use CODEPOINT instead. I also define words to translate from both
EBCDIC and ASCII into whatever the host character set happens to
be, and vice versa. On a given system, one pair of these
operations does nothing, but the set does provide for character
set independence.
-- R. David Murray (DAVID@PENNDRLS.BITNET, DAVID@PENNDRLS.UPENN.EDU)