[comp.lang.forth] Forth in an operating system

ZMLEB@SCFVM.BITNET (Lee Brotzman) (07/02/88)

The following message was submitted to BITNET's Forth Interest Group
International List (FIGIL).  It is pretty long so beware.

-- Lee Brotzman (FIGIL Moderator)

--------------------
Date:         Mon, 20 Jun 88 18:50:37 +0200
From:         Andre PIRARD <A-PIRARD@BLIULG11>
Subject:      Forth in an operating system


     In this note, I shall try to distinguish between the Forth
"language" and a Forth "implementation".  It is regrettable for
portability that the Forth standard does not define the user
interface to some vital functions, in particular access to the
host operating system functions.  But I'll show how easily it can
be done and how an implementation has done it for a wide variety
of different systems.

     I have often read questions on this list asking why one should
choose Forth instead of another language.  Explaining why is very
difficult indeed.  What is called the "Forth language" is such a
broad concept that the language itself (even the syntax) can be
extended into something quite different, and the functions (words)
implemented can lead to many different programming environments,
each oriented to a specific purpose.  Choosing a "traditional"
language most often means accepting a compiler for what it can do
and how it does it.  Accepting the Forth philosophy is choosing to
have a system do what you want, the way you like it.

     But while the Forth philosophy is a dream for the language
internals specialist or the system programmer, many programmers
have neither the time nor the taste to spend on that.  For them,
choosing Forth means choosing an implementation to do the work or,
better, finding ready-made code that almost does what they want.
And what they want is probably to read one of their host system
files, use their communication lines, get the time of day and
things like that.

     Well, a good Forth implementation can be a very comfortable
development system, make programming amazingly easy and let one
get the most out of the versatility of the Forth language.  But
the Forth standard has goals that reduce the attractiveness of
Forth to some users.  Specifically: 1) to keep the nucleus (the
required word set) as small as possible so that it is usable for
applications where storage size is essential; 2) to leave the most
freedom to the imagination of an implementor or user and to allow
experimentation.

     What the standard allows to be called a Forth implementation
can be very minimal and very portable, but awkward to use and of
little use in an operating system environment.  On the other hand,
some vendors have built on the Forth base to provide the same
functions that are available in other languages.  Unhappily,
compatibility between them is almost nil, except between versions
of the same implementation on different machines.  So it all
depends on what one expects from compatibility.  Pure Forth will
always be transportable.  At the other extreme, one cannot avoid
machine-dependent specifics that have to be adapted, but they
should be confined to a well-documented separate source file.  In
between, many functions are common to all operating systems and
should be a standard feature of an operating-system-oriented
Forth.

     At a time when an ANSI standard is being worked on, it should
be realized that most programs are written for an operating system
environment.  An operating system is a bridge between applications
that communicate through files.  Considering Forth to be its own
operating system isolates it from the huge world of other
applications and reduces its acceptance to very specific uses.  In
addition to a minimal required word set, the Forth standard should
define additional layers describing how to implement different
orientations of the language.  An operating system interface is a
major one.

     For several years, I've been working with Comforth, a Forth
implementation that provides a common file access system on
systems as varied as MSDOS, CPM/86, CPM/80, Amiga, Atari ST,
Apple-Dos, Commodore 64, Commodore 128 and Sinclair QL.  Having a
common file system really opens the way to many other facilities,
because their source code can be strictly identical, like those
found in Comforth for all of the above systems:

- Forth source code can be stored in host system variable length
named sequential ASCII files.  These take much less storage than
block screens, are much easier to update (no screen splitting
problems) and to manage (no screen numbers to maintain or
allocate; files are given expressive names).

- A full-screen source file editor is available in a relocatable
overlay, so that source files can be edited without leaving the
Forth development system.  Compilation can also be requested from
within the editor.  If a compilation error is detected, the editor
can be reentered with the cursor positioned at the point of the
error.  The full-screen management words used by the editor are
easily customizable to any hardware, and are also available to
applications.

- The source files can be interpreted by FLOAD, which is recursive
so that application files can be organized hierarchically.

- The  host  system files can also be maintained from within  the
development system (DIR ERA DEL REN FTYPE FCOPY FPRINT DOS).

- The blocks system itself (seldom used but necessary to comply
with the standard) is common to all systems because it uses the
file access primitives.

- Utilities can be pre-compiled and stored in overlay files that
can be loaded very fast at any address in the development system
and discarded when no longer needed.  Overlay files are used for
such things as assemblers, a smart decompiler and a breakpoint
facility (setting a breakpoint anywhere in an already compiled
word and specifying the action word to be executed; the default is
a special interpreter that displays the stack and the stopping
point and interprets any data display/correction commands, while
strokes of the return key provide step-by-step tracing).

- The nucleus can be augmented with any desirable extension  (for
example the overlay files) to tailor a development system that is
saved as a new module by the FORTHGEN utility.

- When  an  application  has  been written and  tested  with  the
development system,  a post-compiled module can be generated with
the command: "COMGEN source-file main-word module-file map-file".
COMGEN loads the application,  selects only the words it uses and
relocates them to an executable file whose size is  minimal.  The
names  of words and dictionary structure normally disappear,  but
names for a selected set of words can be preserved so they can be
used for interpretation or keywords scan.

     Comforth-83  provides  a  lot of  other  features  (floating
point,  TEMPORARY/HEADERLESS,  chaining,  input line editing  and
recall, etc...) which are not my point here and would be too long
to explain anyway.

     The  rest of this text will focus on techniques that I  feel
should be part of an operating system oriented Forth.


Local storage management
------------------------

     The Forth standard defines how to allocate storage, but
that's only global storage in the dictionary.  Suppose a word
receives a filename (address and size) from its caller and has to
use it to call a host system function, but that the particular
system call needs a string terminated with a null character.  Or
imagine that a word reading a file must use a file control block.
These are just two examples of the many situations where temporary
storage is needed.  The words to implement the basis of local
storage are very simple indeed, but they should be integrated in
the Forth system because a pointer (AP, aggregate stack pointer)
needs to be initialized or reset.  Local storage is essentially
LIFO storage, a stack that is quick to maintain.  An additional
heap is even better, but it does not replace LIFO storage, because
it is slower and cannot be freed automatically as easily as a
stack.

xx USER AP  \ running pointer to the top of aggregate stack
xx USER A0  \ initialized to the bottom of the aggregate stack

: AP@ \ -- addr \ address of top of aggregate stack
   AP @ ;

: AP! \ addr -- \ initialize aggregate stack
   1- EVEN AP ! ;

: RESERVE \ size -- addr \ allocate 'size' bytes of LIFO storage
   EVEN ?ROOM AP -! AP@ ;

: FREE \ size -- \ release 'size' bytes of LIFO storage
   EVEN AP +! ;

Comments:  How simple.  Only I don't like the name A0, because it
may occasionally conflict with what is intended to be a
hexadecimal constant, even if starting constants with anything
other than a digit is bad practice.  Examples of use below.
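
For readers without Comforth at hand, the same idea can be sketched in
standard Forth.  This is only an illustration under assumptions, not
the Comforth source: the names ASTACK, /ASTACK, A-BOTTOM, A-INIT,
CELL-ROUND, A-RESERVE and A-RELEASE are hypothetical, an ordinary
VARIABLE stands in for the USER variable, the region is a fixed
dictionary area that grows downward, and the cell size is assumed to
be a power of two.

```forth
\ Sketch only, standard Forth; not Comforth code.  All names are hypothetical.
1024 CONSTANT /ASTACK                \ region size, a multiple of the cell size
CREATE ASTACK  /ASTACK ALLOT         \ backing storage, cell-aligned by CREATE
VARIABLE AP                          \ running pointer to top of aggregate stack

: A-BOTTOM ( -- addr )  ASTACK /ASTACK + ;   \ address just above the region
: A-INIT ( -- )  A-BOTTOM AP ! ;             \ empty the aggregate stack

: CELL-ROUND ( u1 -- u2 )            \ round u1 up to a multiple of the cell size
   1 CELLS 1- +  1 CELLS 1- INVERT AND ;

: A-RESERVE ( size -- addr )         \ allocate 'size' bytes of LIFO storage
   CELL-ROUND
   DUP  AP @ ASTACK -  U> ABORT" aggregate stack full"
   NEGATE AP +!  AP @ ;

: A-RELEASE ( size -- )              \ free the most recent 'size' bytes
   CELL-ROUND AP +! ;

A-INIT                               \ must be redone on every restart
```

As with RESERVE and FREE, each A-RESERVE is expected to be matched by
an A-RELEASE of the same size, in reverse order of allocation.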


The file system
---------------

     A  file can be viewed as a string of numbered bytes kept  on
named external storage, a host system file.

     Access to a particular system file is made available through
a file-control-block, a special data element defined by the word
'FILE' or located at an address in dynamically reserved storage.
Once defined, such a word or address can be used to OPEN a file
referenced by a string containing the same name as that used by
the host system.  The size of the file-control-block is given by
the system dependent constant FCBSIZE.  A file-control-block
cannot be moved away from the address where it was opened.

     Examples:

FILE MYFILE
" B:TEST.ASM" MYFILE OPENI

     The first line defines the static file-control-block  MYFILE
to  be used for file access.  The second line connects that file-
control-block  to the file named B:TEST.ASM by the  host  system.
Once  connected,  the word MYFILE can be used to access the  open
file until closed.

: TEST
   FCBSIZE RESERVE >R
   " SAMPLE.DATA,TB" R@ OPENI
   ...
   R> CLOSE ...
   FCBSIZE FREE ;

     The first line reserves storage from the aggregate stack for
file-control-block  use (system dependent size FCBSIZE) and saves
the block address to the return stack for later file  references.
This  method  relieves  the  dictionary  from  voluminous  static
allocations.

     The words OPENI, OPENO and OPENU are given a character string
and a file-control-block address to connect the latter to a file.
The contents of the string are host system specific and follow the
host system naming conventions.  They convey such things as the
filename, the drive identification, a directory path, the file
attributes, access options, etc.

     OPENI is used to access a file for input only.  For
successful execution, the file must exist.
     OPENO is used to access a file for output only.  If the file
exists, it is erased.  A new (empty) file is created.
     OPENU is used for both reading and writing.  If the file does
not exist, it is created.

     Open functions return a value on the stack. If the operation
is  successful,  this  value is zero and file access  can  start.
Else,  the value can correspond to a host specific return code or
simply be TRUE, and the file-control-block cannot be used for any
file operation other than another open attempt.

     The  string of bytes making a file is numbered from zero  to
the  capacity of the host system.  The value of a byte is defined
by writing. Later reading the file with the same byte number will
return  the same value as the last write.  Reading any  unwritten
byte  either returns an undefined value or produces an I/O  error
depending on the particular system and the byte address. The word
INDATA returns a flag indicating,  before read operations, if the
end  of allocated external storage has  been  reached.  Allocated
storage does not mean however that the current byte is defined.

     The  file can be read or written sequentially by  repeatedly
using the words GET or PUT.  After sequential reading or writing,
the next byte is accessed.  The position in the file can be known
or  changed at any time by NOTE and POINT,  making direct  access
possible.
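
Direct access of this kind can be illustrated with the File-Access
word set later standardized in ANS Forth, where REPOSITION-FILE and
FILE-POSITION play roles similar to POINT and NOTE.  The following is
a minimal sketch, not Comforth code; BYTE-AT and ONE-BYTE are
hypothetical names, and the file is assumed to be already open.

```forth
\ Sketch only, standard File-Access words; names are hypothetical.
CREATE ONE-BYTE  1 CHARS ALLOT       \ one-character transfer buffer

: BYTE-AT ( ud fileid -- char )      \ read the byte numbered ud, counting from zero
   >R
   R@ REPOSITION-FILE ABORT" cannot reposition"     \ like POINT
   ONE-BYTE 1 R> READ-FILE ABORT" read error"       \ like GET, via a buffer
   0= ABORT" no byte at that position"              \ nothing read: past the end
   ONE-BYTE C@ ;
```

The position is a double number, as with the ud of POINT and NOTE, so
the second byte of a file is fetched with 1 0 fileid BYTE-AT.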

     Byte  access is not efficient.  To speed up file processing,
string operations are provided.  WRITE writes a sequence of bytes
of  given  address and length.  READ similarly retrieves a  given
amount  of  data to a specified address and  returns  the  amount
actually read.  If that amount differs from that requested and no
I/O error occurred, the end of file has been reached. This allows
for fast bulk reading as for file copy.
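
A bulk copy loop of this kind might look as follows with the standard
READ-FILE and WRITE-FILE, which take a buffer address, a length and a
file identifier much as READ and WRITE do.  This is a sketch under
assumptions, not Comforth code: COPY-DATA, COPY-BUF and /COPY-BUF are
hypothetical names, both files are assumed to be open already, and the
loop simply stops at the first read that returns no data.

```forth
\ Sketch only, standard File-Access words; names are hypothetical.
4096 CONSTANT /COPY-BUF              \ transfer size per READ-FILE call
CREATE COPY-BUF  /COPY-BUF ALLOT     \ transfer buffer

: COPY-DATA ( src-fileid dst-fileid -- )   \ copy from current position to end
   >R >R                                   \ R: dst src
   BEGIN
      COPY-BUF /COPY-BUF R@ READ-FILE ABORT" read error"   ( u2 )
      DUP                                  \ u2 = 0 means end of file
   WHILE
      COPY-BUF SWAP  2R@ DROP  WRITE-FILE ABORT" write error"
   REPEAT
   DROP  2R> 2DROP ;                       \ discard zero count and both fileids
```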

     A particular type of sequential files contains a sequence of
text  lines  and is named an ASCII file.  Different  systems  use
different conventions to mark the end of ASCII records and files.
Comforth defines a system independent interface to process them.

     When  writing  an ASCII file,  the word PUTEOR will place  a
record  separator,  the  word PUTEOF will  place  an  end-of-file
marker.

     When reading an ASCII file, being at the end of file can be
tested by INFILE.  INFILE returns a true value if the current byte
is not the end-of-file marker and if the external storage is
allocated.  GETLINE reads the file up to the next end of record,
given a maximum record length, and returns the amount actually
read and a flag indicating whether the end of record was reached.
The input of GETLINE is a buffer-string and a file-control-block;
its output is a data-string and a flag.  Example:

   filename fcb OPENI OPEN?
   BEGIN fcb INFILE WHILE
      buffer-address buffer-size fcb GETLINE EXCESS? TYPE CR
   REPEAT
   fcb CLOSE CLOSED?
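
For comparison, the same loop can be sketched with the File-Access
word set later standardized in ANS Forth, where READ-LINE combines
the roles of INFILE and GETLINE by returning a count, an end-of-file
flag and an I/O result.  This is not Comforth code; TYPE-FILE,
LINE-BUF and MAXLINE are hypothetical names.

```forth
\ Sketch only, standard File-Access words; names are hypothetical.
256 CONSTANT MAXLINE                       \ longest line handled
CREATE LINE-BUF  MAXLINE 2 + CHARS ALLOT   \ READ-LINE needs two spare characters

: TYPE-FILE ( c-addr u -- )                \ print every line of the named file
   R/O OPEN-FILE ABORT" cannot open file"  ( fileid )
   >R
   BEGIN
      LINE-BUF MAXLINE R@ READ-LINE ABORT" read error"   ( u2 flag )
   WHILE                                   \ flag is false at end of file
      LINE-BUF SWAP TYPE CR
   REPEAT
   DROP                                    \ discard the zero count left at end of file
   R> CLOSE-FILE ABORT" cannot close file" ;
```

Note that a read error leaves the file open here, which is exactly
the kind of situation the error recovery section addresses.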

     Any file read or write error (e.g. I/O, external storage or
directory shortage) will cause an error message to be issued and
ABORT to be executed.  When such a brutal disruption of program
flow is not desirable, the error notification can be delayed by
MUTEIOER until a test is made by invoking IOERR.  No data should
be used before testing with IOERR, and the use of IOERR is
mandatory after each data entity (even if not used) to clear the
condition before the next file operation.  If no error occurred,
IOERR returns zero, else a TRUE flag or a system specific code.

     When  an I/O error is pending before or occurs during close,
CLOSE returns either a true flag or a system specific code.

     Some host systems do not check filename syntax at open time.
This can lead to creating files whose names contain invalid
characters, rendering them impossible to manage (e.g. to delete)
with the normal host system commands.  On such systems, OPEN may
either return a special return code or make system-like
modifications to the filename (e.g. uppercase).

     When feasible, GETLINE as well as the other sequential I/O
primitives are implemented for character devices (example: RS232)
that can be opened as files by the host operating system, using
the host's own rules for access to such files, for filenames and
for preparatory procedures, such as configuring a baud rate prior
to use or specifying it in the filename.  On character devices,
the file system must pre-read a character so that INFILE can
signal the end-of-file before the next record is read.

The  above words are defined in a glossary whose title lines have
been reproduced here only for the stack behaviour.

FILE      --      (definition)
          -- fcba (defined word execution)
FCBSIZE   -- size
OPENI     fna fnl fcba -- code
OPENO     fna fnl fcba -- code
OPENU     fna fnl fcba -- code
OPEN?     flag --
CLOSE     fcba -- code
CLOSED?   flag --
GET       fcba -- char
READ      addr len1 fcba -- len2
GETLINE   addr len1 fcba -- addr len2 flag
EXCESS?   flag --
PUT       char fcba --
WRITE     addr len fcba --
PUTEOR    fcba --
PUTEOF    fcba --
INDATA    fcba -- flag
INFILE    fcba -- flag
POINT     ud fcba --
NOTE      fcba -- ud
MUTEIOER  fcba --
IOERR     fcba -- code
DELETE    fna fnl -- code
RENAME    fna1 fnl1 fna2 fnl2 -- code

Comments:  I find this system suitable to cover most  programming
needs. Some issues are not covered, like file sharing options and
record  locking,  but  these can be added as special words to  be
used before and after OPEN,  some having a null action in systems
where the facility is not available.

The file control block concept is suitable to all  systems.  Some
systems  use handles that can be stored in a file control  block.
The  opposite  (managing system control blocks when  using  Forth
handles) is more difficult to achieve.

Some file access systems use the concept of a "currently accessed
file" that must be changed to access another file.  This is not
adequate, because it presumes that some words do not use file I/O,
which is certainly not the case if their execution is traced to a
file.

It  would be better that GETLINE be called READLINE and that READ
implement  the  buffer-input string-output by also returning  the
address and count of the data.


Error recovery
--------------

     Error recovery is an essential feature for robust
programming.  Let us start with some definitions.

     An error is a detected unusual condition preventing further
execution of a procedure.  It can result from any event such as a
hardware error, a data validation failure or a user input action.
The procedure detecting the error may either ABORT the process or
alert its caller by returning a condition code.

     Continued execution of a process requires that it not be
ABORTed, but passing and testing return codes must then be done at
each level of call.  Programming units as short as those usually
found in Forth could easily more than double in complexity and
size if written that way.  On the other hand, even if ABORTing a
process can be accepted, for example in interactive mode, some
procedures may need to receive control to restore some system
state they have modified.  For example, a procedure that opened a
file should close it; otherwise, after a series of ABORTs, the
system may be left with too many open files to proceed normally.

     The solution to both problems is found in ABORT recovery or,
by usual terminology,  (ON)ERROR recovery. A procedure protecting
its  execution with ONERROR receives control back despite  ABORT.
It may either perform some cleanup of its own activity and ripple
the abort condition to its caller or provide continued execution.

     Forth error recovery is just another kind of control
structure, much like an IF-THEN-ELSE.  It saves control data in a
variable ERRP (error recovery recording pointer, which again must
be zeroed on system initialization and restart) and on the return
stack.  It takes care of cleaning the stacks so that a recovery
path is executed with a predictable stack depth.

ONERROR   :C
DURING    :C
NOERROR   :C

     Used in a colon definition in the form:
ONERROR recovery-words DURING protected-words NOERROR

     When  ONERROR  is executed,  the addresses of  the  previous
ONERROR environment,  of the recovery words and the data,  return
and  aggregate  stacks levels are pushed on the return stack  and
make the new ONERROR environment whose address is stored in ERRP.
Then  control is given to the words after DURING,  the  protected
words.
     When  NOERROR  is  reached,   the  ONERROR  environment   is
discarded and ERRP restored to its previous value.
     If  ABORT  is invoked during the execution of the  protected
words,  the  ONERROR environment is similarly  discarded.  The  3
stacks  (levels)  are restored to the same depth as when  ONERROR
was executed, and the recovery-words receive control. When DURING
is reached, the words after NOERROR are executed.
     The ONERROR recovery is a powerful means to protect a
program sequence from losing control when it has altered, and
should restore, some critical system state.  The only implemented
ways to bypass recovery are QUIT, WARM and COLD, which should be
used only in extreme cases during program development.
     These  words  are fully structured and can be nested  within
themselves  or  other structures.  The two paths should  have  an
identical  data stack behavior as for an IF ELSE THEN  structure.
During the protected section,  values are pushed onto the  return
stack,  and it is implicitly subject to the same rules concerning
the  return  stack  as the DO loop (no EXIT or  access  to  other
return stack values).

>ABORT    -- addr  :U
ABORT
If  an ONERROR recovery environment is active,  restore the stack
levels  and  the  previous environment and give  control  to  the
recovery section (see ONERROR).  Else, execute the word whose cfa
is  in the user variable >ABORT,  normally QUIT in a  development
system and SYSTEM in an application module.
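
The cleanup-and-ripple pattern has a close counterpart in the CATCH
and THROW words of the Exception word set later standardized in ANS
Forth.  The following is a sketch of that counterpart, not of the
ONERROR implementation; WITH-FILE is a hypothetical name, and the
execution token passed to it is assumed to have the stack effect
( fileid -- ).

```forth
\ Sketch only, standard Exception and File-Access words; WITH-FILE is hypothetical.
: WITH-FILE ( c-addr u xt -- )       \ open file, run xt on it, always close it
   >R  R/O OPEN-FILE THROW           \ ( fileid )     raise the ior if open fails
   R> OVER >R                        \ ( fileid xt )  R: fileid
   CATCH                             \ ( 0 | x n )    protected section
   R> CLOSE-FILE DROP                \ cleanup, executed on both paths
   DUP IF NIP THEN                   \ on error, drop the undefined cell x
   THROW ;                           \ ripple the error, or do nothing if zero
```

The file identifier is kept on the return stack because, after a
THROW, CATCH only guarantees the data stack depth, not the values of
the cells the protected word consumed.  This mirrors the way ONERROR
restores stack levels rather than stack contents.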


ONERROR Implementation
----------------------

Compilation part:

: ONERROR \ compile (ONERROR) followed by an offset to DURING
   COMPILE (ONERROR) >MARK 6 ; IMMEDIATE

: DURING \ fills the above offset and compile BRANCH and offset
   4 - [COMPILE] ELSE 4 + ; IMMEDIATE

: NOERROR \ compile (NOERROR) and fill the above offset
   COMPILE (NOERROR) 4 - [COMPILE] THEN ; IMMEDIATE

Execution part:

xx USER ERRP  \ Error recovery recording pointer

: (ONERROR) ( Establish error environment )
   R>  DUP 2+ >R        ( Push error return, after inline offset )
   AP@ >R  SP@ 2+ >R    ( Checkpoint stack pointers )
   ERRP @ >R RP@ ERRP ! ( replaced by our own )
   DUP @ + >R ;         ( Return after DURING )

: (NOERROR) ( Restore previous ONERROR environment )
   R>  R> ERRP !     ( Restore previous environment pointer )
   RP@ 6 + RP!  >R ; ( Drop stack pointers and return )

: ABORT \ modified to support ONERROR
   ERRP @ ?DUP \ nonzero if error recovery is active
   IF RP!  R> ERRP !  R> SP!  R> AP!  EXIT THEN ( Error retry )
   >ABORT PERFORM ;


Programming examples
--------------------

Here are some examples copied from the Comforth system itself.

255 CONSTANT MAXLL \ maximum line length for editor or FLOADing
xx  USER     LFCB  \ current FLOAD file control block

: FLOADER \ fna fnl -- \ load host file with filename string
   LFCB @ >R                      \ nest FLOAD, save file
   FCBSIZE MAXLL + RESERVE DUP >R \ FCB and buffer space
   OPENI OPEN?  R> LFCB !         \ OPEN host file
   ONERROR  TRUE  \ indicate error to exit
   DURING
      BEGIN   LFCB @ INFILE WHILE \ test for end of file
         LFCB @ FCBSIZE +  MAXLL  \ addr len  of local buffer
         LFCB @  GETLINE EXCESS?  \ read next line to buffer
         EVALUATE \ preserve input stream and interpret string read
      REPEAT
      FALSE       \ indicate no error occurred
   NOERROR        \ following is cleanup, always executed
   LFCB @ CLOSE   \ close our file
   R> LFCB !      \ restore nested FLOAD file control block
   FCBSIZE MAXLL + FREE \ free our local data space
   CLOSED?        \ ABORT if our CLOSE failed
   ABORT" " ;     \ or ripple ABORT after cleanup

: FLOAD \ execute FLOADER with filename from input stream
   "TOKEN FLOADER ;

\ Note: : TOKEN  ( -- addr size )  BL WORD COUNT ;
\       : "TOKEN ( -- addr size )  ... ;
\ similar, but the input stream token can be enclosed in quotes,
\ allowing for blanks in filename.

: EDIT \ [<filename>] -- edit, load if requested, reedit if error
   BEGIN
      EDITOR    \ invoke editor
      IF        \ it requests interpreting the file buffer
         HERE >R \ checkpoint dictionary depth to forget on error
         ONERROR
            U0 ->RESET \ restore standard user variables, e. g. compiling off
            R@ DP !    \ restore dictionary depth as before SLOAD
            DISPOSE    \ forget temporary words, including above restored DP
            >SOURCE PERFORM L>IN @ + \ buffer address of error, hopefully
            DUP A0 @ DTOP UWITHIN \ in edit buffer, not EVALUATEd
            IF MAXLL BL SKIP DROP \ nice alignment to word
                CURSOR.TO.CHARACTER THEN \ and place editor cursor there
            1 WORD DROP           \ flush input stream so no edit filename
            CR ." Re-edit ? " REPLY ASCII Y <> \ ask user if he wants reedit
         DURING
            SLOAD      \ interpret file buffer
            CR ." Remember to SAVE  "
            TRUE       \ no error, do not loop
         NOERROR
        R> DROP
      ELSE TRUE THEN \ no debug request
   UNTIL ;


: STOP \ action word used at breakpoint to suspend execution
   SUSPEND HLD @ >R STATE @ >R  SPAN @ >R \ save 8 program environment words
   [COMPILE] [            \ make sure we are in interpretation mode
   HERE HISHERE 80 CMOVE  \ and save the area we destroy around PAD
   BEGIN                  \ special interpreter loop
      HOME.PFA @  CR ." AT " DUP BODY> >NAME NAME TYPE \ tell patched word
      CALL.POINT SWAP - ." +" U.                       \ and offset within
      CALL.POINT @ DUP HINGE.CFA =
      IF DROP SAVED.CALL @ THEN >NAME NAME TYPE        \ tell referenced word
      ."  : " .S CR         \ and stack contents
      MYTIB DUP 80 EXPECT   \ obtain user command line
      SPAN @ DUP
      IF NEWSTREAM          \ set input stream to non-null line
         ONERROR
            CR ." Oops!!!"  \ keep from losing test environment on user error
         DURING
            INTERPRET       \ user input
         NOERROR
      ELSE 2DROP  TRUE EXIT.TYPE DO.STEP B! THEN \ null, set stepping flag
      EXIT.TYPE @ UNTIL  \ loop until CONT, STEP or FREERUN commands executed
   HISHERE HERE 80 CMOVE \ restore all what was saved and exit to breakpoint
   R> SPAN !  R> STATE ! R> HLD !  RESUME ; \ manager to resume at patch

P. S.

Comforth is the work of the Southern Belgium Forth Chapter, to
which I belong.  I am writing on their behalf, because they wish
to share some key ideas of our favourite tool with the Forth
community.  And I certainly wanted to convey our enthusiasm for
Forth as well.  I hope my limited English will have done it.

Andre.

ZMLEB@SCFVM.BITNET (Lee Brotzman) (07/02/88)

--------------------
Date:     Thursday, 30 June 1988 0910-EST
From:     DAVID@PENNDRLS
Subject:  Re: FORTH in an operating system

Thanks for a wonderful article, Andre!  The ONERROR concept is a real
gem.  One of those things that once explained elicits the reaction
'Of course! That should have been obvious!'  But it wasn't.  I've been
banging my head against the problems ONERROR solves for some little
time, and am grateful to have a solution.  The aggregate stack is also
something I will probably adopt.

Is Comforth available in some fashion? If so, and if the price isn't too
high, I'd be interested in obtaining a copy.  It sounds like good
work.

Your point about FORTH being handicapped by the perception that it
is an operating system unto itself, and therefore not being
consistently integrated with the host operating system, is very well
taken.  I would certainly like to see the standards committee define
a superset of the kernel that includes the syntax of words for
accessing operating system functions.  Even a small set like your
FILE set, and simple things like TIME and DATE would go a long
way toward making FORTH programs more transportable.  How about
standards for optional lexicons analogous to the standard sets of
library routines for C?

One minor operating-system-integration issue that I wonder if
the standards committees have ever addressed is that of host
character set.  Operating as I do primarily on IBM equipment, I
balked at having the word ASCII to generate the EBCDIC code for
a character.  I use CODEPOINT instead.  I also define words to
translate from both EBCDIC and ASCII into whatever the host
character set happens to be, and vice-versa.  On a given system,
a pair of these operations are non-operations, but it does provide
for character set independence.

-- R. David Murray    (DAVID@PENNDRLS.BITNET, DAVID@PENNDRLS.UPENN.EDU)