ZMLEB@SCFVM.BITNET (Lee Brotzman) (07/02/88)
The following message was submitted to BITNET's Forth Interest Group International List (FIGIL). It is pretty long, so beware.
-- Lee Brotzman (FIGIL Moderator)

--------------------
Date: Mon, 20 Jun 88 18:50:37 +0200
From: Andre PIRARD <A-PIRARD@BLIULG11>
Subject: Forth in an operating system

In this note, I shall try to distinguish between the Forth "language" and a Forth "implementation". It is regrettable for portability that the Forth standard does not define the user interface to some vital functions, in particular access to the host operating system functions. But I'll show how easily it can be done and how one implementation has done it for a wide variety of different systems.

I have often read questions on this list asking why one should choose Forth instead of another language. Explaining why is very difficult indeed. What is called the "Forth language" is such a broad concept that the language itself (even the syntax) can be extended into something quite different, and the functions (words) implemented can lead to many different programming environments, each oriented to a specific purpose. Choosing a "traditional" language most often means accepting a compiler for what it can do and how it does it. Accepting the Forth philosophy means choosing to have a system do what you want, the way you like it.

But while the Forth philosophy is a dream for the language internals specialist or the system programmer, many programmers have neither the time nor the taste for that. For them, choosing Forth means choosing an implementation to do the work or, better, finding ready-made code that almost does what they want. And what they want is probably to read one of their host system files, use their communication lines, get the time of day and things like that. Well, a good Forth implementation can be a very comfortable development system, make programming amazingly easy and allow getting the most out of the versatility of the Forth language.

But the Forth standard has goals that make Forth less attractive to some users. Specifically:

1) to keep the nucleus (required word set) as small as possible, to make it usable for applications where storage size is essential
2) to leave the most freedom to the imagination of an implementor or user and to allow experimentation.

What the standard allows to be called a Forth implementation can be very minimal and very portable, but awkward to use and of little use in an operating system environment. On the other hand, some vendors have built on the Forth base to provide the same functions that are available in other languages. Unfortunately, compatibility between them is almost nil, except between versions of the same implementation on different machines.

So, it all depends on what one expects from compatibility. Pure Forth will always be transportable. On the other hand, one cannot avoid having to adapt machine-dependent specifics, but they should be confined to a well documented separate source file. But in between, many functions are common to all operating systems and should be a standard feature of an operating-system-oriented Forth.

At a time when an ANSI standard is being worked on, it should be realized that most programs are written for an operating system environment. An operating system is a bridge between applications that communicate through files. Treating Forth as its own operating system isolates it from the huge world of other applications and restricts its acceptance to very specific uses. In addition to a minimal required word set, the Forth standard should define additional layers describing how to implement different orientations of the language. An operating system interface is a major one.

For several years, I've been working with Comforth, a Forth implementation that is available for, and implements a common file access system on, systems as varied as MSDOS, CPM/86, CPM/80, Amiga, Atari ST, Apple-Dos, Commodore 64, Commodore 128 and Sinclair QL.

Having a common file system is really a trigger for many other facilities, because their source code can be strictly identical, like those found in Comforth for all of the above systems:

- Forth source code can be stored in the host system's variable-length, named, sequential ASCII files. These take much less storage than block screens, and are much easier to update (no screen splitting problems) and to manage (no screen numbers to maintain or allocate; files are given expressive names).
- A fullscreen source file editor is available in a relocatable overlay. This not only allows editing source files without leaving the Forth development system, but also allows requesting compilation from within the editor. If a compilation error is detected, the editor can be reentered with the cursor positioned at the point of the error. The fullscreen management words used by the editor are easily customizable to any hardware, and are also available to applications.
- The source files can be interpreted by FLOAD, which is recursive, so that application files can be organized hierarchically.
- The host system files can also be maintained from within the development system (DIR ERA DEL REN FTYPE FCOPY FPRINT DOS).
- The blocks system itself (seldom used, but necessary to comply with the standard) is common to all systems because it uses the file access primitives.
- Utilities can be pre-compiled and stored in overlay files that can be loaded very fast at any address in the development system and discarded when no longer needed. Overlay files are used for such things as assemblers, a smart decompiler and a breakpoint facility. The latter sets a breakpoint anywhere in an already compiled word and specifies the action word to be executed, by default a special interpreter that displays the stack and the stopping point and interprets any data display/correction commands; strokes of the return key provide step-by-step tracing.
- The nucleus can be augmented with any desirable extension (for example the overlay files) to tailor a development system that is saved as a new module by the FORTHGEN utility.
- When an application has been written and tested with the development system, a post-compiled module can be generated with the command "COMGEN source-file main-word module-file map-file". COMGEN loads the application, selects only the words it uses and relocates them to an executable file whose size is minimal. The names of words and the dictionary structure normally disappear, but names for a selected set of words can be preserved so they can be used for interpretation or keyword scanning.

Comforth-83 provides a lot of other features (floating point, TEMPORARY/HEADERLESS, chaining, input line editing and recall, etc.) which are not my point here and would take too long to explain anyway. The rest of this text will focus on techniques that I feel should be part of an operating-system-oriented Forth.

Local storage management
------------------------

The Forth standard defines how to allocate storage, but that's only global storage in the dictionary. Suppose a word receives a filename (address and size) from its caller and has to use it to call a host system function, but the particular system call needs a string terminated with a null character. Or imagine that a word reading a file must use a file control block. These are just two examples of the many situations where temporary storage is needed.
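
To make the first of these situations concrete, here is a minimal sketch in present-day standard Forth rather than Comforth. It is only an illustration under stated assumptions: TMP-AREA, TMP-PTR, TMP-RESERVE, TMP-RELEASE, >ASCIIZ and DEMO are hypothetical names, and the small downward-growing scratch area merely stands in for the aggregate stack described in this section. CMOVE is assumed to be available (it belongs to the standard String word set).

```forth
\ Sketch only: a small LIFO scratch area in standard Forth.
\ All names below are hypothetical, not Comforth words.
256 CONSTANT TMP-SIZE
CREATE TMP-AREA TMP-SIZE ALLOT
VARIABLE TMP-PTR   TMP-AREA TMP-SIZE + TMP-PTR !   \ empty: points past the end

: TMP-RESERVE ( size -- addr )     \ take 'size' bytes, rounded up to whole cells
  ALIGNED NEGATE TMP-PTR +!
  TMP-PTR @ DUP TMP-AREA U< ABORT" scratch stack overflow" ;

: TMP-RELEASE ( size -- )          \ give back the most recently reserved bytes
  ALIGNED TMP-PTR +! ;

: >ASCIIZ ( c-addr u -- z-addr )   \ copy the string and append a null character
  DUP 1+ TMP-RESERVE               ( c-addr u z-addr )
  DUP >R  SWAP 2DUP + >R           ( c-addr z-addr u ) ( R: z-addr z-end )
  CMOVE                            \ copy u characters to the scratch area
  0 R> C!  R> ;                    \ store the terminator, return z-addr

: DEMO ( -- )
  S" B:TEST.ASM" DUP >R >ASCIIZ    ( z-addr ) ( R: u )
  DROP                             \ a real caller would hand z-addr to the host
  R> 1+ TMP-RELEASE ;              \ release exactly what >ASCIIZ reserved
```

The caller has to release the same size that was reserved, in reverse order of reservation, which is exactly the LIFO discipline that makes this kind of storage cheap to manage.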

The words that implement the base of local storage are very simple indeed, but they should be integrated in the Forth system because a pointer (AP, the aggregate stack pointer) needs to be initialized or reset. Local storage is essentially LIFO storage, a stack that is quick to maintain. An additional heap is even better, but it does not replace LIFO storage, because it is slower and cannot be freed automatically as easily as a stack.

   xx USER AP      \ running pointer to the top of aggregate stack
   xx USER A0      \ initialized to the bottom of the aggregate stack

   : AP@     \ -- addr       \ address of top of aggregate stack
      AP @ ;
   : AP!     \ addr --       \ initialize aggregate stack
      1- EVEN AP ! ;
   : RESERVE \ size -- addr  \ allocate 'size' bytes of LIFO storage
      EVEN ?ROOM AP -! AP@ ;
   : FREE    \ size --       \ free
      EVEN AP +! ;

Comments: How simple. Only I don't like the name A0, because it may occasionally conflict with what is intended to be a hexadecimal constant, even if starting constants with anything other than a digit is bad practice. Examples of use are given below.

The file system
---------------

A file can be viewed as a string of numbered bytes kept on named external storage, a host system file. Access to a particular system file is made available through a file-control-block, a special data element defined by the word FILE or located at an address in dynamically reserved storage. Once defined, such a word or address can be used to OPEN a file referenced by a string containing the same name as that used by the host system. The size of the file-control-block is given by the system-dependent constant FCBSIZE. The block cannot be moved away from the address where it was opened.

Examples:

   FILE MYFILE
   " B:TEST.ASM" MYFILE OPENI

The first line defines the static file-control-block MYFILE to be used for file access. The second line connects that file-control-block to the file named B:TEST.ASM by the host system. Once connected, the word MYFILE can be used to access the open file until it is closed.

   : TEST
      FCBSIZE RESERVE >R
      " SAMPLE.DATA,TB" R@ OPENI
      ...
      R> CLOSE
      ...
      FCBSIZE FREE ;

The first line of the definition reserves storage from the aggregate stack for file-control-block use (system-dependent size FCBSIZE) and saves the block address on the return stack for later file references. This method relieves the dictionary of voluminous static allocations.

The words OPENI, OPENO and OPENU are given a character string and a file-control-block address, and connect the latter to a file. The contents of the string are host system specific and follow the host system naming conventions. The string conveys such things as the filename, the drive identification, a directory path, the file attributes, access options, etc.

OPENI is used to access a file for input only. For successful execution, the file must exist.
OPENO is used to access a file for output only. If the file exists, it is erased, and a new (empty) file is created.
OPENU is used for both reading and writing. If the file does not exist, it is created.

Open functions return a value on the stack. If the operation is successful, this value is zero and file access can start. Otherwise, the value can be a host-specific return code or simply TRUE, and the file-control-block cannot be used for any file operation other than another open attempt.

The bytes making up a file are numbered from zero up to the capacity of the host system. The value of a byte is defined by writing it. Later reading the file at the same byte number will return the value of the last write. Reading any unwritten byte either returns an undefined value or produces an I/O error, depending on the particular system and the byte address. The word INDATA returns a flag indicating, before read operations, whether the end of allocated external storage has been reached. Allocated storage does not mean, however, that the current byte is defined.

The file can be read or written sequentially by repeatedly using the words GET or PUT. After each sequential read or write, the position advances to the next byte. The position in the file can be obtained or changed at any time with NOTE and POINT, making direct access possible.

Byte access is not efficient. To speed up file processing, string operations are provided. WRITE writes a sequence of bytes of given address and length. READ similarly retrieves a given amount of data to a specified address and returns the amount actually read. If that amount differs from the amount requested and no I/O error occurred, the end of file has been reached. This allows for fast bulk reading, as for file copy.

A particular type of sequential file contains a sequence of text lines and is called an ASCII file. Different systems use different conventions to mark the end of ASCII records and files. Comforth defines a system-independent interface to process them. When writing an ASCII file, the word PUTEOR places a record separator and the word PUTEOF places an end-of-file marker. When reading an ASCII file, the end of file can be tested for with INFILE. INFILE returns a true value if the current byte is not the end-of-file marker and the external storage is allocated. GETLINE reads the file up to the next end of record, given a maximum record length, and returns the amount actually read and a flag indicating whether the end of record was reached. GETLINE takes a buffer-string and a file-control-block as input; its output is a data-string and a flag.

Example:

   filename fcb OPENI OPEN?
   BEGIN fcb INFILE WHILE
      buffer-address buffer-size fcb GETLINE EXCESS? TYPE CR
   REPEAT
   fcb CLOSE CLOSED?

Any file read or write error (e.g. I/O, external storage or directory shortage) causes an error message to be issued and ABORT to be executed. When such a brutal disruption of program flow is not desirable, the error notification can be deferred with MUTEIOER until a test is made by invoking IOERR. No data should be used before testing with IOERR, and the use of IOERR is mandatory after each data entity (even if not used) to clear the condition before the next file operation. If no error occurred, IOERR returns zero, otherwise a TRUE flag or a system-specific code. When an I/O error is pending before close or occurs during it, CLOSE returns either a true flag or a system-specific code.

Some host systems do not check filename syntax at open time. This can lead to creating files whose names contain invalid characters, rendering them impossible to manage (e.g. to delete) with the normal host system commands. On such systems, OPEN may either return a special return code or make system-like modifications to the filename (e.g. uppercase).

When feasible, GETLINE and the other sequential I/O primitives are also implemented for character devices (example: RS232) that can be opened as files by the host operating system, using the host's own rules for such file access, filenames or preparatory procedures, such as configuring a baud rate prior to use or specifying it in the filename. On character devices, the file system must pre-read a character so that INFILE can signal the end of file before the next record is read.

The above words are defined in a glossary whose title lines are reproduced here only for the stack behaviour.

   FILE       --                      (definition)
              -- fcba                 (defined word execution)
   FCBSIZE    -- size
   OPENI      fna fnl fcba -- code
   OPENO      fna fnl fcba -- code
   OPENU      fna fnl fcba -- code
   OPEN?      flag --
   CLOSE      fcba -- code
   CLOSED?    flag --
   GET        fcba -- char
   READ       addr len1 fcba -- len2
   GETLINE    addr len1 fcba -- addr len2 flag
   EXCESS?    flag --
   PUT        char fcba --
   WRITE      addr len fcba --
   PUTEOR     fcba --
   PUTEOF     fcba --
   INDATA     fcba -- flag
   INFILE     fcba -- flag
   POINT      ud fcba --
   NOTE       fcba -- ud
   MUTEIOER   fcba --
   IOERR      fcba -- code
   DELETE     fna fnl -- code
   RENAME     fna1 fnl1 fna2 fnl2 -- code

Comments: I find this system suitable to cover most programming needs.
Some issues are not covered, like file sharing options and record locking, but these can be added as special words to be used before and after OPEN, some having a null action on systems where the facility is not available. The file control block concept is suitable for all systems. Some systems use handles, which can be stored in a file control block. The opposite (managing system control blocks when using Forth handles) is more difficult to achieve. Some file access systems use the concept of a "currently accessed file" that must be changed in order to access another file. This is not adequate, because it presumes that some words do not use file I/O, which is certainly not the case if their execution is traced to a file. It would be better if GETLINE were called READLINE and if READ followed the same buffer-in, string-out convention by also returning the address and count of the data.

Error recovery
--------------

Error recovery is an essential feature for robust programming. Let us start with some definitions. An error is a detected unusual condition preventing further execution of a procedure. It can result from any event, such as a hardware error, data validation or a user input action. The procedure detecting the error may either ABORT the process or alert its caller by returning a condition code.

Continued execution of a process requires that it not be ABORTed, but then return codes must be passed and tested at each level of call. Programming units as short as those usually found in Forth could easily more than double in complexity and size if written that way.

On the other hand, even if ABORTing a process can be accepted, for example in interactive mode, some procedures may need to receive control in order to restore some system state they have modified. For example, a procedure that opened a file should close it; otherwise, after a series of ABORTs, the system may be left with too many open files to proceed normally.
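
The cost of passing and testing return codes at each level can be seen in a small sketch. It is written in standard Forth, and the three steps are hypothetical placeholders that simply return 0 for success; only the shape of JOB and CALLER matters here.

```forth
\ Sketch only: STEP1..STEP3, JOB and CALLER are hypothetical names.
\ Each step returns 0 on success or a nonzero error code.
: STEP1 ( -- code ) 0 ;
: STEP2 ( -- code ) 0 ;
: STEP3 ( -- code ) 0 ;

\ Every call must be followed by a test that ripples the code upward.
: JOB ( -- code )
  STEP1 ?DUP IF EXIT THEN
  STEP2 ?DUP IF EXIT THEN
  STEP3 ;

\ The caller of JOB has to repeat the same pattern, and so on up the chain.
: CALLER ( -- code )
  JOB ?DUP IF EXIT THEN
  STEP1 ;
```

Three useful words in JOB have become a dozen, and the same overhead is repeated in every word that calls it.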

The solution to both problems is found in ABORT recovery or, in the usual terminology, (ON)ERROR recovery. A procedure protecting its execution with ONERROR receives control back despite ABORT. It may either perform some cleanup of its own activity and ripple the abort condition to its caller, or provide continued execution.

Forth error recovery is just another kind of control structure, much like an IF-THEN-ELSE. It saves control data in a variable ERRP (error recovery recording pointer, which again must be zeroed on system initialization and restart) and on the return stack. It takes care of cleaning the stacks so that a recovery path is executed with a predictable stack depth.

   ONERROR   :C
   DURING    :C
   NOERROR   :C

Used in a colon definition in the form:

   ONERROR recovery-words DURING protected-words NOERROR

When ONERROR is executed, the address of the previous ONERROR environment, the address of the recovery words and the levels of the data, return and aggregate stacks are pushed on the return stack. They make up the new ONERROR environment, whose address is stored in ERRP. Then control is given to the words after DURING, the protected words. When NOERROR is reached, the ONERROR environment is discarded and ERRP is restored to its previous value.

If ABORT is invoked during the execution of the protected words, the ONERROR environment is similarly discarded. The 3 stacks (levels) are restored to the same depth as when ONERROR was executed, and the recovery words receive control. When the recovery words reach DURING, execution continues with the words after NOERROR.

ONERROR recovery is a powerful means to protect a program sequence from losing control when it has altered, and should restore, some critical system state. The only implemented ways to bypass recovery are QUIT, WARM and COLD, which should be used only in extreme cases during program development. These words are fully structured and can be nested within themselves or other structures.
The two paths should have an identical data stack behavior, as for an IF ELSE THEN structure. During the protected section, values are pushed onto the return stack, so it is implicitly subject to the same rules concerning the return stack as the DO loop (no EXIT or access to other return stack values).

   >ABORT    -- addr    :U
   ABORT

If an ONERROR recovery environment is active, restore the stack levels and the previous environment and give control to the recovery section (see ONERROR). Otherwise, execute the word whose cfa is in the user variable >ABORT, normally QUIT in a development system and SYSTEM in an application module.

ONERROR Implementation
----------------------

Compilation part:

   : ONERROR  \ compile (ONERROR) followed by an offset to DURING
      COMPILE (ONERROR) >MARK 6 ; IMMEDIATE
   : DURING   \ fills the above offset and compile BRANCH and offset
      4 - [COMPILE] ELSE 4 + ; IMMEDIATE
   : NOERROR  \ compile (NOERROR) and fill the above offset
      COMPILE (NOERROR) 4 - [COMPILE] THEN ; IMMEDIATE

Execution part:

   xx USER ERRP              \ Error recovery recording pointer

   : (ONERROR)               ( Establish error environment )
      R> DUP 2+ >R           ( Push error return, after inline offset )
      AP@ >R SP@ 2+ >R       ( Checkpoint stack pointers )
      ERRP @ >R RP@ ERRP !   ( replaced by our own )
      DUP @ + >R ;           ( Return after DURING )
   : (NOERROR)               ( Restore previous ONERROR environment )
      R> R> ERRP !           ( Restore previous environment pointer )
      RP@ 6 + RP! >R ;       ( Drop stack pointers and return )
   : ABORT                   \ modified to support ONERROR
      ERRP @ ?DUP            \ nonzero if error recovery is active
      IF RP! R> ERRP ! R> SP! R> AP! EXIT THEN   ( Error retry )
      >ABORT PERFORM ;

Programming examples
--------------------

Here are some examples copied from the Comforth system itself.

   255 CONSTANT MAXLL   \ maximum line length for editor or FLOADing
   xx USER LFCB         \ current FLOAD file control block

   : FLOADER  \ fna fnl --  \ load host file with filename string
      LFCB @ >R                        \ nest FLOAD, save file
      FCBSIZE MAXLL + RESERVE DUP >R   \ FCB and buffer space
      OPENI OPEN? R> LFCB !            \ OPEN host file
      ONERROR TRUE                     \ indicate error to exit
      DURING
         BEGIN LFCB @ INFILE WHILE     \ test for end of file
            LFCB @ FCBSIZE + MAXLL     \ addr len of local buffer
            LFCB @ GETLINE EXCESS?     \ read next line to buffer
            EVALUATE       \ preserve input stream and interpret string read
         REPEAT FALSE                  \ indicate no error occurred
      NOERROR              \ following is cleanup, always executed
      LFCB @ CLOSE                     \ close our file
      R> LFCB !            \ restore nested FLOAD file control block
      FCBSIZE MAXLL + FREE             \ free our local data space
      CLOSED?                          \ ABORT if our CLOSE failed
      ABORT" " ;                       \ or ripple ABORT after cleanup

   : FLOAD    \ execute FLOADER with filename from input stream
      "TOKEN FLOADER ;
   \ Note: : TOKEN ( -- addr size ) BL WORD COUNT ;
   \       : "TOKEN ( -- addr size ) ... ;
   \ similar, but the input stream token can be enclosed in quotes,
   \ allowing for blanks in filename.

   : EDIT   \ [<filename>] -- edit, load if requested, reedit if error
      BEGIN EDITOR               \ invoke editor
      IF                         \ it requests interpreting the file buffer
         HERE >R                 \ checkpoint dictionary depth to forget on error
         ONERROR
            U0 ->RESET           \ restore standard user variables, e. g. compiling off
            R@ DP !              \ restore dictionary depth as before SLOAD
            DISPOSE              \ forget temporary words, including above restored DP
            >SOURCE PERFORM L>IN @ +    \ buffer address of error, hopefully
            DUP A0 @ DTOP UWITHIN       \ in edit buffer, not EVALUATEd
            IF MAXLL BL SKIP DROP       \ nice alignment to word
               CURSOR.TO.CHARACTER THEN \ and place editor cursor there
            1 WORD DROP          \ flush input stream so no edit filename
            CR ." Re-edit ? " REPLY ASCII Y <>   \ ask user if he wants reedit
         DURING
            SLOAD                \ interpret file buffer
            CR ." Remember to SAVE " TRUE        \ no error, do not loop
         NOERROR
         R> DROP
      ELSE TRUE THEN             \ no debug request
      UNTIL ;

   : STOP   \ action word used at breakpoint to suspend execution
      SUSPEND HLD @ >R STATE @ >R SPAN @ >R  \ save 8 program environment words
      [COMPILE] [                \ make sure we are in interpretation mode
      HERE HISHERE 80 CMOVE      \ and save the area we destroy around PAD
      BEGIN                      \ special interpreter loop
         HOME.PFA @ CR ." AT " DUP BODY> >NAME NAME TYPE   \ tell patched word
         CALL.POINT SWAP - ." +" U.                         \ and offset within
         CALL.POINT @ DUP HINGE.CFA =
         IF DROP SAVED.CALL @ THEN >NAME NAME TYPE          \ tell referenced word
         ." : " .S CR                                       \ and stack contents
         MYTIB DUP 80 EXPECT                                \ obtain user command line
         SPAN @ DUP
         IF NEWSTREAM               \ set input stream to non-null line
            ONERROR CR ." Oops!!!"  \ keep from losing test environment on user error
            DURING INTERPRET        \ user input
            NOERROR
         ELSE 2DROP TRUE EXIT.TYPE DO.STEP B!   \ null, set stepping flag
         THEN
         EXIT.TYPE @
      UNTIL                      \ loop until CONT, STEP or FREERUN commands executed
      HISHERE HERE 80 CMOVE      \ restore all that was saved and exit to breakpoint
      R> SPAN ! R> STATE ! R> HLD ! RESUME ;   \ manager to resume at patch

P. S. Comforth is the work of the Southern Belgium Forth Chapter, to which I belong. I am writing on their behalf, because they wish to have some key ideas of our favourite tool shared with the Forth community. And I certainly enjoyed conveying our enthusiasm for Forth as well. I hope my limited English has been up to it.

Andre.
ZMLEB@SCFVM.BITNET (Lee Brotzman) (07/02/88)
--------------------
Date: Thursday, 30 June 1988 0910-EST
From: DAVID@PENNDRLS
Subject: Re: FORTH in an operating system

Thanks for a wonderful article, Andre! The ONERROR concept is a real gem. One of those things that, once explained, elicits the reaction 'Of course! That should have been obvious!' But it wasn't. I've been banging my head against the problems ONERROR solves for some little time, and am grateful to have a solution. The aggregate stack is also something I will probably adopt.

Is Comforth available in some fashion? If so, and if the price isn't too high, I'd be interested in obtaining a copy. It sounds like good work.

Your point about FORTH being handicapped by the perception that it is an operating system unto itself, and therefore does not get consistently integrated with the host operating system, is very well taken. I would certainly like to see the standards committee define a superset of the kernel that includes the syntax of words for accessing operating system functions. Even a small set like your FILE set, and simple things like TIME and DATE, would go a long way toward making FORTH programs more transportable. How about standards for optional lexicons analogous to the standard sets of library routines for C?

One minor operating-system-integration issue that I wonder if the standards committees have ever addressed is that of the host character set. Operating as I do primarily on IBM equipment, I balked at having the word ASCII generate the EBCDIC code for a character. I use CODEPOINT instead. I also define words to translate from both EBCDIC and ASCII into whatever the host character set happens to be, and vice versa. On a given system, one pair of these operations are no-ops, but it does provide for character set independence.

-- R. David Murray (DAVID@PENNDRLS.BITNET, DAVID@PENNDRLS.UPENN.EDU)