ccplumb@watnot.UUCP (Colin Plumb) (11/28/86)
I'm working on a 32-bit Forth for a VAX under BSD 4.2 Unix, and I'm trying to figure out how to implement the system call interface. At the lowest level, the interface provided by the system looks like this:

- Register 12 (r12, the argument pointer) is expected to point to a stack frame containing the arguments taken by the system call.
- The actual system call is made with the chmk (change mode to kernel) instruction, whose operand is the number of the call (as given in <syscall.h>).
- On return, if the carry flag is clear, no error occurred, and the return values are in r0 and r1 as needed (some calls return no values; most return one, in r0).
- If the carry flag is set, the error code is in r0.

I'm trying to figure out a good way to implement this in Forth. In C, the library checks for an error and in that case returns -1, setting errno to the error number. If there are two return values (pipe(2) is an example), the usual solution is to require an array of 2 ints to be passed to the library routine (the actual pipe system call doesn't take arguments), which it fills in.

I could either emulate the C interface, leaving the return value on the stack (even if it is meaningless, for calls that don't return values) and using a VARIABLE ERRNO, or try for something Forthier. The idea I'm currently playing with is to leave a flag on top of the stack, -1 (true) for error and 0 (false) for no error, with either the error code or the return values (if any) beneath it.

Would anyone like to comment on the above ideas, or suggest another? The first has the advantage that it's familiar to people with experience in C, and it always leaves the same number of values on the stack. The second is conceptually cleaner, and I don't think the varying number of return values will matter much: in most cases one value is returned, which makes two stack values whether or not the call fails, and the test for an error generally comes immediately after the return anyway.
-Colin Plumb (ccplumb@watnot.UUCP)
Zippy says: I was born in a Hostess Cupcake factory before the sexual revolution!
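A minimal C sketch of the two conventions Colin describes, assuming the outcome of the trap has already been captured as a carry flag plus r0/r1. No real chmk is issued; `raw_result`, `c_style`, `forth_style`, `sim_errno` and the array-backed `stack` are hypothetical names, with the array standing in for the Forth data stack.

```c
#include <stddef.h>

/* Hypothetical record of what the trap leaves behind; no real chmk is done. */
struct raw_result {
    int  carry;  /* 1 if the carry flag was set (error), else 0 */
    long r0;     /* error code on failure, first return value on success */
    long r1;     /* second return value, for calls such as pipe */
};

/* Convention 1: emulate the C library. Return -1 on error and stash the code. */
static long sim_errno;

static long c_style(struct raw_result r)
{
    if (r.carry) {
        sim_errno = r.r0;
        return -1;
    }
    return r.r0;
}

/* A small array-backed stand-in for the Forth data stack (no overflow checks). */
struct stack { long cell[16]; size_t depth; };

static void push(struct stack *s, long v) { s->cell[s->depth++] = v; }

/* Convention 2: flag on top (-1 = error, 0 = no error); beneath it either
   the error code or the nret return values. */
static void forth_style(struct stack *s, struct raw_result r, int nret)
{
    if (r.carry) {
        push(s, r.r0);  /* error code */
        push(s, -1);    /* true: error */
        return;
    }
    if (nret >= 1) push(s, r.r0);
    if (nret >= 2) push(s, r.r1);
    push(s, 0);         /* false: no error */
}
```

With one return value, both outcomes of `forth_style` leave exactly two cells, which is the point made above about the error test coming right after the call.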
karl@haddock.UUCP (Karl Heuer) (12/04/86)
In article <12234@watnot.UUCP> ccplumb@watnot.UUCP (Colin Plumb) writes:
>I'm working on a 32-bit Forth for a VAX under BSD 4.2 Unix.
>I'm trying to figure out how to implement the system call interface.

Well, I've already done it on SysV, and I think the same idea should work on BSD. On success, the return values (0, 1, or 2 of them) and a success indicator are placed on the stack; on failure, only the failure indicator. (I used 1 for ok vs. 0 for error, but 0 for ok would've made it easier to test.) The error code can be retrieved via "errno" (which is not a variable, because I didn't think it needed to be user-writable); I didn't leave it on the stack because most of the time it isn't needed. For consistency, even calls that never fail (e.g. getpid) have the flag on top. (I also defined a word which drops the top of stack and bombs with an appropriate message if it was a failure.)

I used a leading "$" (so "$dup" is a system call, while "dup" is a Forth word), or "$_" for the "real" system call if the C interface twiddles it (so $_getpid returns two values, while $getpid and $getppid select the one of interest). I didn't do this with $pipe and $wait; they just return two values (too bad C can't do that! Returning a struct by value doesn't count).

I used one defining word which takes three arguments (number of arguments, number of results, and chmk number), and had one chunk of common code in assembly language. The asm code was the only part I needed to rewrite when I ported my TIL to a 3b2.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
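Karl's defining word with three parameters and one shared chunk of common code might look like this when modelled in C. This is a sketch, not his assembly: the kernel is replaced by a function pointer, the argument order (top of stack = last argument) is an assumption since the post does not state it, and `syscall_desc`, `do_syscall`, `last_errno` and `demo_kernel` are hypothetical names.

```c
#include <stddef.h>

struct stack { long cell[32]; size_t depth; };
static void push(struct stack *s, long v) { s->cell[s->depth++] = v; }
static long pop(struct stack *s)          { return s->cell[--s->depth]; }

/* What the trap leaves behind, modelled rather than produced by a real chmk. */
struct raw_result { int carry; long r0, r1; };

/* Hypothetical stand-in for the kernel entry. */
typedef struct raw_result (*kernel_fn)(int number, const long *args, int nargs);

/* The three parameters of the defining word, stored once per defined call. */
struct syscall_desc {
    int nargs;     /* cells taken from the stack (at most 8 here) */
    int nresults;  /* 0, 1 or 2 values pushed on success */
    int number;    /* system call number */
};

static long last_errno;  /* read by a non-writable "errno" word */

/* The one chunk of common code shared by every defined system call. */
static void do_syscall(struct stack *s, const struct syscall_desc *d, kernel_fn k)
{
    long args[8];
    for (int i = d->nargs - 1; i >= 0; i--)  /* assumed: top of stack = last argument */
        args[i] = pop(s);
    struct raw_result r = k(d->number, args, d->nargs);
    if (r.carry) {
        last_errno = r.r0;
        push(s, 0);               /* failure: only the indicator */
        return;
    }
    if (d->nresults >= 1) push(s, r.r0);
    if (d->nresults >= 2) push(s, r.r1);
    push(s, 1);                   /* success: 1, as in Karl's version */
}

/* Demo kernel for illustration only: call 1 echoes its single argument and
   fails with a made-up code 9 when the argument is negative. */
static struct raw_result demo_kernel(int number, const long *args, int nargs)
{
    struct raw_result r = { 0, 0, 0 };
    if (number == 1 && nargs == 1 && args[0] >= 0) {
        r.r0 = args[0];
    } else {
        r.carry = 1;
        r.r0 = 9;
    }
    return r;
}
```

Only `do_syscall` would need rewriting for a new machine, which matches Karl's experience porting to the 3b2.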
ccplumb@watnot.UUCP (12/05/86)
In article <182@haddock.UUCP> karl@haddock.ISC.COM.UUCP (Karl Heuer) writes:
>In article <12234@watnot.UUCP> ccplumb@watnot.UUCP (I) write:
>>I'm working on a 32-bit Forth for a VAX under BSD 4.2 Unix.
>>I'm trying to figure out how to implement the system call interface.
>
>Well, I've already done it on SysV, and I think the same idea should
>work on BSD. On success, the return values (0, 1, or 2 of them) and
>a success indicator are placed on the stack; on failure, only the
>failure indicator. (I used 1 for ok vs. 0 for error, but 0 for ok
>would've made it easier to test.) The error code can be retrieved
>via "errno" (which is not a variable because I didn't think it needed
>to be user-writable); I didn't leave it on the stack because most of
>the time it isn't needed. For consistency, even calls that never
>fail (e.g. getpid) have the flag on top. (I also defined a word
>which drops the top of stack and bombs with an appropriate message if
>it was a failure.)

Thank you very much for the ideas... I think I'll use -1 (F-83 true) for "error" and 0 for "O.K.", since that lets me use ABORT" to bomb out, and it agrees with the C convention.

I'd like to ask people with more Unix experience than my 4 months whether it's desirable to put the error in "errno", or to leave it on the stack. My perception is that in most cases you simply bomb out (via some sort of ABORT word), which clears the stack, so the error number is never used; and if you try to handle the error more gracefully, you almost always use the error number in some sort of case statement. (That is, you use it right away, and only in this one place.) So why stash it away somewhere?

>I used a leading "$" (so "$dup" is a system call, while "dup" is a
>Forth word), or "$_" for the "real" system call if the C interface
>twiddles it (so $_getpid returns two values, while $getpid and
>$getppid select the one of interest). I didn't do this with $pipe
>and $wait; they just return two values (too bad C can't do that!
>Returning a struct by value doesn't count).

I was thinking of using a leading "_" for the system call naming convention, since that's the convention used by the innards of the C library, and I wanted to use "$" for string extensions. Still, your idea of supporting both the C library syntax and the actual system call syntax is a good one, and adding another "_" (in my case) agrees with the way it's handled in C (exit() is a library routine that calls _exit(), the system call proper). I was worrying about what to do in cases like getpid and getuid, where the interface is messed up by the fact that C can't handle multiple return values.

>I used one defining word which takes three arguments (number of
>arguments, number of results, and chmk number), and had one chunk of
>common code in assembly language. The asm code was the only part I
>needed to rewrite when I ported my TIL to a 3b2.

A defining word is definitely the way to go, although I'd replace "number of results" with flags telling the common code which registers to place on the stack. Of course, really badly behaved calls like wait3 are going to require special attention. wait3 is a fancier version of wait, with options for nonblocking operation, etc., and it is really another system call in disguise: the extra arguments are put in r0 and r1, and the flags in the PSW are set to indicate the fancy version, before chmk $SYS_WAIT. Can you say *ugly*?

>Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

-Colin Plumb (ccplumb@watnot.UUCP)
Zippy says: I'm rated PG-34!!
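Colin's variation, register flags instead of a result count, can be sketched in C as follows. It assumes that one trap returns two values in r0 and r1, as Karl's `$_getpid` does on SysV (whether BSD 4.2's getpid behaves the same is an assumption here). With a mask, `$getpid` could push only r0 and `$getppid` only r1, which a plain count of 1 cannot express. `PUSH_R0`, `PUSH_R1` and `push_results` are hypothetical names.

```c
#include <stddef.h>

struct stack { long cell[16]; size_t depth; };
static void push(struct stack *s, long v) { s->cell[s->depth++] = v; }

struct raw_result { int carry; long r0, r1; };

/* Hypothetical flag bits replacing "number of results". */
enum { PUSH_R0 = 1, PUSH_R1 = 2 };

/* -1 (true) for error, 0 for O.K.: the error code stays under the flag, so a
   following ABORT" can consume the flag directly. */
static void push_results(struct stack *s, struct raw_result r, unsigned mask)
{
    if (r.carry) {
        push(s, r.r0);
        push(s, -1);
        return;
    }
    if (mask & PUSH_R0) push(s, r.r0);
    if (mask & PUSH_R1) push(s, r.r1);
    push(s, 0);
}
```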
karl@haddock.UUCP (12/13/86)
In article <12261@watnot.UUCP> ccplumb@watnot.UUCP (Colin Plumb) writes:
>I'd like to ask people with more Unix experience than my 4 months
>whether it's desirable to put the error in "errno", or leave it on the
>stack. My perception is that, while in most cases you simply bomb out
>(via some sort of ABORT word), which clears the stack, and thus never
>use the error number, if you try to handle it more gracefully, you
>almost always use the error number in some sort of case statement.
>(That is, you use it right away, and just in this one place.) So why
>stash it away somewhere?

Okay. First, if you're going to abort, it doesn't matter whether or not the errno is on the stack, so we can assume a more graceful error handler. It's been my experience in C that most code handling such failures does *not* look at errno. In fact, the usual situation is that if the system call fails, the function returns an error condition to its caller. (E.g. if fopen() fails to open(), it returns NULL.) If the system calls return the pair (FAILURE, errno), then any utility routines that use them will likewise have to leave errno on the stack. This can get a bit messy if you have other stack cleanup to do before returning. That's why I think stashing it is better. (It also means one less argument to the perror() routine, which again means less stack rearrangement.)

>I was thinking of using a leading "_" for the system call naming
>convention, since that's the convention used by the innards C library,

If you mean the mapping "printf" -> "_printf", that's a convention used by some C *compilers*, and it applies to all external names. The other use of underscore (e.g. "exit" and "_exit") is what I was emulating with my notation of "$" and "$_"; in fact I did use "$_exit" for the "real" system call vs. "$exit" for the cleanup version (I was going to implement multiple cleanup routines, too). I don't think there's any reason to support the C library syntax for functions like wait(), unless you expect to mechanically translate code from C!

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
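Karl's argument can be illustrated with real POSIX calls in C: the helper below passes only "failed" back to its caller, and a caller further up still gets the reason from errno through perror(), which takes a single argument. `open_source_file` and `load_or_complain` are hypothetical names for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Reports only failure (-1) to its caller; the reason was already stashed
   in errno by open(), so nothing extra travels back up the call chain. */
static int open_source_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;          /* errno still says why */
    return fd;
}

/* A caller further up can still explain the failure: perror() takes one
   argument and reads errno itself. */
static void load_or_complain(const char *path)
{
    int fd = open_source_file(path);
    if (fd < 0) {
        perror(path);
        return;
    }
    /* ... read the file here ... */
    close(fd);
}
```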
wmb@sun.uucp (Mitch Bradley) (12/14/86)
At the 1985 Rochester Forth Conference, several of us Unix and Forth users formed a working group and hammered out a set of Forth-to-Unix interface conventions. Following is a copy of the resulting working group report. In summary:

1) Forth word names for Unix system calls should start with an underscore (_).
2) The leftmost C argument should appear on the top of the Forth stack.
3) We define the defining words SYSCALL: and SUBROUTINE: for constructing interfaces to system calls and library routines, respectively. There are also defining words to access C data storage areas and to allow C routines to call Forth words.
4) Argument type conversion is done automatically by the defining words, under control of a parameter type specification list. The number of arguments and the number of return values are not enough in the general case.
5) Error reporting is handled with a word ERRNO which returns a value, not an address. ERRNO returns 0 if no error occurred, or the Unix error number otherwise.
6) The report covers other areas such as case sensitivity, control characters in source code files, file naming conventions, etc.

Mitch Bradley

(the rest of this message is the report)

Forth and Unix Working Group
Mitch Bradley

ABSTRACT

The Forth and Unix working group included a number of people who are currently using Forth with the UNIX* operating system, and a few interested observers. The group agreed on a set of guidelines for the interface between Forth and UNIX, based on the experience of the participants. Adoption of these guidelines should increase the ability of Forth users under UNIX to share code.

System Call and C Language Interface

It is frequently desirable to use UNIX system calls from within Forth. Also, since UNIX has an extensive set of library routines that are written in or callable from the C language, Forth can benefit from being able to execute C subroutines. The following wordset defines an interface between Forth, C, and the operating system.
The scheme is quite general; it should serve equally well to integrate Forth into another operating system (other than UNIX), or another language environment (other than C).

SYSCALL: ( -- ) ( Input Stream: system-call-name <parameter list> )
A defining word used in the form:
    SYSCALL: <name> <parameter list>
Defines <name> so that when <name> is later executed, the UNIX system call of the same name will be invoked. This should only be used to define Forth interfaces to system calls (as opposed to C language subroutines). <name> should be the same as the name of a UNIX system call, but with an underscore (_) as the first character. For example, the read() system call, which reads from a file, would be interfaced to Forth with:
    SYSCALL: _read <parameter list>
<parameter list> will be described later.

SUBROUTINE: ( ext-name -- ) ( Input Stream: <name> <parameter list> )
Defines <name> so that when <name> is later executed, the external subroutine ext-name is invoked. ext-name is passed to SUBROUTINE: as the address of a packed string. ext-name is usually the external name of a C language subroutine. <parameter list> is described later.

_________________________
* UNIX is a trademark of Bell Laboratories.

December 13, 1986

ENTRY: ( ext-name -- ) ( Input Stream: <name> <parameter list> )
Builds an entry point so that the already-existing Forth word <name> may be called from outside of Forth. ext-name is the external name by which the Forth word will be known to the outside world. This is useful, for example, when Forth calls a C routine which performs output, but the programmer wants the C output to go through the Forth I/O system. This example might result in the following:
    " _putchar" ENTRY: EMIT <parameter list>
<parameter list> is described later. Note that ENTRY: is not a defining word, in that it does not cause a new name to be created in the Forth dictionary.
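The report leaves the mechanism behind ENTRY: to the implementation. This C sketch only models the idea, assuming a Forth word can be represented as a C function that operates on a data stack; a real system would generate the stub in assembly and export it under the external name given to ENTRY:. `putchar_entry`, `forth_EMIT`, `c_print` and the output buffer are hypothetical.

```c
#include <stddef.h>

struct stack { long cell[16]; size_t depth; };
static void push(struct stack *s, long v) { s->cell[s->depth++] = v; }
static long pop(struct stack *s)          { return s->cell[--s->depth]; }

/* Hypothetical Forth side: a data stack and a word EMIT that appends to a
   buffer (standing in for the Forth I/O system) instead of a terminal. */
static struct stack forth_stack;
static char forth_out[64];
static size_t forth_out_len;

static void forth_EMIT(struct stack *s)
{
    char c = (char)pop(s);
    if (forth_out_len < sizeof forth_out - 1)
        forth_out[forth_out_len++] = c;
}

/* What an ENTRY: might produce: a C-callable stub that moves the C argument
   onto the Forth stack and runs the existing Forth word. No new Forth name
   is created. */
static int putchar_entry(int c)
{
    push(&forth_stack, c);
    forth_EMIT(&forth_stack);
    return c;
}

/* A C routine that performs its output through whatever putchar-like
   function it is handed. */
static void c_print(const char *str, int (*out)(int))
{
    while (*str)
        out((unsigned char)*str++);
}
```

Calling `c_print("ok", putchar_entry)` routes both characters through the Forth-side EMIT and leaves the Forth stack as it was.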
DATA: ( ext-name -- ) ( Input Stream: <name> )
Defines <name> so that when <name> is later invoked, the address of the data storage area associated with the external symbol ext-name is left on the stack. For example, if a C subroutine defines an external array:
    int primes[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
that array could be accessed from Forth by declaring:
    " _primes" DATA: primes
(The external name of C objects in the UNIX world is the name with an underscore prepended, hence _primes.)

Since the details of how arguments are passed to and from subroutines are usually different between Forth and the rest of UNIX, it is necessary to provide a means for moving arguments between the Forth stack(s) and wherever the UNIX and C language routines expect the arguments to be. Rather than requiring the Forth programmer to deal with this, the interface wordset provides a way to describe the arguments in such a way that appropriate conversions may be made automatically. A suggested implementation is to compile an appropriate bit of assembly code for each SYSCALL:, SUBROUTINE:, or ENTRY:, which would perform the argument conversions/movements.

The argument specification is done with a <parameter list>. A <parameter list> specifies the type and order of the input and output arguments. The <parameter list> is a list of the types of the input arguments, followed by "--", followed by the type of the output argument, followed by "END". The possible types are from this table:

    void_ty     null type
    addr_ty     address (a pointer to something)
    int_ty      "standard" or "normal" integer (1 stack cell)
    float_ty    floating point
    dfloat_ty   double precision float
    string_ty   string
    char_ty     1 byte
    uchar_ty    1 byte unsigned
    short_ty    2 bytes signed
    ushort_ty   2 bytes unsigned
    long_ty     4 bytes signed
    ulong_ty    4 bytes unsigned

The order of the input arguments is opposite from that of the C specification; i.e.
the rightmost C argument is mentioned first in the Forth <parameter list>. This is due to the fact that most C compilers actually process arguments from right to left, so this scheme is likely to cause fewer potential problems.

Example

The UNIX system call to create a new file is called creat. Its C language description is:
    int creat(name, mode)
    char *name;
    int mode;
This means that it takes 2 arguments: a string (char *) which is the name of the file to create, and an integer "mode" which controls which users have various access permissions on the new file. The return value is an integer which is a UNIX file descriptor useful for subsequently accessing the file, or -1 if an error occurred. The Forth interface to creat is specified as follows:
    ( mode name -- fd )
    SYSCALL: _creat int_ty string_ty -- int_ty END

Errors

In UNIX there is a global variable errno which generally contains an extra error status code if the last system call failed for some reason. The Forth interface to this is the Forth word:

ERRNO ( -- error-code )
After each UNIX system call, the value left on the stack by ERRNO will be 0 if the system call succeeded, or the contents of the UNIX global variable errno if the call failed. Any data storage required by ERRNO should be in the USER area, so that different Forth tasks may independently perform system calls without conflict.

Case Sensitivity?

The group had mixed feelings about this issue. The following (incomplete) set of guidelines was agreed upon:
1  The Forth system should be able to accept either upper case or lower case input.
2  At the user's option, upper case and lower case input should be treated as either distinct or indistinct.
3  Programmers are strongly encouraged to avoid the use of names that differ only in the case of the letters used; e.g., don't name one variable "blockno" and another different variable "BLOCKNO".
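The argument-order rule and the creat example above can be sketched in C as the common code a SYSCALL:-style word might run. This is an illustration under stated assumptions, not the group's implementation: the stack is an array of `long` cells (assumed at least 32 bits wide), only the integer and address types from the table are handled (string and floating-point conversion are left out), and `marshal`, `run_call`, `demo_fn` and `struct user_area` are hypothetical names. The per-task `user_area` models the report's advice that ERRNO's storage live in the USER area.

```c
#include <stddef.h>
#include <stdint.h>

/* A subset of the report's parameter types (string and float types omitted). */
enum param_type { INT_TY, ADDR_TY, CHAR_TY, UCHAR_TY, SHORT_TY, USHORT_TY, LONG_TY, ULONG_TY };

struct stack { long cell[16]; size_t depth; };
static void push(struct stack *s, long v) { s->cell[s->depth++] = v; }
static long pop(struct stack *s)          { return s->cell[--s->depth]; }

/* Per-task state standing in for the USER area, so each task has its own ERRNO. */
struct user_area { long err; };

/* ERRNO ( -- error-code ): 0 after a successful call, else the error code. */
static long ERRNO(const struct user_area *u) { return u->err; }

/* Narrow one cell to the size in the report's table (char 1, short 2, long 4 bytes). */
static long narrow(long cell, enum param_type t)
{
    switch (t) {
    case CHAR_TY:   return (int8_t)cell;
    case UCHAR_TY:  return (uint8_t)cell;
    case SHORT_TY:  return (int16_t)cell;
    case USHORT_TY: return (uint16_t)cell;
    case LONG_TY:   return (int32_t)cell;
    case ULONG_TY:  return (uint32_t)cell;
    default:        return cell;   /* int_ty, addr_ty: one cell, unchanged */
    }
}

/* The parameter list names the rightmost C argument first, so the leftmost
   C argument is on top of the stack: the first cell popped becomes cargs[0]. */
static void marshal(struct stack *s, const enum param_type *plist, int n, long *cargs)
{
    for (int i = 0; i < n; i++)
        cargs[i] = narrow(pop(s), plist[n - 1 - i]);
}

/* Hypothetical shape of the underlying call: returns a value, or sets *code. */
typedef long (*c_call)(const long *cargs, long *code);

/* Common code for one SYSCALL:-style word with a single int_ty result. */
static void run_call(struct user_area *u, struct stack *s,
                     const enum param_type *plist, int n, c_call fn)
{
    long cargs[8];          /* sketch: at most 8 arguments */
    long code = 0;
    marshal(s, plist, n, cargs);
    long result = fn(cargs, &code);
    u->err = code;          /* 0 on success, else the error number */
    push(s, result);
}

/* Demo target f(short a, unsigned char b): returns a + b, and fails
   (result -1, made-up code 22) when a is negative. */
static long demo_fn(const long *cargs, long *code)
{
    if (cargs[0] < 0) { *code = 22; return -1; }
    return cargs[0] + cargs[1];
}
```

For creat, `name` (the leftmost C argument) is on top of the stack, so it is popped first and lands in `cargs[0]`, while `mode`, which the parameter list mentions first, lands in `cargs[1]`.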
Input Delimiters

The Forth phrase BL WORD should treat all control characters, as well as the ASCII blank character, as delimiters, both when skipping initial delimiters and when scanning for the delimiter which terminates a word. This greatly simplifies the interpretation of ordinary text files, which may contain tabs, linefeeds, carriage returns, and formfeeds as separator characters in addition to ordinary blanks. This may be efficiently implemented by testing for "( char ) BL <=" instead of "( char ) BL =" when skipping or scanning for delimiters.

WORD with any character other than BL as the delimiter should treat only that character as the delimiter. In this case, leading delimiters should NOT be skipped. Not skipping leading delimiters prevents a common Forth bug whereby a zero-length string is not processed correctly. For example, some systems will not do the obvious thing when confronted with
    ( )
or
    ." "
The author has NEVER seen a case where WORD with a non-blank delimiter should have skipped leading delimiters.

The actual delimiter encountered which terminates the scanning of WORD should be stored in the USER variable:

DELIMITER ( -- addr )
addr is the address of a USER variable which contains the actual delimiter encountered when executing the previous invocation of WORD. If the delimiter encountered was the end of the input stream, the value contained in the USER variable is -1. This makes it easy to check for a number of end conditions.

Environment

A Forth program can access the UNIX shell environment variables with:

GETENV ( str1 -- [ str ] flag )
str1 is the address of a counted string which is the name of the desired environment variable. flag is true if that environment variable is set, and str is the address of a counted string which contains the value of that environment variable. flag is false if that environment variable is not set, and str is not present.
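The two WORD rules and the DELIMITER variable from the Input Delimiters section can be modelled in C. A sketch, assuming the input is a byte buffer with an explicit length; `scan_word` and `struct word_result` are hypothetical names, and the `delimiter` field plays the role of the DELIMITER USER variable (-1 at the end of the input).

```c
#include <stddef.h>

#define BL ' '

/* One scan: where the word starts, how long it is, and which delimiter
   ended it (-1 if the input ran out), as DELIMITER would record. */
struct word_result { size_t start, len; int delimiter; };

/* With delim == BL, blank and every control character count as delimiters
   (the "char BL <=" test) and leading delimiters are skipped. With any other
   delimiter, only that character counts and leading delimiters are NOT skipped. */
static struct word_result scan_word(const char *buf, size_t n, size_t *pos, int delim)
{
    size_t i = *pos;
    struct word_result r;
    if (delim == BL)
        while (i < n && (unsigned char)buf[i] <= BL)
            i++;
    r.start = i;
    while (i < n && !(delim == BL ? (unsigned char)buf[i] <= BL : buf[i] == delim))
        i++;
    r.len = i - r.start;
    if (i < n) {
        r.delimiter = (unsigned char)buf[i];
        i++;                       /* consume the terminating delimiter */
    } else {
        r.delimiter = -1;          /* ran off the end of the input */
    }
    *pos = i;
    return r;
}
```

Scanning the text `\t( ) foo` this way yields `(`, then a zero-length comment body (the `)` is found immediately instead of being skipped), then `foo` with a delimiter of -1.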
The user may set the environment variable FPATH in his shell environment. If set, Forth may use the value of this variable as a list of directory names in which to search for files. Example (csh syntax):
    setenv FPATH .:/usr/wmb/lib/forth/:/usr/local/lib/forth
If this environment variable is not set, Forth may use a system-dependent default list of directories in which to search for files. The default list contains the current directory as its first component, but the rest of the list is system-dependent.

Filename Extensions

Ordinary UNIX text files containing Forth source code (not in block format) should have names ending with the extension ".fth". (".f" would be nice, but Fortran got it first!) Files containing Forth blocks should have names which end with ".blk".

Subprocesses

The following words provide the capability of executing UNIX subprocesses from within Forth:

SH ( -- ) ( Rest of Line: string arguments to process )
A subshell is spawned to execute the UNIX command line which is the remainder of the Forth input line. If the user's SHELL environment variable is set, its value controls which shell to use (Bourne shell, C shell, or Korn shell). Otherwise the Bourne shell (/bin/sh) is used. As a possible optimization, the implementation of SH is allowed to directly execute the command line in a subprocess rather than spawn a subshell, if it can determine that no special shell metacharacter expansions (like wildcards, for instance) are required.

SH[ ( -- ) ( Input Stream: characters up to next ] )
Similar to SH, but only those characters between the brackets [ ] are included in the command line.

-SH ( command-string -- )
Similar to SH, but the command line is taken as the address of a packed string from the stack.

CHILD-STATUS ( -- status )
status is the return status returned by the most recently executed subprocess.
The implementation should keep any data associated with CHILD-STATUS in the USER area so that different tasks may execute subprocesses without conflict.

Open Issues

Many issues remain to be addressed, to wit:
- Terminal-independent display control: TERMCAP vs. Termio vs. something else?
- Object file formats, and Forth words for controlling the dynamic loading of them (as opposed to the specification of the interface points, which is covered here).
- Signal handling.
- Multitasking, and the interface with (blocking) UNIX I/O system calls.

Participants

Mitch Bradley
Bill Sebok
Peter Blake
Tom Almy
Dave Hooley
Harry Arnold
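The FPATH guideline earlier in the report can be sketched in C with the standard getenv(). The following are assumptions, not taken from the report: an empty component means the current directory, a trailing '/' on a directory is tolerated, "." alone stands in for the system-dependent default list, and "readable with fopen" counts as found. `next_candidate` and `fpath_open` are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the next candidate "dir/name" from a colon-separated list and
   advance *list past that component. Returns 0 when the list is exhausted. */
static int next_candidate(const char **list, const char *name, char *out, size_t outsz)
{
    const char *p = *list;
    if (p == NULL || *p == '\0')
        return 0;
    const char *colon = strchr(p, ':');
    size_t dlen = colon ? (size_t)(colon - p) : strlen(p);
    *list = colon ? colon + 1 : p + dlen;     /* advance past this component */
    if (dlen == 0) {                          /* assumed: empty = current dir */
        p = ".";
        dlen = 1;
    }
    int has_slash = p[dlen - 1] == '/';       /* e.g. "/usr/wmb/lib/forth/" */
    snprintf(out, outsz, "%.*s%s%s", (int)dlen, p, has_slash ? "" : "/", name);
    return 1;
}

/* Open the first readable candidate, searching FPATH if it is set, else an
   assumed default list that starts (and here ends) with the current directory. */
static FILE *fpath_open(const char *name, char *found, size_t foundsz)
{
    const char *list = getenv("FPATH");
    if (list == NULL)
        list = ".";
    while (next_candidate(&list, name, found, foundsz)) {
        FILE *f = fopen(found, "r");
        if (f != NULL)
            return f;
    }
    return NULL;
}
```

Given the value from the csh example, the candidates for a file named util.fth are ./util.fth, /usr/wmb/lib/forth/util.fth and /usr/local/lib/forth/util.fth, in that order.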