mem@sii.UUCP (Mark Mallett) (05/27/85)
Greetings.

About a year ago, I wrote a set of C routines to implement TOPS-20 style parsing on my CP/M system. I have used these routines for a number of programs, such as a reminder program and the software which makes up my bulletin board system here in NH. With that, I am now convinced that the routines work (at least to the point where I can use them), and have put them together as a kit. What follows is the document for the library.

If you have comments, please send me mail. If there is enough interest, I shall post the sources. I hope that if this happens, someone will tell me whether net.micro.cpm or net.sources is the appropriate place; I am sending this message off to both.

Cheers,
Mark Mallett
decvax!sii!mem or ittvax!sii!mem

C O M N D

A TOPS-20 style command parsing library for personal computers

Documentation and source code Copyright (C) 1985 by Mark E. Mallett; permission is granted to distribute this document and the code indiscriminately. Please leave credits in place, and add your own as appropriate.

This Document

This document contains the following sections:

  o Document overview (this here section)
  o Introduction and history
  o Functional overview
  o How to write programs using the subroutine library
  o How to make the library work on your system

Introduction and History

This document describes the COMND subroutine package for C programmers. COMND is a subroutine library to effect consistent parsing of user input, and in general is well suited for verb-argument style command interfaces. The library provides a consistent user interface as well as a program interface which, I believe, could well remain unchanged if the parsing library were rewritten to support different interface requirements (such as menu interaction).

The COMND interface is based on the TOPS-20 model. TOPS-20 is an operating system which is/was used by Digital Equipment Corporation on their DECSYSTEM-20 computers. TOPS-20 was based on TENEX, written by BBN (I think).
TOPS-20 COMND is much more robust and consistent than the library which this document describes; since this library is intended for small computer applications, it provides only the most commonly used functions.

This library was written on a Z-80 system running Digital Research's CP/M operating system version 3.0 (CPM+). I have also compiled and tried it on a VAX 11/780 running VMS. It is written completely in the C language, and contains only a few operating system specific elements.

The COMND JSYS section of the TOPS-20 Monitor Calls manual is probably a good thing to read.

Please note: while there are a few unimplemented sections of this library, I felt that it was nevertheless worthwhile to submit it to the public domain, since it is usable for almost all general command parsing and since the call interface is well defined. I have used this library extensively since sometime in 1984.

Functional Overview

The COMND subroutine library provides a command-oriented user interface which is consistent at the programmer level and at the user level. At the program level, it gives an algorithmically controlled parsing flow, where a call to the library exists for each field or choice of fields to be parsed. At the user level, the interface provides:

  o Command prompting.
  o Consistent command line editing. The user may use editing keys to erase the last character or word, and to echo the current input line and prompt.
  o Input abbreviation and defaulting. The user may type abbreviations of keywords, or may type nothing to have defaults applied.
  o Incremental help. By pressing a known key (usually a question mark), the user can find out what choices s/he has.
  o Guide strings. Parenthesized guide words are shown at the user's option.
  o Command completion. Where the subroutine library can judge what the successful completion of a portion of user input will be, the user can elect to have this input completed and shown automatically.
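
The input abbreviation feature listed above amounts to unique-prefix matching against a list of keywords. The following is only a minimal sketch of that idea, not code from the library: the name match_abbrev is hypothetical, the comparison is case-sensitive for simplicity, and the rule that an exact match beats a longer keyword is my assumption. It returns the address of the matching table slot, in the spirit of how the keyword parse function is described later in this document.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of COMND): find the keyword that `typed`
   uniquely abbreviates. `table` is a list of string pointers ending with
   a NULL pointer. Returns the address of the matching slot, or NULL if
   nothing matches or the abbreviation is ambiguous. */
char **match_abbrev(char **table, const char *typed)
{
    size_t len = strlen(typed);
    char **hit = NULL;
    char **kp;
    int hits = 0;

    for (kp = table; *kp != NULL; kp++) {
        if (strncmp(*kp, typed, len) != 0)
            continue;           /* typed text is not a prefix of this keyword */
        if ((*kp)[len] == '\0')
            return kp;          /* exact match wins (assumed rule) */
        hit = kp;               /* remember the candidate */
        hits++;
    }
    return (hits == 1) ? hit : NULL;   /* unique prefix, or failure */
}
```

Because the result is the address of a slot in the table, the keyword's index is simply the returned pointer minus the table's base address.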
Using the COMND Library

While you read this part of the document, you might want to look at the sample program named TEST.C which has been included with this package. It is an over-commented guide to the use of the COMND library.

Any module which makes use of this library shall include the definition file named "comnd.h". This file contains definitions which are necessary to the caller-library interface. Mnemonics (structures and constants) mentioned in relation to this interface are defined in this file.

The philosophy of parsing with the COMND library is that a command line is typed, the program inspects it, then the program acts on the directions given in that line. This process is repeated until the program finishes. The COMND library assists the user in typing the command line and the program in inspecting it. Acting on it is left up to the calling program.

The typing and parsing of fields in the command line go essentially hand-in-hand with this library. The single subroutine COMND() is used to effect all parsing. This routine is called for each element of the input line to be parsed. Parsing is done according to a current parse state, which is maintained in a parameter block passed between caller and library. The state block contains the following sort of information (described in detail later):

  o What to use for a prompt string.
  o Addresses of scratch buffers for user input and atom storage.
  o How much the user has entered.
  o How much of the line the program has parsed.

An important thing to note is that the indexes (how much entered and parsed) are both variable. The program begins parsing of the input line upon a break signal by the user (such as the typing of a carriage return, question mark, etc). The user may then resume typing and erase characters back to a point BEFORE that already parsed.
It is very important that the program does not take any action on what has been parsed until the line has been completely processed; otherwise the user could erase the input after the program has already acted on it.

Since the user may back up the command input to a point before that already processed by the application program, a mechanism must be provided to back up the program to the correct point. Rather than going to the point backed up to, the COMND library expects the application program to return to the beginning of the line, and start again. The user's input has remained in the command line buffer, and the library will take care of buffering the rest of the input when that parse point is again reached.

However, this means that there must be a method of communicating to the calling program that this "reparse" is necessary. Actually there are two methods provided, as follows:

  o Each call to the command parsing routine COMND() yields a result code. The result may indicate that a reparse has to take place. The program shall then back up to the point where the parse of the line began, and start again.
  o The application program may specify the address of a setjmp buffer which identifies the reparse point. (Note setjmp is a facility provided as part of most standard C libraries. It allows you to mark a point in the procedure flow [call frame, registers, and whatever else is involved in a context], and return to that point from another part of the program as if control had never proceeded. If you are unfamiliar with this facility, you might want to find a description in your C manual.) It is up to the caller to set up the setjmp environment at the reparse point.

In either case, the reparse point (the point at which the parse will be restarted if necessary) is the point at which the first element of the command line is parsed. This is after the initialization call which starts every parse.
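
The setjmp method can be hard to picture without code, so here is a minimal, self-contained sketch of the control flow only. Nothing in it comes from the library: fake_comnd() is a hypothetical stand-in which pretends that the user erased already-parsed input, and parse_line() and parse_passes() are made-up names. A real program would call COMND() and would store the address of the setjmp buffer in the command state block instead.

```c
#include <setjmp.h>

static jmp_buf reparse_env;    /* marks the reparse point */
static int user_backed_up;     /* simulated: user erased parsed input */
static int passes;             /* how many times the line was parsed */

/* Hypothetical stand-in for one field-parsing call. When a reparse is
   needed it jumps back to the reparse point, as the library is described
   as doing when it has been given a setjmp buffer. */
static void fake_comnd(void)
{
    if (user_backed_up) {
        user_backed_up = 0;
        longjmp(reparse_env, 1);
    }
}

/* Parse one two-field command line; returns the number of fields parsed. */
int parse_line(int simulate_backup)
{
    int fields;

    passes = 0;
    user_backed_up = simulate_backup;
    /* ...the initialization call would be made here, before the
       reparse point... */

    (void) setjmp(reparse_env);    /* the reparse point */
    passes++;
    fields = 0;                    /* reset parse side effects here */

    fake_comnd(); fields++;        /* first field */
    fake_comnd(); fields++;        /* second field */
    return fields;                 /* only now is it safe to act */
}

/* Number of passes the most recent parse_line() call needed. */
int parse_passes(void)
{
    return passes;
}
```

Note that everything the parse may have changed is reset right after the setjmp() call, so that a second pass starts from a clean slate.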
Every call to the COMND() subroutine involves two arguments: a command state block, which keeps track of the parse state, and a command function block, which describes what sort of thing to parse next.

The command state block is given a structure called "CSBs", and a typedef called "CSB". Each element of the structure is named with a form "CSB_xxx", where "xxx" is representative of the element's purpose. The following are the elements of the command state block, in the order that they appear in the structure.

  o CSB_PFL is a BYTE. This contains flags which are set by the caller to indicate specifics of the command processing. These flags are:
      o _CFNEC: Do not echo user input.
      o _CFRAI: Convert lowercase input to uppercase.
  o CSB_RFL, a BYTE value, contains flags which are kept by the library in the performance of the parse. Generally, these flags are of no interest to the caller since their information can be gleaned from the result code of the COMND() call. However, they are:
      o _CFNOP: No parse. Nothing matched, i.e., an error occurred.
      o _CFESC: Field terminated by escape.
      o _CFEOC: Field terminated by CR.
      o _CFRPT: Reparse required.
      o _CRSWT: Switch ended with colon.
      o _CFPFE: Previous field terminated with escape.
  o CSB_RSB is the address of a setjmp buffer describing the environment at the reparse point. If this value is non-NULL, then if a reparse is required, a longjmp() operation is performed using this setjmp buffer.
  o CSB_INP is the address of the input-character routine to use. If this value is non-NULL, then this routine is called to get each character of input. No line editing or special interactive characters are recognized in this mode, since it is assumed that this will be used for file input. Note especially: this facility is not yet implemented; however, the definition is provided for future expansion. Thou shalt always leave this NULL, or write the facility thyself.
  o CSB_OUT is the inverse correspondent to the previous element (CSB_INP).
    It is the address of a routine to process output from the command library. Please see the warning in the CSB_INP description about not being implemented.
  o CSB_PMT is the address of the prompt string to use for command parsing. The command library takes care of prompting, so make sure this is filled in.
  o CSB_BUF is the address of the buffer to put user input into as s/he is typing it in.
  o CSB_BSZ, an int, is the number of bytes which can be stored in CSB_BUF; i.e., it is the buffer size.
  o CSB_ABF is the address of an atom buffer. Some (if not all) parsing functions involve extracting some number of characters from the input buffer and interpreting or simply returning this extracted string. This buffer is necessary for those operations. It should probably be as large as the input buffer (CSB_BUF), but it is really up to you.
  o CSB_ASZ, an int, is the number of characters which can be stored in CSB_ABF; i.e., it is the size of that buffer.

** Note ** CSB elements from here to the end do not have to be initialized by the calling program. They are used to store state information and are initialized as required by the library.

  o CSB_PRS, an int, contains the parse index. This is the point in the command buffer up to which parsing has been achieved.
  o CSB_FLN, an int, is the filled length of the command buffer. This is the number of characters which have been typed by the user.
  o CSB_RCD, an int, is a result code of the parse. This is the same value which is returned as the result of the COMND() procedure call.
  o CSB_RVL is a union which is used to contain either an int or a long value. The names of the union elements are: _INT for int, _ADR for address (note that a typecast should be used for proper address assignment). This element contains a value returned from some parse functions which return values which are single values. For example, if an integer is parsed, its value is returned here.
  o CSB_CFB is the address of a command function block for which a parse was successful. This is significant in cases where there are alternative possible interpretations of the next command element.

The parse of each element in a command line involves, as well as the Command State Block just described, a Command Function Block which identifies the sort of thing to be parsed. This block is defined in a structure named "CFBs", which has a corresponding typedef named "CFB". Elements of the CFB, named "CFB_xxx", are as follows (in the order they appear in the structure):

  o CFB_FNC, a BYTE, is the function code. This defines the function to be performed. The function codes are listed, and their actions described, a little later.
  o CFB_FLG, a BYTE, contains flags which the caller specifies to the library. These are very significant, and in most cases affect the presentation to the user. The flag bits are:
      o _CFHPP: A help string has been supplied and should be given when the user types the help character ("?").
      o _CFDPP: A default string has been supplied, and shall be used if the user does not type anything at this point (typing nothing means typing a return or requesting command completion). Note that this flag (and the default string) is ONLY significant for the CFB passed in the call to the COMND() routine, and not for any others referenced as alternatives by that CFB.
      o _CFSDH: The default help message should be suppressed if the user types the help character ("?"). This is normally used in conjunction with the _CFHPP flag. However, if this flag is present and _CFHPP is not selected, then the help operation is inhibited, and the help character becomes insignificant (just like any other character).
      o _CFCC: A character characteristic table has been provided. A CC table identifies which characters may be part of the element being recognized.
        Not all functions support this table (for example, it does not make sense to re-specify which characters may compose decimal numbers). This table also specifies which characters are break characters, causing the parser to "wake up" the calling program when one of them is typed. If this bit is not set (as is usually the case), a default table is associated according to the function code.
      o _CFDTD: For parsing date and time, specifies that the date should be parsed.
      o _CFDTT: For parsing date and time, specifies that the time should be parsed.
  o CFB_CFB is the address of another CFB which may be invoked if the user input does not satisfy this CFB. CFBs may be chained in this manner at will. Recognize, however, that the ORDER of the chain plays an important part in how input is handled, particularly in disambiguation of input. Note also that only the first CFB of the chain is used for specifying a default string and CC table (for command wake-up). CFB chaining is a very important part of parsing with this library.
  o CFB_DAT is defined as a long, since it is used to contain address or int values. It should be referenced via typecast. It is not defined as a union because it is inconvenient or impossible to initialize unions at compile time with most (all?) C compilers, and initialization of these blocks at runtime is not desirable. This element contains data used in parsing of a field in the command line. For instance, in parsing an integer, the caller specifies the default radix of the integer here.
  o CFB_HLP is the address of a caller-supplied help string. This is only significant if the flag bit _CFHPP is set in the CFB_FLG byte.
  o CFB_DEF is the address of a caller-supplied default string. This is only significant if the flag bit _CFDPP is set in the CFB_FLG byte, and only for the first CFB in the CFB chain.
  o CFB_CC is the address of a character characteristics table. This is only significant if the flag bit _CFCC is set in the CFB_FLG byte.
    This is the address of a 16-word table, each word containing 16 bits which are interpreted as 8 2-bit characteristic entries. The most significant bits correspond to the lower ASCII values, etc. The 2-bit binary value has the following meaning, per character:
      o 00: Character may not be part of the element being parsed.
      o 01: Character may be part of the element only if it is not the first character of that element.
      o 10: Character may be part of the element.
      o 11: Character may not be part of the element; furthermore, when it is typed, it will cause parsing to begin immediately (a wake-up character).

The function code in the CFB_FNC element of the command function block specifies the operation to be performed on behalf of that function block. Functions are described now.

CFB function _CMINI: Initialize

Every parse of a command line must begin with an initialization call. This tells the command library to reset its indexes, that the user must be prompted, etc. No other CFBs should be chained to this one; any that are will be ignored.

The reparse point is the point right after this call. If the setjmp method is used, then the setjmp environment should be defined here. After the reparse point, any variables etc. which may be the victims of parsing side effects should be initialized.

CFB function _CMKEY: Keyword parse

_CMKEY parses a keyword from a given list. The CFB_DAT element of the function block should point to a table of string pointers, ending with a NULL pointer. The user may type any unique abbreviation of a keyword, and may use completion to fill out the rest of a known match. The address of the pointer to the matching string is returned in the CSB_RVL element of the command state block. The value is returned this way so that the index can be easily calculated, and because it is consistent with the general keyword parsing mechanism (_CMGSK).

The incremental help associated with keyword parsing is somewhat special.
The default help string is "Keyword, one of:" followed by a list of keywords which match anything already typed. If a help string has been supplied (indicated by _CFHPP) and no suppression of the default help is specified, then the initial part ("Keyword, ") is replaced with the supplied help string and the help is otherwise the same. If a help string has been supplied and the default has been suppressed, then the given help string is presented unaltered.

CFB function _CMNUM: number

This parses a number. The caller supplies a radix in the CFB_DAT element of the function block. The number parsed is returned (as an int) in the CSB_RVL element of the state block.

CFB function _CMNOI: guide word string

This function parses a guide word string (noise words). Guide words appear, in parentheses, between significant parts of the command line. They do not have to be typed, but if they are, they must match what is expected. If the previous field ended with command completion, then the guide words are shown automatically by the parser.

An interesting use of guide word strings is to provide alternate sets with the command chaining feature. The parse (and program) flow can be altered depending on which string was matched.

CFB function _CMCFM: confirmation

A confirmation is a carriage return. The caller should parse a confirmation as the last thing before processing what was parsed. Since carriage return is by default a wake-up character, requiring a confirmation will (if you don't change this wake-up attribute) require that the parse be completed with no extra characters typed. A parse with this function code returns only a status.

CFB function _CMGSK: General storage keyword

This call provides for parsing of one of a set of keywords which are not arranged in a table. Often, keywords are actually stored in a file or in a linked list.
The caller fills in the CFB_DAT element of the command function block with the address of a structure named CGKs (typedef CGK), which contains the following elements:

  o CGK_BAS: A base address to give to the fetch routine. It does not matter what this is, as long as the fetch routine understands it.
  o CFK_CFR: The address of a keyword fetch routine. The routine is called with the CGK_BAS value, and the address of the pointer to the previous keyword. It is expected to return the address of the pointer to the next keyword, or to the first one if the passed value for the previous pointer is NULL.

When this function completes successfully, it returns the address of the pointer to the string in the CSB_RVL element in the command state block. Please see the description of the _CMKEY function code for a description of help and other processing.

CFB function _CMSWI: Parse a switch

This is functionally equivalent to _CMKEY, and exists to fill a need for switch parsing. Basically it is a placeholder for an unimplemented function.

CFB function _CMTXT: Rest of line

This function parses the text to the end of the line. Note that this does not parse the trailing break character (i.e. the carriage return). The text is returned in the atom buffer which is defined (by the caller) by the CSB_ABF and CSB_ASZ elements of the command state block.

CFB function _CMTOK: token

This function will parse an exact match of a particular token. A token is a string of characters, whose address is supplied by the caller in the CFB_DAT element of the command function block. This function is mainly useful for parsing such things as commas and other separators, especially where it is one of several alternative parse functions. It returns no value other than its status.

CFB function _CMUQS: unquoted string

This function parses an unquoted string, consisting of any characters other than spaces, tabs, slashes, or commas. This set may of course be changed by supplying a CC table.
The unquoted string is returned in the atom buffer associated with the command state block.

CFB function _CMDAT: parse date/time

This function parses a date and/or time. The caller specifies, via flag bits in the CFB_FLG byte of the command function block (as identified above) which of date, time, or both, are to be parsed. The date and time are returned as the first two ints in the atom buffer which is associated with the command state block. Note that both date and time are returned, regardless of which were requested. Note further that this routine is not fully implemented as of this writing.

Calling the COMND library

All that you need to know to use the above information is how to call the command library. Basically, there is one support routine: COMND(). It is used like this:

    status = COMND (csbp, cfbp);

Here, "csbp" is the address of the command state block, and "cfbp" is the address of the command function block. The COMND() routine returns an int status value, which is one of the following:

  o _CROK: The call succeeded; a requested function was performed. The address of the matching function block is returned in the CSB_CFB element of the command state block, and other information is returned as described above.
  o _CRNOP: The call did not succeed; nothing matched.
  o _CRRPT: The call did not succeed because the user took back some of what had already been parsed. In other words, a reparse is required, and your program must back up to the reparse point. Note that if you specify a setjmp buffer address in the CSB_RSB element of the command state block, you will never see this value because the COMND library will execute a longjmp() operation using that setjmp buffer.
  o _CRIFC: The call failed because you provided an invalid function code in the command function block (or in one which is chained to it). You have made a programming error.
  o _CRBOF: Buffer overflow. The atom buffer is too small to contain the parsed field.
  o _CRBAS: Invalid radix for number parse.
  o _CRAGN: You should not see this code. It is reserved for a support-mode call to the subroutine library.

Installing the COMND library

This part of the document describes the modules which come with the COMND library kit, and what you might have to look at if the code does not instantly work on your system (which will probably be the case if your system is not the same kind as the one which you got it from).

The files which come in the COMND kit are as follows:

  o COMND.R - Source for this document, in a form suitable for the public domain formatting program called "roff4".
  o COMND.DOC - This document.
  o MEM.H - A file of my (Mark Mallett) definitions which are used by the code in the command subroutine library.
  o COMND.H - Command library interface definitions.
  o COMNDI.H - Command library implementation definitions.
  o COMND.C - Primary module of the COMND library. Contains user input buffering and various library support routines.
  o CMDPF1.C - First module of parse function processing routines.
  o CMDPF2.C - Second module of parse function processing routines.
  o CMDPFD.C - Contains the date/time parse function routines. This is included in a separate module so that it can be replaced with a stub, since few programs (that I have written, anyway) use this function, and it does take up a bit of code.
  o CMDPSD.C - A stub for the date/time parsing functions. This can be linked with programs which do not actually use the date/time parse function.
  o CMDOSS.CPM - Operating system specific code which works for CP/M. This is provided as a model for the routines which you will have to write for your system.
  o CMDDTM.CPM - Date/time support routines for version 3.0 of CP/M. This is a module containing routines to get the date and time from the operating system, and to encode/decode these values to and from internal form. This is provided as a model; you will probably have to rewrite them for your system.
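
A closing note on the character characteristics (CC) table described under the CFB_CC element: the packing of 128 two-bit entries into 16 words is easier to follow in code. The sketch below is only my reading of that description (eight entries per 16-bit word, with the most significant bits holding the lowest ASCII value in each word). The names cc_lookup, cc_set and the CC_xxx constants are hypothetical and do not come from comnd.h; check the library sources before relying on this layout.

```c
#include <stdint.h>

/* Hypothetical names for the four 2-bit characteristic values. */
enum {
    CC_NO        = 0,   /* may not be part of the element            */
    CC_NOT_FIRST = 1,   /* allowed, but not as the first character   */
    CC_OK        = 2,   /* may be part of the element                */
    CC_WAKE      = 3    /* not part of the element; wakes the parser */
};

/* Return the 2-bit characteristic for 7-bit ASCII character `ch`. */
int cc_lookup(const uint16_t table[16], int ch)
{
    int word  = (ch & 0x7F) / 8;        /* 8 entries per 16-bit word   */
    int slot  = (ch & 0x7F) % 8;        /* position within that word   */
    int shift = (7 - slot) * 2;         /* slot 0 sits in the top bits */

    return (table[word] >> shift) & 3;
}

/* Store characteristic `value` (0..3) for character `ch`. */
void cc_set(uint16_t table[16], int ch, int value)
{
    int word  = (ch & 0x7F) / 8;
    int slot  = (ch & 0x7F) % 8;
    int shift = (7 - slot) * 2;
    unsigned cleared = table[word] & ~(3u << shift);

    table[word] = (uint16_t)(cleared | ((unsigned)(value & 3) << shift));
}
```

For instance, under this layout a table could be built by starting from all zeroes, marking the letters and digits with CC_OK, and marking carriage return with CC_WAKE.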