byron@archone.tamu.edu (Byron Rakitzis) (06/04/91)
After compiling perl on my system and being nauseated by the syntax of the language, I've decided to try to come up with my own alternative. I'm going to call it ap, or anti-perl.

Right now I'm thinking that ap will be a super-awk that is less confusing for a C programmer to learn. I'm not sure if I want the implicit looping over stdin (though that's kind of nice) and I definitely don't want the pattern { action } syntax that awk has. It will have an integer and a string datatype, and you should be able to build arrays out of those objects (associative arrays too). Functions would be a nice thing to have, but it must always be easy to toss off a quick one-line ap script, i.e., in the most trivial case I would like something like

	ypcat hosts | ap 'print $1'

or something similar to work just right. I hate having to place braces around that simple statement as one has to do in awk.

Most importantly, ap will be driven by an easy-to-understand grammar with C-like syntax. There may be 2 or 3 ways to perform a particular task, but there will not be 10,000 as there are in perl.

The main deficiency of awk that I see is its inability to interface well with Unix. Up until recently, awk did not even have ARGC and ARGV, not to mention things like file redirection. This is where perl has taken a step in the "right" direction. Of course, it could be argued: why put symlink(2) into ap when you have ln(1)? Well, this is why perl was written: Unix today just cannot provide any performance with shell scripts; for better or for worse this has to be coded into the command interpreter.

Ideas are welcome. I really want to write this thing; perl is a disgrace to the Unix community.

--
Byron Rakitzis
byron@archone.tamu.edu
tml@extro.ucc.su.OZ.AU (Tim Long) (06/12/91)
I read Byron's comments on perl and awk with some sympathy. I have had thoughts along similar, although not identical, lines for some time. By coincidence, I have just designed and implemented a language to address similar issues, and I would be grateful to hear people's opinions on it. But first I'll just mention my own motivations:

1) To have a freely available general-purpose interpretive language on UNIX systems (as opposed to the many more special-purpose ones such as awk and the shell). This can be re-phrased as: to have a UNIX language like DOS has BASIC.

2) To have a freely available language suitable for embedding in other programs and systems.

3) To allow programming on UNIX systems which do not have development systems (which are becoming very common).

So I guess the design spec was to make a freely available general-purpose language suitable both for system-supported and embedded use. By embedded use I mean both within stand-alone devices (like PostScript) and as an adjunct to applications. The source is arranged to be amenable to this.

Although I have been brooding on it for some time, I have only actually done it in the last month. I'm reasonably happy with the result at this stage but welcome comment. There is a preliminary manual entry which describes the language, but it's just a manual entry. I'll try to give some more background here.

The language, which I am calling ICI for the time being, has dynamic typing and object management, with all the flavour (flow control constructs, operators and syntax) of C. You can write very C-like code if you wish (pointers work), but you can take advantage of the more flexible data handling to make things a lot easier. I have tried to keep the design carefully divided into the language and its fundamental functions, and then other groups of functions which relate to the operating environment. Naturally the UNIX shell-level version has almost all of these included.
I could try to convey the nature of the language here, but it is probably better just to skim the manual entry. So I'll include it here and continue the general discussion after that. It's about 14 pages, but you can start skipping after you get to the standard functions (it finishes after the next line of minuses)...

----------------------------------------------------------------------

ICI(1)                                                           ICI(1)

NAME
     ici - General purpose interpretive programming language

SYNOPSIS
     ici [ file ] [ -f file ] [ -i prog ] [ -digit ] [ -l lib ] [ args... ]

DESCRIPTION
     Ici parses ICI program modules as indicated by its arguments. They may or may not cause code to execute as they are parsed. But after the modules have been read, if main is defined as an external function it will be called with the otherwise unused arguments (as an integer count and a pointer to the first element of an array of strings). The options are:

     file      If the first argument does not start with a hyphen it is taken to be a program module as if specified with the -f flag. This may be used to allow ICI programs to execute directly with the #! facility.

     -f file   The file is parsed as an ICI module.

     -i prog   The prog argument is parsed directly as an ICI module.

     -digit    An ICI module is read from the file descriptor digit.

     -l lib    An ICI module is read from $ICILIB/liblib.ici. If ICILIB is not defined as an environment variable, /usr/ici will be used.

     other     Any argument not listed above is gathered into the arguments which will be available to the program.

     --        All further arguments are gathered into the arguments which will be available to the program.

     Note that argument parsing is two-pass; all the "unused" arguments are determined and assigned to argc and argv before the first module is parsed.

     If an error occurs which is not dealt with by the program itself, a suitable error message will be printed and ici will exit. The remainder of this manual entry is a brief description of the language.
OVERVIEW
     ICI has dynamic typing and flexible data types with the flow control constructs and operators of C. It is designed to allow all types of programs to be written without the programmer having to take responsibility for memory management and error handling. There are standard functions to provide the sort of support provided by the standard I/O and C libraries, as well as additional types and functions to support common needs such as simple data bases and character-based screen handling. A programmer familiar with C should be able to write ICI programs after reading this document.

STATEMENTS
     An ICI source module consists of a sequence of statements. Statements may be any of the following:

          expression ;
          compound-statement
          if ( expression ) statement
          if ( expression ) statement else statement
          while ( expression ) statement
          do statement while ( expression ) ;
          for ( exp(opt) ; exp(opt) ; exp(opt) ) statement
          switch ( expression ) compound-statement
          case constant-expression :
          default :
          break expression(opt) ;
          continue expression(opt) ;
          return expression(opt) ;
          ;
          storage-class ident function-body
          storage-class decl-list ;

     In contrast to C, all statement forms are allowed at all scopes. But in order to distinguish declarations and function definitions from ordinary expressions, the storage class (extern, static or auto) is compulsory. There is no goto statement, but break and continue statements may have an optional expression signifying how many levels to affect. (Not in this version.)

     The term constant-expression above refers to an expression that is evaluated exactly once, at parse time. In other respects it is unrestricted; it may call functions and have side-effects.

     Switch statements must be followed by a compound statement, not just any statement as in C. Furthermore, each case-label and the default must label statements at the top level of this compound statement.
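A small, untested sketch of these statement forms, assuming the semantics above (note the compulsory storage class on declarations and the parse-time evaluation of case labels; the function and its name are illustrative only):

```ici
static classify(ch)
{
    auto kind;              /* declarations are statements; allowed anywhere */

    switch (ch)
    {
    case "a": case "e": case "i": case "o": case "u":
        kind = "vowel";     /* case labels are constant-expressions, */
        break;              /* evaluated once at parse time          */
    default:
        kind = "other";
    }
    return kind;
}
```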
OBJECTS AND LVALUES
     In ICI objects are dissociated from the storage locations (variables, for instance) which refer to them. That is, any place which stores a value actually stores a reference to the value. The value itself, whether it is a simple integer or a large structure, has an independent existence. The type of an object is associated with the value, not with any storage locations which may be referring to it. Thus ICI variables are dynamically typed.

     The separation of storage location and value is transparent in most situations, but in some ways is distinguishable from the case in a language such as C where an object is isomorphic with the storage it occupies. ICI assignment and function argument passing do not transfer a copy of an object, but a reference to the object (that is, the new variable refers to the same object). Thus it is straightforward to have two variables referring to the same object; but this does not mean that assigning to one affects the value of the other. Assignment, even in its most heavily disguised forms, always assigns a new object to a storage location. (Even an operation such as "++i" makes the variable "i" refer to the object whose value is one larger than the object which it previously referred to.)

     The normal storage locations are the elements of arrays and structures. Simple variables are actually structure elements, although this is not apparent in everyday programming.

     Some object types are "atomic" (scalar); that is, their internal structure is not modifiable. Atomic data types have the property that all objects with the same value are in fact the same object. Integers, floating point numbers, strings and functions are atomic by nature. The only standard non-atomic data types are arrays and structures. An atomic (constant) version of any aggregate type (array or structure) can be obtained.
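The reference semantics just described can be sketched as follows (untested, based only on the manual's description; the array-constant form [ ... ] is defined under Expressions below):

```ici
static a = [1, 2, 3];   /* an array object                               */
static b = a;           /* b refers to the same array object as a        */

b[0] = 99;              /* modifies the shared array: a[0] is now 99 too */
b = 42;                 /* but assignment rebinds b; a is unaffected     */

static i = 1;
++i;                    /* i now refers to the (atomic) integer 2;       */
                        /* the integer 1 itself is never modified        */
```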
     Several of the intrinsically atomic types do allow read-only access to their interior through indexes, structure keys or pointers. (Strings for example allow indexing to obtain one-character sub-strings.)

TYPES
     Each of the following paragraphs is tagged with the internal name of the type, as returned by the typeof() function:

     int       Integers are 32 bit signed integers. All the usual C integer operations work on them. When they are combined with a float, a promoted value is used in the usual C style. Integers are atomic.

     float     All floating point is carried out in the host machine's double precision format. All the usual C floating point operations work. Floats are atomic.

     string    Strings are atomic sequences of characters. Strings may be indexed and have the address taken of internal elements. The value of fetching a sub-element of a string is the one-character string at that position, unless the index is outside the bounds of the string, in which case the result is the empty string. The first character of a string has index 0. Strings may be used with comparison operators, addition operators (which concatenate) and regular expression matching operators. The standard function sprintf is a good way of generating and formatting strings from mixed data.

     NULL      The NULL type only has one value, NULL (the same name as the type). The NULL value is the general undefined value. Anything uninitialised is generally NULL.

     array     Arrays always start at 0 but extend to positive indexes dynamically as elements are written to them. A read of any element either not yet assigned to or outside the bounds of the array will produce NULL. A write to negative indexes will produce an error, while a write to positive indexes will extend the array. Note that arrays do not attract an implicit ampersand as in C. Use &a[0] to obtain a pointer to the first element of an array "a". The function array() and array constants (see below) can be used to create new arrays.
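A hedged, untested sketch of the string and array behaviour described above:

```ici
static s = "hello";
static c = s[1];                /* "e": indexing a string yields a
                                   one-character string               */
static t = s + " " + "world";   /* + concatenates strings             */

static a = array();             /* a new, empty array                 */
a[3] = "x";                     /* writing past the end extends it;   */
                                /* a[0], a[1], a[2] now read as NULL, */
                                /* and sizeof(a) is 4                 */
```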
     struct    Structures are collections of storage locations named by arbitrary keys. Structures acquire storage locations and member names as they are assigned to. Elements which do not exist read as NULL. Pointers may be taken to any member, but pointer arithmetic is only possible amongst element names which are simple integers. Note that normal structure dereferencing with struct.member is as per C, and the member name is a string. Member names which are determined at run time may be specified by enclosing the key in brackets, as per: struct.(expr), in which case the key may be any object (derived from any expression). Thus struct.("mem" + "ber") is the same as struct.member. An "index" may also be used, as per: struct[expr], and has the same meaning as struct.(expr). (This is true in general; all data types which allow any indexing of their internal structure operate through the same mechanism, and these are only notational variations.) The function struct() and structure constants (see below) can be used to create new structures.

               From a theoretical standpoint structures are a more general type than arrays. But in practice arrays have some properties structures do not (intrinsic order, length and different concatenation semantics, as well as less storage overhead). Note that by ignoring the value associated with a key, structures are sets (and addition performs set union, see below).

     ptr       Pointers point to places where things are stored, but a pointer may be taken to any object and a nameless storage location will be fabricated if necessary. They allow all the usual C operations. Pointer arithmetic works as long as the pointer points to an aggregate element which is indexed by an integer (for instance all elements of arrays, and amongst structure elements which have integer keys). Pointers are atomic. Note that pointers point to a storage location, not to the value of an object itself.
               Thus if "a" is an array, after "p = &a;" the expression "*p" will have the same value as "a", even if "a" becomes a structure (through assignment). Note that it is not possible to generate pointers which are in any way illegal or dangling. Also note that because assignment and argument passing do not copy values, pointers are not required as often as they are in C.

     func      Functions are the result of a function declaration and function constants. They are generally only applicable to the function call operation and equality testing. They do not attract an implicit ampersand as in C. Functions are atomic. (Code fragments within functions are also atomic and thus shared amongst all functions.)

     regexp    Regular expressions are atomic items, produced either by regular expression constants (see below) or by compilation at run time from a string. They are applicable to the regular expression comparison operators described below.

     file      Files are returned and used by some of the standard functions. See below.

     window    Windows are produced and used by some of the standard functions. See below.

     Other types (pc, catch, mark, op, module and src) are used internally and are not likely to be encountered in ordinary programming.

LEXICON
     Lexicon is as per C, although there is no preprocessor yet, with the following additions:

     Adjacent string constants separated only by white space form one concatenated string literal (as per ANSI C).

     The sequence of a "#" character (not at the start of a line), followed by any characters except a newline, up to the next "#", is a compiled regular expression.

     The sequences !~, ~~, ~~=, ~~~, $, @, [{, }], [<, and >] are new tokens.

     The names NULL and onerror are keywords.

EXPRESSIONS
     Expressions are full C expressions (with standard precedence and associativity) with some additions.
     The overall syntax of an expression is:

          expression:
               primary
               prefix-unary expression
               expression postfix-unary
               expression binop expression

          primary:
               NULL
               int-literal
               float-literal
               char-literal
               string-literal
               regular-expression
               [ expression-list ]
               [< assignment-list >]
               [{ function-body }]
               ident
               ( expression )
               primary ( expression-list(opt) )
               primary [ expression ]
               primary . struct-key
               primary -> struct-key

          struct-key:
               ident
               ( expression )

          prefix-unary:
               * & + - ! ~ ++ -- $ @

          postfix-unary:
               ++ --

          binop:
               * / % + - >> << < > <= >= == != ~ !~ ~~ ~~~ & ^ | && || : ?
               = += -= *= /= %= >>= <<= &= ^= |= ~~= ,

          expression-list:
               expression
               expression , expression-list

          assignment-list:
               assignment
               assignment , assignment-list

          assignment:
               struct-key = expression

     The effect and properties of various expression elements are discussed in groups below:

     simple constants
          Integers and floats are recognised and interpreted as they are in C. Character literals (such as 'a') have the same meaning as in C (i.e. they are integers, not characters). String literals have the same lexicon as C except that they produce strings (see Types above). Both character and string literals allow the additional ANSI C backslash escapes (\e \v \a \? \xhh). Regular expressions are those of ed(1).

     complex constants
          [ expression-list ]
          [< assignment-list >]
          [{ function-body }]

          Because variables are intrinsically typeless it is necessary that initialisers, even of aggregates, be completely self-describing. This is one of the reasons these forms of constants have been introduced. The first is an array initialised to the given values, the second is a structure with the given keys initialised to the given values. The third is a function. The values in the first two are all computed as constant expressions (not meaning that they are made atomic or may only contain constants, just that they are computed once when they are first parsed).

     primary ( expression-list(opt) )
          Function calls have the usual semantics.
          But if there are more actual parameters than there are formal parameters in the function's definition, and the function has an auto variable called "vargs", the remaining actual parameters will be formed into an array and assigned to this variable. If there is no excess of actual parameters, any "vargs" variable will be undisturbed; in particular, any initialisation it has will be effective.

     prefix-unary (* & + - ! ~ ++ -- $ @)
          Apart from "$" and "@", the prefix unary operators have the same meaning as they do in C. The "*" operator requires a ptr as an argument. The "-" operator requires an int or float. "!" and "~" require ints. "++" and "--" work with any values which can be placed on the left of a "+ 1" or "- 1" operation (see below). The rest ("&", "+", "$", "@") work with any types. A "+" always has no effect. If the operand of an "&" is not an lvalue in the usual sense, a one-element array will be fabricated to hold the value and a pointer to this element will result.

          The "$" operator causes the affected expression to be evaluated at parse time (thus "$sin(0.5)" will cause the value to be computed once, no matter how many times the term is used). The "@" operator returns the "atomic" form of an object. This is a no-op for simple types. When applied to an aggregate the result is a read-only version of the same, which will be the same object as all other atomic forms of equal aggregates (as per ==).

     regular expression matches (~ !~ ~~ ~~= ~~~)
          These binary operators perform regular expression matches. In all cases one operand must be a string and the other a regular expression. The operator ~ performs the match and returns 1 or 0 depending on whether the string did or didn't match the expression. Likewise for !~ with opposite values. The operator ~~ matches the string and regular expression and returns the portion of the string matched by the \(...\) enclosed portion of the regular expression, or NULL if the match failed.
          The ~~= operator is the equivalent assignment operator and follows the usual rules. The ~~~ operator matches the string and the regular expression and returns an array of the portions of the string matched by the \(...\) portions of the regular expression, or NULL if the match failed. (This may move to a function.)

     assignment operators
          As previously mentioned, assignment always sets a storage location to a new object. The old value is irrelevant (although it may have been used in the process of a compound assignment operator). Thus there is no implicit cast on assignment, so assigning an int to what is currently a float will result in an int. Assigning to a currently unknown variable will implicitly declare the variable as static.

     other binary operators
          The usual C binary operators work as they do in C and on the same range of types. In addition:

          The == and != operators work on all types. Arrays and structures are equal if they contain the same objects in the same positions.

          The + and += operators will concatenate strings, arrays and structures (in the last case, where identical keys occur the values of the right-hand operand take precedence).

          The << and <<= operators will shift an array, losing elements from the front and shortening the array as a whole.

          The <, >, <= and >= operators work on strings, making lexical comparisons.

VARIABLES, SCOPES AND INITIALISERS
     There are exactly three levels of scope: extern (visible globally by all code), static (visible by code in the module), and auto (visible by code in the function). The variables in the first two are persistent and static. Auto variables have a fresh instantiation created each time a function is entered, which is lost on exit (unless there are references to them). Implicitly declared variables are static.

     All types of declarations may occur anywhere; they are simple statements, unlike in C. They have their effect entirely at parse time and thus produce no code. But the rules about scope still apply.
     No matter where an extern declaration is made, once it is parsed that variable is visible globally. Similarly, once an auto declaration is parsed that variable is visible throughout the scope of the function.

     Note that initialisers are constant expressions. They are evaluated once at parse time, even initialisers of autos. Every time a set of auto variables is instantiated (by function entry) the variables are set to these initial values, or NULL if there is no initialiser.

STANDARD FUNCTIONS
     The following functions form part of the language definition and should be present in all implementations, including embedded systems.

     call(func, array)
          Calls the function with arguments taken from the array. Thus the statement call(func, ["a", "b"]); is equivalent to func("a", "b");. Returns the return value of the function.

     array(...)
          Returns a new array formed from the arguments, of which there may be any number, including zero.

     struct([key, value...])
          Returns a structure initialised with the paired keys and values given as arguments, of which there may be any even number, including zero.

     string = sprintf(format, args...)
          Returns a string formatted as per printf(3S) from the format and arguments. All flags and conversions are supported up to System 5.3's. The new ANSI n and p conversions are not provided. Precision and field width * specifications are allowed. Type checking is strict.

     copy(any)
          Returns a copy of its argument. A null operation for all types except arrays and structures. To simulate C's structure assignment use "a = copy(b)" in place of "a = b". Note that this is a "top level" copy; sub-aggregates are the same sub-aggregates in the copy as in the original.

     eval(any)
          Evaluates its argument in the current scope. This is a null operation for any type except strings. For these it will return the value of the variable of that name as looked up in the current scope.

     exit(int)
          Exits with the given status.
     fail(str)
          Generates a failure with the given message (see Error handling above).

     float(any)
          Returns a floating point interpretation of its argument (an int, string or float; otherwise it will return 0.0).

     int(any)
          Returns an integer interpretation of its argument (a float, string or int; otherwise it will return 0).

     string(any)
          Returns a string interpretation of its argument (an int, float or string; otherwise it will return the type name in angle brackets).

     typeof(any)
          Returns the type name of an object (a string).

     parse(file/string [,module])
          Parses the file or string in a new module, or in the context of the given module if supplied.

     regexp(string)
          Returns the regular expression compiled from the string.

     sizeof(any)
          Returns the number of elements the object has (i.e. elements of an array, key/value pairs in a struct, or characters in a string; returns 1 for all other types).

     push(array, any)
          Adds the object to the end of the array, extending it in the process.

     pop(array)
          Returns the last object in the array and shortens the array by one in the process. It will return NULL if the array is already empty.

     keys(struct)
          Returns an array of the keys (i.e. member names) of the struct.

     smash(string1, string2)
          Returns an array of sub-strings from string1 which were delimited by the first character of string2.

     str = subst(string1, regexp, string2 [, flag])
          (Coming soon.) Returns a copy of string1 with sections that matched regexp replaced by string2, globally if flag is given as 1.

     str = tochar(int)
          Returns a one-character string made from the integer character code.

     int = toint(str)
          Returns the character code of the first character of the string.

     int = rand([int])
          Returns a pseudo-random number in the range 0 .. 2^15 - 1. If an argument is supplied it is used to seed the random number generator.

     string/array = interval(string/array, start [,len])
          Returns the interval of the string or array starting at index start and continuing till the end, or for len elements if len is supplied.
          Interval extraction outside the bounds of the object will merely leave out the absent elements.

     array = explode(string)
          Returns an array of the integer character codes of the characters in the string.

     string = implode(array)
          Returns a string formed from the concatenation of the integer character codes and strings found in the array. Objects of other types are ignored.

     file = sopen(string, mode)
          Returns a file (read only) which when read will return successive characters from the string.

     module = module(string)
          Returns a new module with its name taken from the string argument.

     obj = waitfor(obj...)
          Blocks (waits) until an event indicated by any of its arguments occurs, then returns that argument. The interpretation of an event depends on the nature of each argument. A file argument is triggered when input is available on the file. A float argument waits for that many seconds to expire, an int for that many milliseconds (these then return 0, not the argument given). Other interpretations are implementation dependent. Where several events occur simultaneously, the first as listed in the arguments will be returned. Note that in implementations that support many basic file types, some file types may always appear ready for input despite the fact that they are not.

     unixfuncs()
          When first called, will define as external functions the UNIX system interface functions described below (if available). Subsequent calls are ignored.

     vstack()
          Returns a copy of the variable (scope) stack. Index 0 is the outermost scope. It will contain functions, each optionally followed by a structure of the local variables. (Only for debuggers, obviously.)

STANDARD EXTERNAL VARIABLES
     externs
          A structure of all the extern variables.

     argc
          A count of the otherwise unused arguments to the interpreter.

     argv
          An array of strings, which are the otherwise unused arguments to the interpreter. (Note this is different from the argument to main, which is a pointer to the first element of this array as it is in C.
          It is probably easier to use the globals in general.)

     stdin
          Standard input.

     stdout
          Standard output.

     stderr
          Standard error output.

OTHER FUNCTIONS
     The following functions will be present on systems where the environment permits. Missing file arguments are interpreted as standard input or output as appropriate. Pretty obvious, but more details later.

     printf(fmt, args...)
     fprintf(file, fmt, args...)
     file = fopen(name, mode)
     file = popen(cmd, mode)   /* UNIX only. */
     status = system(cmd)
     str = getchar([file])
     str = getline([file])
     str = getfile([file])
     put(str [,file])
     fflush([file])
     fclose(file)

UNIX FUNCTIONS
     The following functions will be available on UNIX systems or systems that can mimic UNIX. See unixfuncs() above. They all return an integer. On failure they raise a failure with the error set to the appropriate system error message derived from errno. These interfaces are raw. Use at your own risk.

     access(), acct(), alarm(), chdir(), chmod(), chown(), chroot(), close(), creat(), dup(), _exit(), fork(), getpid(), getpgrp(), getppid(), getuid(), geteuid(), getgid(), getegid(), kill(), link(), lseek(), mkdir(), mknod(), nice(), open(), pause(), rmdir(), setpgrp(), setuid(), setgid(), signal(), sync(), ulimit(), umask(), unlink(), clock(), system(), lockf(), sleep()  /* Rest on the way. */

DATA BASE FUNCTIONS
     Simple non-indexed, but otherwise fully locked and functional, data base support. Not for speed. If your application needs a serious data base, get one; don't use this. Use this for configuration info and all that peripheral stuff.

     The arrays are arrays of strings, which are the fields of a record. The "keyfieldno" is which field number of the record is the key for this operation. The "dbname" is a file name, one table per file. It will be created if it does not exist, but an empty file is OK too. Use UNIX permissions for access control. Read access on read-only files is OK. db_get() returns NULL if not found. More details later.
     array = db_get(dbname, keyfieldno, value)
     array = db_delete(dbname, keyfieldno, value)   /* Returns old data. */
     db_set(dbname, keyfieldno, array)
     db_add(dbname, array)

WINDOWS
     Upon first reference to any of the window routines, standard input is placed in the appropriate modes for non-echoing, character-at-a-time input. All input from the terminal should be fetched with w_getchar() and w_edit(). Upon exit (including interrupt) all modes will be restored.

     win = w_push(line, col, nlines, ncols)
          Pushes an opaque rectangular window on the screen at the given line and col, which are in screen coordinates. But special values of -1 or -2 for line or col indicate centering or right justification (bottom justification for line) for that aspect of the position. The window will have the given number of lines and columns, unless nlines or ncols are less than or equal to zero, in which case they will be that much less than the full screen size. The window is initially clear and on top of all previous windows.

     w_pop(win)
          "Pops" the window from the screen, re-exposing anything which the window was hiding. Any window may be popped from the screen, whether it is the top window or not. After a window has been popped it is dead and cannot be put back; make a new window to do this. Note that if a window is not referenced it will get popped when the next garbage collection occurs, but windows should always be popped explicitly.

     w_paint(win, line, col, text [,tabs])
          Paints the text on the window at the given line and column (in the window's space), with auto-indent on subsequent lines (indicated by a \n character in the text). A string tab specification reminiscent of troff (and most word processors) may be given. If supplied it must be a concatenation of tab-specs. Each tab-spec consists of an optional "+" character, followed by a decimal number, followed by an optional leader character, followed by one of the letters "L", "C" or "R".
          If the "+" is supplied the tab position is at a relative offset from the previous one, else it is a distance from the left margin of this text block. If a leader character is given, the distance between the current column and the start of the next text will be filled with that character; else a direct motion will be used (use an explicit space leader to clear an area). If an "L" tab is set, the next field of text will start at the tab stop; if a "C" tab is set, the next field of text will be centered on the tab stop; and if an "R" tab is set, the next field of text will end on the tab stop. The "next field of text" is the text after the tab character up to the next tab, newline or end of string. The last tab-spec in the string will be used repeatedly, and scanning of the tabs starts again on each new line. If no tab specification is given, multiple-of-8 column tabs are used, but relative to the start position. For example, a three-part title in an 80 column window could be painted with the tab spec "40C80R".

     win = w_textwin(line, col, text [,tabs])
          Pushes a window in the same manner as w_push() (with the same interpretation of line and col) of just sufficient size to hold the given text as it is set by w_paint(), with a box around it. It is allowable for column positions in the text being set to have negative numbers during the sizing phase of this operation.

     w_mesg(str)
          Pushes a boxed one-line window centred at the bottom of the screen and containing the string. It will be automatically removed after the next keystroke.

     w_cursorat(win, line, col)
          Sets the cursor position for this window (in the window's space). When the window is the top window on the screen, the real screen cursor will be at this position.

     str = w_getchar()
          Returns the next character from the terminal, without echo and without canonical input processing. For ordinary ASCII characters a one-character string is returned.
For special keys an appropriate multi-character string is returned (currently "F0", "F1" ... "F32", "LEFT", "RIGHT", "UP", "DOWN", "HOME", "END", "PGUP", "PGDOWN"). The screen is refreshed before waiting for user input.

w_ungetchar(str)
        Pushes a character back. Only one character of push-back is allowed. Only the first 16 characters of the string are significant (all "characters" returned by w_getchar() are shorter than this).

str = w_edit(win, line, col, width, str)
        Allows traditional editing of an input field at the given position and width, initially containing the given string. Editing proceeds until any unusual character is pressed (that is, not a printing ASCII character or one of the field editing keys such as backspace). At that point the character which caused termination is pushed back on the input stream and the current text of the field is returned. The next call to w_getchar() will return the key which terminated editing.

w_box(win)
        Draws a box around the inside edge of the window.

w_clear(win)

w_refresh()

w_suspend()
        Restores the terminal to normal modes and moves the cursor to the bottom left. The next window operation will revive the screen.

EXAMPLES
The following shell command line will print Hello world.

        ici -p 'printf("Hello world.\n");'

The following program prints the basename of its argument:

        #!ici
        printf("%s\n", argv[1] ~~ #\([^/]*\)$#);

The following example illustrates a simple grep-like program. The first line makes a Bourne shell pump the program in through file descriptor 3, and passes any arguments to the shell script on to the ICI program. File descriptor 3 is used to avoid disturbing the standard input. This works on all UNIXes, though of course 4.2+ and 5.4+ can use the #! mechanism instead. Note that errors (such as those encountered upon failure to open a file) are not checked for; the program can be expected to exit with an appropriate message should they occur.

        exec ici -3 -- "$0" "$@" 3<<'!'
        extern
        main(argc, argv)
        {
            if (argc < 2)
                fail(sprintf("usage: %s pattern [files...]", argv[0]));
            pattern = regexp(argv[1]);
            if (argc == 2)
                grep("", stdin);
            else
            {
                for (i = 2; i < argc; ++i)
                    grep(sprintf("%s:", argv[i]), fopen(argv[i], "r"));
            }
        }

        static
        grep(prefix, file)
        {
            while ((s = getline(file)) != NULL)
            {
                if (s ~ pattern)
                    printf("%s%s\n", prefix, s);
            }
            if (file != stdin)
                fclose(file);
        }
        !

SEE ALSO
awk(1), ed(1), printf(3S), etc.

BUGS
There is a problem with the right-associativity of ? : expressions; use brackets when combining multiple ? : operators for the time being. There is an occasional problem with the screen updating with multiple windows. A && or || expression may not result in exactly 0/1 if evaluation gets to the last clause.

AUTHOR
Tim Long, May '91.

----------------------------------------------------------------------

Returning to the general: my intention was not to replace any of the special purpose tools like the shell, awk, sed etc., nor was it to make a replacement for real programming languages like C. Rather, I regard it as a casual programming tool filling much the same niche as BASIC. As such it doesn't have specific language features dedicated to special tasks (like doing something for each line of input text). But it does (or will) have a broad base of simple primitives to make most routine tasks easy. And of course it is extensible. But you will notice that almost none of its "library" features are the ultimate expression of that area of software technology. In practice every major application has some principle, or piece of software technology, or bit of hardware which is its reason for existence as a product. But products can't run on one leg. Inevitably the endless series of tack-on bits has to be supplied, usually with a great deal of re-invention taking place. I have thought of ICI as assisting in that area. The theory is that if something is a major focus of an application, you won't be using these dicky little features to do it.
But for all those other bits, which aren't your real business, you can just use the stuff provided and hack up the rest in a somewhat more amenable programming environment than raw C.

Getting back to the language itself... You can easily see from the above how it is like C. What is probably not so obvious is how it is not like C. Here is a grab bag of things to convey some of the flavour.

A lot of the usual messing around with strings can be handled by the regular expression operators. The ~~= operator is particularly useful. For example, to reduce a string s which holds a file name to its basename:

        s ~~= #\([^/]*\)$#;

I know it looks a bit insane, but regular expressions are like that. I'm not going to apologise for using # rather than / to delimit regular expressions. It was necessary to avoid lexical ambiguity and you get used to it in no time.

I don't seem to have written the bit in the manual on error handling, so I'll quickly describe it here. The actual syntax of a compound statement is:

        compound-statement:
                { statement(rpt) }
                { statement(rpt) } onerror statement

In other words, compound statements may have an optional "onerror" followed by a statement. Errors work on the principle that the lower levels of a program know what happened, but the higher levels know what to do about it. When an error occurs, either raised by the interpreter because of something the program did or explicitly by the program, an error message is generated and stored in the global variable "error". The execution stack is then automatically unwound until an onerror clause is found, and execution resumes there. The unwinding will unwind past function calls, recursive calls to the interpreter (through the parse function) etc. If there is no onerror clause in the scope of the execution, the main interpreter loop will bounce the error message all the way out to the invoking system.
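[Editorial aside: for readers more familiar with later exception systems, the onerror unwinding just described behaves much like try/except. A rough analogy in Python, not ICI — the function names are invented for illustration, and Python's exception object stands in for ICI's global "error" variable:]

```python
def read_config(path):
    # Lower level: knows what happened, not what to do about it.
    # In ICI this would be a fail() or an interpreter-raised error.
    raise OSError("can't open " + path)

def main():
    # Higher level: knows what to do about it.  The except clause
    # plays the role of an onerror clause; the stack is unwound
    # past any intervening calls to reach it.
    try:
        read_config("/no/such/file")
        return "ok"
    except OSError as error:
        return "recovered: " + str(error)
```

As in ICI, a program with no handler at all simply lets the message propagate out to the invoking system.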
In the UNIX case this will print the message along with the source file name, the function name and the line number (which is also available). Although the manual entry doesn't go into that sort of detail, it is important to know what things raise errors in what circumstances. But the basic philosophy is that the casual programmer can just ignore the possibility of errors (like failure to open a file) and expect the finished program to exit with a suitable message when things go wrong. The grep program given in the manual is an example of this. One error is checked for explicitly so it can give its own usage message, but failures to open files or syntactically incorrect regular expressions are allowed to fall out naturally.

I seem to be wandering a bit here; back to some examples...

Functions are of course just another datum. A function called "fred" is just a variable which has been assigned a function. You could re-define the getchar function (even though it is an intrinsic function coded in C) with either:

        extern getchar() { return tochar(rand() % 256); }

or:

        extern getchar = [{(){return tochar(rand() % 256);}}];

The second is a little perverse, but function constants make more sense in examples like:

        sort(stuff, [{(a, b){return a < b ? -1 : (a > b ? 1 : 0);}}]);

where the sort comparison function is given in-line, so you don't have to go chasing all over the code to find the two line function. (There is a growing library which contains functions like sort, but it is not in a fit state for discussion yet.) They also make more sense when doing object oriented stuff. Suppose you want to define a set of methods in a type.
You can just assign the functions directly into the type with:

        static type = struct();
        type.add = [{ (a, b) { return ....; } }];
        type.sub = [{ (a, b) { return ....; } }];

Or you could build it in one hit like:

        type = [<
            add = [{ (a, b) { return ....; } }],
            sub = [{ (a, b) { return ....; } }],
        >];

The variable argument support handles all possibilities. One nice example of its use comes from the way libraries are done. Because code is parsed at run-time, you don't want to have to parse thousands of lines of libraries for every one line program. Instead, a library will just define stub functions which invoke a standard (library) function called autoload(). They look like this:

        extern sort() {auto vargs; return autoload("sort", sort, vargs);}

Because the function has an auto variable called "vargs", any unused arguments (i.e. all of them) are assigned to it. These are then passed on to autoload. The arguments to autoload are a file name (it will prefix it with the standard lib dir), the function being re-defined and the arguments. It will parse the file, check that it redefined the function and then call it with the arguments. From then on, of course, the new function is defined and the old one gets garbage collected like all lost data. The loaded file could define several functions, and any autoload definitions they have will also be replaced at the same time. The current version of autoload looks like this:

        /*
         * Parse the given file and transfer control to the newly loaded version
         * of the function as if that was what was called in the first place.
         * A loaded file can define more than one function.  They will all
         * be replaced on the first load.  See examples below.
         */
        extern
        autoload(file, func, args)
        {
            auto stream;

            file = "/usr/ici/" + file;
            parse(stream = fopen(file, "r"));
            fclose(stream);
            if (func == eval(func.name))
                fail(sprintf("function %s was not found in %s", func.name, file));
            return call(eval(func.name), args);
        }

Notice that it references a sub-field of the function like a structure field. This is something the manual entry doesn't go into detail about, but you can do things like that. A function, for instance, has sub-fields of: "name", a name for the function (for simple declarations this is the name the function was first declared as); "args", an atomic array of the declared formal parameters; "autos", an atomic struct of all the autos and their initial values; and there are a few other fields too. Also notice how it uses the "eval" function to check the value of a variable whose name is determined at run time, and then its use of the call function to call a function with a run-time determined variable argument list. Again notice that it doesn't need to worry about any errors except those it wants to check for explicitly; the others will happen correctly automatically. This one feature can save a lot of code.

The sequence of operations on function entry is very deliberate and you can do some neat things with it. In particular, formal parameters are just auto variables which are initialised with the corresponding actual parameter. But they are initialised with this after the explicit initialisations have been applied. Thus you can use an explicit initialisation to give a default value to an argument which is optional, without messing about with the "vargs" variable. For example:

        static
        getstuff(file)
        {
            auto file = stdin;
            ....
        }

Structure keys (and switch statements, which use a struct) work on the key being the same object as the tag. Thus switching on strings, ints, floats, functions etc. is fine.
But you can also use aggregate keys by always using atomic versions of them:

        switch (@array(this, that))
        {
        case @["one thing", "the other"]:
            ...
        case @[1, 2]:
            ...
        case @[x, y]:
            ...
        }

You will notice that because things refer to other things, rather than actually holding them, you use pointers far less often than you do in C. In fact you can start to treat structured data types in a much more casual fashion. I have hardly scratched the surface here, but this is getting a bit long so I'll terminate this section.

A few practicalities: on my 386 the initial load image (text+data) comes in at around 110K (85K text + 25K data), of which a disconcerting amount comes from curses, even though all I want it to do is read a terminfo entry. After that, time and space use are as proportional to the needs of the program as I could make them. (These sorts of interpretive languages often have nasty non-linear time or space performance characteristics due to garbage collection and the like; I have tried to be careful to avoid that sort of behaviour.) For some tasks memory use can be better than expected, because of object sharing... Memory is only needed to hold distinct atomic objects, so although technically there are reasonable memory overheads for, say, an integer, in practice most programs don't have very many distinct integers at any given point in time. After the first instance of a given number you are only paying the overhead of the storage location which refers to it, which is 4 bytes for array elements and 8 bytes for structure elements. In fact it can happen that large arrays of floating point numbers (which are 8 bytes each) occupy less space than you would at first expect. I have been thinking of shifting integers to 64 bits, because there would be no overhead in memory use (they already use the same size data block as floats) and I suspect the performance loss would be marginal. But more to the point, 32 bits is just not enough.
(A set of good portable 64 bit routines will be gratefully accepted.)

I think I have mentioned that it is also designed for embedded systems. This means that:

a) It is easy to link the interpreter into other C programs; there are as few external symbols as I could manage and it uses just a few classic library functions.

b) It is easy to write intrinsic functions (i.e. functions written in C which can be called from ICI code).

c) It is easy to call ICI functions from C (although at the moment there is slightly more overhead than in the inverse direction).

d) Where necessary, additional types can be introduced without disturbing the rest of the interpreter. (An example of this is the character based screen handler. It is done in a single source module with only one reference to it (in a configuration array), yet its "window" type integrates fully with the rest of the interpreter.)

I think this will have to do for now. I'll post the source, the manual and some sample programs somewhere soon.

By the way, I have always regarded designing a programming language as the height of arrogance, and I can only defend this by saying I did it for me.
--
Tim Long
tml@extro.ucc.su.OZ.AU
phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (06/12/91)
tml@extro.ucc.su.OZ.AU (Tim Long) writes:
>1) To have a freely available general purpose interpretive language on
>UNIX systems. (As opposed to the many more special purpose ones such
>as awk and the shell). This can be re-phrased as: To have a UNIX
>language like DOS has BASIC.

First thing I do after installing DOS on a PC is find BASIC and erase it.
--
/***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu | Guns don't aim guns at   \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks | people; CRIMINALS do!!  /
\***************************************************************************/
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/15/91)
In article <1991Jun11.173907.28331@metro.ucc.su.OZ.AU>, tml@extro.ucc.su.OZ.AU (Tim Long) writes:

> NAME
>      ici - General purpose interpretive programming language
>
> SYNOPSIS
>      ici [ file ] [ -f file ] [ -i prog ] [ -digit ] [ -l lib ] [ args... ]
>
>      -f file    The file is parsed as an ICI module.
>
>      -i prog    The prog argument is parsed directly as an ICI
>                 module.

I for one would find it rather less confusing if you used the same option name as AWK and sed, namely "-e", as in the following examples I just tried:

        awk -e 'END {print "Hello, world."}' </dev/null
        (echo a; echo b; echo c) | sed -n -e 1p -e 2p

>      -digit     An ICI module is read from the file descriptor
>                 digit.

May I suggest a slightly more long-winded but rather prettier scheme? Allow a file name (anywhere at all) to have the form

        /dev/fd#

where # is an integer with however many digits it needs. Some research versions of UNIX already support this directly. People familiar with it won't thank you for introducing a new notation. And it takes less than half a page of code to implement your own "f_or_fd_open(string, mode)" function in C, and then use that throughout the implementation of ICI instead of fopen(). [I have done this, and know what I'm talking about.]

> structure) can be obtained.  Several of the intrinsicly
                                              ^^^^^^^^^^^
That's intrinsically.

> int    Integers are 32 bit signed integers.  All the usual C
>        integer operations work on them.  When they are
>        combined with a float, a promoted value is used in the
>        usual C style.  Integers are atomic.

Oh *no*! What's the good of using an interpreted language if it only gives me 32-bit integers? If you use any of the PD or redistributable bignum packages around, then it is *EASY* to provide bignum arithmetic in an interpreter. Yes, the bitwise operations &, |, ^, ~ all make perfect sense on integers of any size, and if we define

        x << y = floor(x * 2**y)
        x >> y = floor(x * 2**(-y))

then even the shifts make sense.
(The shifts won't agree with C, but then shifts in C aren't as portable as you might think.) *Please* give very serious consideration to bignums. For a scripting language, why the flaming xxxx should I *care* what size a register is?

> Note that initialisers are constant expressions.  They are
> evaluated once at parse time.  Even initialisers of autos.

Why? The restriction to constant initialisers for static and external variables in C made sense, because the initialisation was done by the linker. But that doesn't apply to ICI. About 80% of my initialisations to auto variables in C are -not- constant expressions. Why introduce a restriction that an interpreter like ICI doesn't need and that doesn't give the ICI programmer any extra safety?

> The array's are arrays of strings, which are the fields of a

The array's what?

> EXAMPLES
>      The following shell command line will print Hello world.
>      ici -p 'printf("Hello world.\n");'

The manual page said nothing about a "-p" option.

> The first line makes a Bourne shell pump the
> program in through file descriptor 3, and passes any
> arguments to the shell script on to the ICI program.

I tried something like that on an Apollo once. Didn't work. The shell already had several descriptors other than 0, 1, and 2 open.

> A few practicalities: on my 386 the initial load image (text+data)
> comes in at around 110K (85K text + 25K data, of which a disconcerting
> amount comes from curses, even though all I want it to do is read
> a terminfo entry.

Surely you can get at a terminfo entry by using just -ltermlib; you don't have to load -lcurses as well. According to the SVID the "Terminfo Level Routines" are setupterm(), tparm(), tputs(), putp(), vidputs(), vidattr(), and mvcur(). setupterm() defines a bunch of variables. If you just use those routines, you shouldn't get much else from Curses. If not, complain.
--
Q: What should I know about quicksort?
A: That it is *slow*.
Q: When should I use it?
A: When you have only 256 words of main storage.
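[Editorial aside: O'Keefe's shift definitions are exactly how arbitrary-precision integers behave in later languages; Python's bignums make the claim easy to check. With exact integer arithmetic no explicit floor() is needed — multiplication is exact and // is floor division:]

```python
# O'Keefe's definitions, written with exact integer arithmetic:
#   x << y  =  floor(x * 2**y)
#   x >> y  =  floor(x * 2**(-y))
def lshift_def(x, y):
    return x * 2**y          # exact for any integer x, so floor is a no-op

def rshift_def(x, y):
    return x // 2**y         # // is floor division: floor(x * 2**(-y))

# They agree with the built-in bignum shifts, negative operands and
# larger-than-register values included.
for x in (7, -7, 12345678901234567890):
    for y in (0, 1, 5, 64):
        assert (x << y) == lshift_def(x, y)
        assert (x >> y) == rshift_def(x, y)
```

Note that the right shift is arithmetic (it rounds toward minus infinity), which is where the disagreement with C's implementation-defined signed shifts comes from.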
dws@margay.cs.wisc.edu (DaviD W. Sanderson) (06/15/91)
In article <6354@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>Allow a file name (anywhere at all) to have the form
>        /dev/fd#
>where # is an integer with however many digits it needs. Some research
>versions of UNIX already support this directly.

And SVR4.0. I do NOT think you mean /dev/fdN, but rather /dev/fd/N. This is how it is in Research UNIX and SVR4. Also, systems that support /dev/fd typically have synonyms for the standard file descriptors:

        /dev/stdin  == /dev/fd/0
        /dev/stdout == /dev/fd/1
        /dev/stderr == /dev/fd/2
--
DaviD W. Sanderson (dws@cs.wisc.edu)
Fusion Powered Locomotives Made to Order
(TARDIS model available at extra cost)
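[Editorial aside: putting O'Keefe's f_or_fd_open() suggestion together with Sanderson's /dev/fd/N spelling, the whole scheme is only a few lines. A sketch in Python rather than the C O'Keefe had in mind — the function names follow his suggestion, but the details are invented for illustration:]

```python
import os
import re

_DEV_FD = re.compile(r"^/dev/fd/(\d+)$")

def fd_from_path(path):
    """Return N for a /dev/fd/N path, else None."""
    m = _DEV_FD.match(path)
    return int(m.group(1)) if m else None

def f_or_fd_open(path, mode="r"):
    # Used in place of fopen()/open() throughout an interpreter,
    # this makes /dev/fd/N names work even on systems whose kernels
    # don't support them directly.
    fd = fd_from_path(path)
    if fd is not None:
        return os.fdopen(fd, mode)
    return open(path, mode)
```

Funnelling every open through one such routine is the "half a page of code" O'Keefe describes; the /dev/stdin-style synonyms could be handled by the same table.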