byron@crux.Princeton.EDU (Byron Rakitzis) (03/05/91)
(the news software at Texas A&M is down right now, so I am posting from an old account at Princeton. Please reply to byron@archone.tamu.edu) I'm announcing the availability of my own implementation of rc, the AT&T plan 9 and Unix v10 shell in use at Bell Labs. I'm releasing "version 0.9", which means that rc has seen a lot of testing by a few people, but that now it's time to freeze any development completely and let a larger number of users shake the remaining bugs out. When this happens, I will (with hope, in the near future) release "version 1.0". What is rc? rc is a small shell, similar to the Bourne shell. It has powerful variable manipulation primitives, however, which makes it a very useful shell language. It is *not* another bash/ksh/tcsh which tries to do everything but fetch your slippers. You can use it interactively (as I have been doing for the last several months) or you can use it to write fast-starting and easy-to-read shell scripts; rc's syntax is based on C, much more so than the so-called C-shell. Where can I get rc? The shell is available by anonymous ftp from archone.tamu.edu, in ~ftp/pub/rc. I also honor personal email requests for an email copy. How do I find out more about rc? I've enclosed here an introduction to rc which comes with the rc source distribution. It is an outline of its main features; some experience with Unix shells is assumed. You can also read about AT&T's rc in the Unix Research 10th edition manuals. These are available in bookstores in two volumes, published by Saunders College Publishing. Enjoy. Byron Rakitzis. ----------- AN INTRODUCTION TO RC Byron Rakitzis (byron@archone.tamu.edu) rc is the AT&T plan 9 shell. I have taken a copy of the published manual pages (UNIX research system, 10th ed.) and developed my own public implementation from the description of rc in these documents. What follows is a short introduction to the features of rc, in order to underscore the differences between this shell and the standard sh and csh. rc is a shell similar in spirit to sh: on my Sun-4, rc compiles to a 72k stripped, statically linked executable; this is even smaller than the Bourne shell. rc's power lies not in the fact that it has a large number of features (it does not) but in that it is small, fast and predictable. What features it does have are very general and powerful. rc's parsing is performed by a yacc-generated parser, so the precise syntax of rc is no mystery to its users, as in the case of sh and csh. VARIABLES: The area in which rc differs most (apart from syntactically) from the other shells is in the way variables are implemented. Internally, variables are linked lists of words. Thus: a=(this is a list of words) assignes a 6-element list to $a. The parentheses are necessary for grouping, but they are stripped during parsing; this definition of a is no different from: a=(this (is a) (list) of words) Quoting in rc is performed using a single quoting character: the single quote ('). There is no backslash-quoting, and no double quoting. To type a special character, include it in quotes. To type a ' inside quotes, type it twice: ; echo 'How''s it going?' How's it going? Backslash is special only at the end of the line, for traditional backslash-continuation. So, type: ; tex foo \bye not ; tex foo \\bye A quoted string is treated as a single word in rc. Thus: ; a=(this is a list of words) ; echo $#a 6 ; a='this is a list of words??' ; echo $#a 1 Lists (and the variables that represent them) may be concatenated with the '^' operator. Concatenation behaves according to these rules: if two lists have the same number of elements, then they are joined pairwise. Otherwise if one of the lists has one element, the concatenation is distributive. Any other combination is an error: ; echo (one two three)^.c one.c two.c three.c ; echo (one two three)^(.a .b .c) one.a two.b three.c Sometimes the ^ operation can be gotten for free, for example, by juxtaposing two variables together, or by juxtaposing a variable with a word: ; opts=(O g c) ; files=(alloca malloc talloc) ; cc -$opts $files.c results in the execution of the command cc -O -g -c alloca.c malloc.c talloc.c (the full (quirky) semantics of this free concatenation are described in the Unix v10 manuals. Suffice it to say that most of the time it does the job right. This is one of the (few) unclean parts of rc) An uninitialized variable has the value of the null list, (). Note that this is different from the *string* ''. The first is represented internally by a null pointer, the second by a pointer to a null string. So: ; a=() ; echo $#a 0 ; a='' ; echo $#a 1 Variables are exported into the environment by default; in fact, there is no way to create a variable private to rc. This obviates the need for "setenv" and "export" keywords and makes variable handling simpler and cleaner. [The jury is still out on this; this may or may not be a good thing. On a Sun, it seems to be the right thing, given that the environment can be of arbitrary size.] Variables may be subscripted also: $foo(2) refers to the second element of foo. Subscripts are lists of words, so $foo(2 2 2 1 1 1) refers to a six element list made up of three foo(2)'s and three foo(1)'s. Variables can be "defererenced" with more than just one level of indirection: ; a=foo ; b=a ; echo $$b foo My rc also offers an extra feature for manipulating variables: it is an operator ($^) which provides a one-word list comprised of the elements of a variable space-separated. Thus: ; echo 'foo'$^path^'bar' foo. /u/byron/bin/sh /u/byron/bin/sun4 /usr/arch/bin /usr/ucb /binbar this operator was provided since the only way to do this in "classic" rc is to type the non-obvious command: ; ifs=() echo 'foo'^`{echo -n $path}^'bar' The above example raises a point about variables; as in the Bourne shell, a variable may be made local to a command simply by preceding the command with an assignment. This notion has been extended in rc to allow an arbitrary number of local assignments on the same line. Thus: a=foo b=bar c=baz { # these are local ... etc. } FUNCTIONS: rc also offers shell functions, to replace csh aliases and sh shell functions. rc functions take the form: fn foo { definition } where definition is a sequence of rc commands. $* is set to the argument list of the function for the duration of the command. As an example, a function I usually set up is: ; fn l { ls -FC $* } Shell functions are better than aliases in a number of ways: for one, the definition may be an arbitrary rc script; parsing is performed up to the closing brace. The definition need not fit all on one line. The definition of the shell function is also exported into the environment. Thus functions (aliases) can be preserved in subshells without the costly re-interpreting of a .cshrc-like file. Functions may be deleted by supplying no definition: ; fn l deletes the above definition. (A note: commands inside functions need semicolons after them ONLY if they are grouped many-per-line. It's just like interactive rc; this is a difference from sh where the ; are mandatory. Thus: fn foo { echo one echo two echo three } is perfectly valid.) An addition to att's rc is the "return" keyword, which returns from a function with a given status. Thus: fn foo { if (! test -f bar) return 1 ... etc. } CONTROL FLOW: rc's syntax is very clean and simple. Grouping is performed with bracing, and a brace-delemited command will always work where a single command suffices. if (command) command else command while (command) command switch (word) { case pattern command case pattern command ... } for (word [in words]) command ! command command && command command || command Let's start with if: if behaves in the usual way, except when it comes to the treatment of else. The problem is this: rc parses no differently when it's reading a file or when it's reading a terminal. Therefore, there is no way after reading if (grep foo bar) { echo zoo echo goo foo } to tell if there is going to be an "else" keyword after the '}'. The "solution" ATT used is to have a special "if not" command which looks back at the last "if" to see if it succeeded or not. If it did not succeed, then the body of the "if not" command is executed. That solution is clumsy, and has been changed in this incarnation of rc. "if" now has an optional "else" clause, with the proviso that else must immediately follow a close-brace: if (foo) { ... ... } else { ... ... } while works like C's while: while (test) command or while (test) command or while (test) { command ... } and so on. switch behaves as follows: the word in parentheses is looked for in the patterns specified in each of the case statements. The first match that succeeds will cause the body of that case statement to be executed; after that the switch statement terminates. There is no falling-through case statements as in C: switch ($foo) { case goo one case f* g* two case * three } In the above example, if $foo has the value 'goo', 'one' is executed. If $foo begins with an 'f' or a 'g', then 'two' is executed, otherwise 'three' is executed. A note: if $foo has more than one element, then it is a sufficient condition for matching purposes if one of the elements matches the pattern. Thus, 'goo' matches the list '(foo goo zoo)'. for works like sh "for", only the syntax is C-like. So: for (i in one two three) echo $i prints "one\n two\n three\n". If you leave out the "in foo", "in $*" is implied. So to parse the arguments to a shell script: for (i) switch($i) { case -l foo ... } The "break" keyword breaks out of the innermost "for" or "while", similarly to C. Since case statements do *not* fall through, "break" does *not* break out of a switch statement. ! negates the exit status of its argument. Useful in if and while statements. Like the C operator. || and && behave like || and && in sh: foo && bar executes bar if and only if foo is successful. foo || bar executes bar if and only if foo fails. JOB CONTROL: As in sh, job control is primitive. Terminating a command with & starts it asynchronously. The pid of the job is printed the value of this job is assigned to $apid. Some changes over csh: & and ; are terminators in rc syntax, so: ; long command; echo finished & does not have the desired effect; in order to do this, type: ; {long command; echo finished} & in the first case, 'long command' is performed before 'echo finished &' is interpreted (the semicolon terminates the command). Jobs may be placed in subshells using the @ operator: ; @{cd ..; echo this command is being performed in ..; foo} performs the command in a subshell. Note that @ is almost never used in rc, since most of the time a set of braces will implicitly define a subshell, for example, in the command: tar fc - foo | {cd /elsewhere; tar fx -} REDIRECTIONS AND PIPES, ETC.: redirections and pipes are as usual (< and >, << and >>) but some features have been added/changed: foo |[2] bar has the effect of piping file descriptor 2 (stderr) from foo to bar. Similarly: foo > bar >[2] zar has the effect of placing standard out in bar and standard error in zar. File descriptors can be copied also: echo 'this is appearing on standard error' >[1=2] and pipes may pipe arbitrary file descriptors on both sides: foo |[8=12] bar has the obvious effect. To put both stdout and stderr in the same file, use foo >bar >[2=1] *not* foo >[2=1] >bar (this is the same redirection weirdness that sh has) Backquote is a unary operator in rc: echo `date prints the output of date. To execute a complex command (longer than one word), include it in braces: echo `{cat /etc/motd} $ifs is used to split the output of ` into words. $ifs defaults to space-tab-newline. Thus: `{echo one two three} returns a three-element list if $ifs is set to its default value. One advantage of the fact that backquote is unary is that quoting of backquotes inside backquotes is no longer necessary: ls `{foo | grep `{bar zar}} There is also a new kind of redirection in rc: it allows one to specify a command as an argument to another command. Thus: wc <{cat /usr/dict/words} is functionally equivalent to cat /usr/dict/words | wc It works by opening a FIFO in /tmp (for systems that have FIFO's) or opening the appropriate file descriptors in /dev/fd (for systems that have /dev/fd) and passing that fifo or fd as an argument to the command. This form of redirection is perhaps most commonly used with "diff" to compare the output of two programs: diff <{foo} <{bar} Note that diff is reading from pipes, not files, so it must not lseek() on its input; some diffs do this, so the above example will not work. (This is not rc's fault; it is a feature of UNIX pipes) cmp generally does not. My favorite use of this redirection so far has been to find which C files do *not* contain a particular instance of '#include "foo.h"': diff <{ls *.c} <{grep -l '#include "foo.h"' *.c} RC VARIABLES here are the variables rc uses to guide its execution: $prompt holds the prompt to print. $prompt(1) and $prompt(2) are the rc eqivalent of PS1 and PS2 in sh. $home is the home directory of the user. This is aliased to $HOME, so assigning a value to one directly assigns the value to the other. $path is a list of path elements to search. it defaults to (. /bin /usr/bin) or something similar (/usr/ucb is there on berkeley machines). This value is aliased to PATH so typing in path=(. /bin) will have the effect of assigning PATH=.:/bin as well. $cdpath is a list of directories to search in order to change directories. $pid is the pid of the shell. $apid is the pid of the last job started with &. $ifs defines the field splitting characters for backquote substitution. $status is the exit status of the last command executed. This is used to guide the execution of rc in if, while, && and || as well. If the last command ended with a signal, $status is set to the lowercase unix-header-file name for that signal. For example, a segmentation violation will assign "sigsegv" to $status. A core dump will append "+core" to $status. In addition, in a pipeline $status is set to a list of the exit statuses of all the pipeline components: ; ls | wc 50 50 365 ; echo $status 0 0 $history defines a file for rc to append each command before executing it. This is the ATT-sanctioned way of doing history. It has its merits; you can run grep or other unix commands on your history file, for example. Eventually rc will be bundled with some standalone "history" programs to facilitate full history. BUILTINS rc has the following builtins: echo: performs a basic echo; echo -n is supported. cd: changes direcrory to the argument supplied. If the path cd takes is found by searching cdpath, then this path is printed. break: breaks from the innermost for or while loop return: returns from a function with the supplied status, or $status if none is given. limit: similar in spirit to csh limit, but the implementation is a little cruder. limit works as csh limit, but the limit name must be typed in full. To restore a resource to "unlimited", type "limit <resource> unlimited", as in: limit memoryuse unlimited umask: prints (without arguments) or sets (with an octal argument) the file creation mask in octal. exec: replaces rc with the specified command. exit: does the obvious. exit with an argument exits with that exit status. shift: deletes the specified number of words from $*. Defaults to 1. builtin: executes the specified builtin. Useful if a builtin has been redefined as a function. For example: fn cd { builtin cd $* && prompt=`newprompt } wait: waits for the specified pid, otherwise for all child processes belonging to rc. whatis: prints the definition of the supplied arguments. This can be the definition of a variable, a function, or the pathname of an executable. Supercedes 'which' in utility. eval: re-interprets its arguments as a space-separated list of words. For example: a=b eval $a '=' 1 has the effect of assigning '1' to $b. ".": like sh "."; the filename supplied is interpreted in the current shell. "~": "~" is not a builtin, but it is built in to rc; it is a keyword which replaces the some of the functionality of /bin/test. It works as follows: ~ subject pattern [patterns ... ] This sets $status to true if and only if the subject is matched by one of the patterns. For example, an idiom for checking to see if an rc variable is set is: if (! ~ $#foo 0) which may be read 'if the count of $foo's value does not match '0', then..' Note that rc does not have any datatypes other than the list; $#foo supplies the word count of $foo as a single-element list. Note that the above example is equivalent to if (! ~ $foo ()) # the parentheses () denote a null list SIGNALS: rc can be told to catch certain signals and to invoke a shell function on them. This is accomplished by writing a shell function by the name of that signal in lower case. A typical use might be: fn sigint sigterm sigquit { echo 'interrupted; cleaning up and exiting' >[1=2] rm /tmp/foo.$pid.* exit 1 } Setting a handler to the value {} causes that signal to be ignored. Deleting the definition of a signal causes the handler to return to its default value. This scheme has an advantage over sh "trap" in that syntax errors are caught at parse time, not execution time. GLOBBING AND RELATED ISSUES: rc's globbing mechanism is a little different than usual. For example: ; a='*' ; echo $a * For a metacharacter to work (*, ? and [ are the usual shell metacharacters) then it must appear literally and unquoted. This is a consequence of the fact that rc input is scanned only once, at the very beginning. All subsequent interpretation (for example, in backquotes or variable substitutions) comes from the parse tree. In order to get rc to explicitly re-scan its arguments, use eval; that's what it's there for. A note on metacharacters: [a-z] denotes the usual character class matching, but because '^' is the concatenation operator in rc, character classes must be negated with a different character. ~ is used, so echo [~a-z] matches all files of one character not consisting of a lowercase letter. OPTIONS -l source $home/.rcrc before normal execution. Having argv[0][0] set to - also causes .rcrc to be sourced. -e exit on nonzero exit status. -i start an interactive shell; print $prompt(1) before every command is read, and don't exit on interrupts SIGQUIT and SIGINT -v echo input on file descriptor 2 -x echo commands before they are executed -d do not catch SIGQUIT for debugging purposes, so that SIGQUIT will cause a core dump. -c use the following string as the command(s) to execute. Use the remaining arguments to set $*. FURTHER READING I refer the reader to the Unix v10 manuals, available in many bookstores. They are published in two volumes by Sanders College Publishing, and they make for good reading in their own right. In those volumes you can find a man page for rc, and a paper on rc. Any differences between Tom Duff's rc (*as described in those papers*) and mine should be documented in the file FEEPERS. -----------