guido@mcvax.UUCP (Guido van Rossum) (02/09/84)
Here are some rules for writing shell scripts so that they are more readable and robust, yet still not too slow.

1. If at all possible, use the Bourne shell (/bin/sh), not the C shell (/bin/csh), even if your login shell is the latter. The Bourne shell has a better way of treating multi-line control structures (if, for, case etc.) and better substitution rules. Bourne shell scripts are also easier to port to other sites than C shell scripts are. (Exception: the C shell is superior if lots of arithmetic must be done.)

2. Use blank lines, comments and indentation like you would in C or Pascal programs.

3. Whenever parameter substitution (e.g., $1) or variable substitution (such as $HOME) is used, decide whether there should be double quotes ("") around it. If the actual value may contain embedded blanks (theoretically this can even occur in filenames) and it is passed as a single argument to another program, quote it! Also note the very useful difference between "$*" and "$@", which both expand to the concatenation of all arguments. When passed to another program, "$*" is always one argument; "$@" is as many arguments as there were originally, if there was at least one. $* (without quotes) omits empty or blank arguments (created by passing, e.g., "" as argument) and splits arguments in two when they contain blanks. (Note that "" passed as a file name means the current directory, which is almost certainly not what was meant!) E.g., to print all arguments, by default the standard input:

        case $# in
        0)      print;;
        *)      print "$@";;
        esac

4. Use "case" for string comparisons rather than "if". That is, to see e.g. whether $1 equals "-a", use:

        case "$1" in
        -a)     <then-part>;;
        *)      <else-part>;;
        esac

rather than

        if [ "$1" = "-a" ]; then
                <then-part>
        else
                <else-part>
        fi

The reason is mainly that "[" "]" executes as a separate process, while the case is executed by the shell. (I know that some shell derivatives avoid the extra process in this case; but the vanilla V7 Bourne shell is the subject of this article.)

5. Avoid the commands "true" and "false". They are implemented through separate processes. The next paragraph shows an alternative for "while true".

6. Be careful to design a parameter convention which mimics that of other well-known Un*x programs; e.g. command [-flag] ... [file] ... . If files can be given as arguments but none are, the script should read its standard input instead. Example of how to program this:

        while :         # ":" is a built-in do-nothing command
        do
                case "$1" in
                -a)     <process-a-flag>; shift;;
                -b)     <process-b-flag>; shift;;
                -*)     echo "Usage: $0 [-a] [-b] [file] ..." 1>&2; exit 1;;
                *)      break;;         # breaks out of loop
                esac
        done

And then proceed with the strategy pointed out in paragraph 3.

I could go on indefinitely with this, but I will try to stop here. Any comments? Contrary visions? Other hints? (Flames?, I would add...)

        Guido van Rossum
        Centrum voor Wiskunde en Informatica, Amsterdam
        ...!{decvax,philabs}!mcvax!guido
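The difference between "$*", "$@" and unquoted $* described in rule 3 can be checked by counting arguments. The following is a minimal sketch, assuming a POSIX shell with functions (the V7 shell discussed in the post has none); count_args is a hypothetical helper, not something from the article.

```shell
#!/bin/sh
# Sketch only: count_args is a made-up helper that reports how many
# arguments it received.
count_args() { echo $#; }

# Three arguments: one with embedded blanks, one empty, one plain.
set -- "a b c" "" d

quoted_star=$(count_args "$*")   # everything joined into one argument
quoted_at=$(count_args "$@")     # the original three arguments, intact
bare_star=$(count_args $*)       # empty one dropped, "a b c" split: a b c d

echo "\"\$*\" -> $quoted_star argument(s)"
echo "\"\$@\" -> $quoted_at argument(s)"
echo "\$*   -> $bare_star argument(s)"
```

With these inputs the three counts come out as 1, 3 and 4, which is why "$@" is the form to use when passing file names on to another program.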
guy@rlgvax.UUCP (Guy Harris) (02/11/84)
A couple of minor points:

1) /bin/[ should be linked to /bin/test (on non-USG systems) in order to make

        if [ "$1" = "foo" ]
        then
                ...
        else
                ...
        fi

work; I have seen systems in which /bin/test (which is documented in the V7 manual) works but /bin/[ (which isn't documented, but works if the link is made) doesn't.

2) The "#" comment convention is only in the 4.xBSD and USG shells; the standard V7 shell only implements ":" comments - NOTE that it's not a real comment, but a command which throws its arguments away and returns an "exit status" of 0 (which is why "while :" works). You can't say things like

        : This isn't valid

because the shell gets upset at the unbalanced single quote.

        Guy Harris
        {seismo,ihnp4,allegra}!rlgvax!guy
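The point that ":" is a command rather than a comment can be seen in a short sketch. It assumes a POSIX shell invoked as sh; the variable names are hypothetical, and the ${name=word} expansion is used only to show that the arguments of ":" are still evaluated.

```shell
#!/bin/sh
# Sketch only: variable names are made up for illustration.

# ":" discards its arguments and succeeds, which is why "while :" loops.
: these words are arguments, not commentary
colon_status=$?

# The arguments are still expanded before being thrown away, so an
# expansion with a side effect takes place.
unset demo_default
: ${demo_default=fallback}

# An unbalanced quote is a syntax error even after ":"; a child shell is
# used so that the error does not abort this script.
sh -c ": This isn't valid" 2>/dev/null
quote_status=$?

echo "exit status of ':' : $colon_status"
echo "demo_default is now : $demo_default"
echo "unbalanced quote gave status : $quote_status"
```

The exact non-zero status reported for the syntax error varies between shells, so only the fact that it is non-zero should be relied on.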
mce@teldata.UUCP (Brian McElhinney) (02/12/84)
*Sigh* I agree that sh is much more portable, but reading sh scripts is painful... "case" and "esac"??? UNIX supports C, a standard UNIX shell should at least resemble C. I never have understood why the Bourne shell looks like Algol. (Not that I think a change is possible, just that this is one more reason UNIX is not easily accepted)
stan@teltone.UUCP () (02/12/84)
> From: mce@teldata.UUCP (Brian McElhinney)
>
> *Sigh* I agree that sh is much more portable, but reading sh scripts
> is painful... "case" and "esac"??? UNIX supports C, a standard UNIX
> shell should at least resemble C. I never have understood why the Bourne
> shell looks like Algol. (Not that I think a change is possible, just that
> this is one more reason UNIX is not easily accepted)

Instead of case ... in ... esac you can do case X { X) . . }
You can also replace the for .. [ in ... ] do ... done
with for .. [ in ... ] { ... }.
This doesn't work for the while loop though (darn!).
This works on 4.1bsd and Venix Bourne shells, but I don't know how
portable it is to other versions.
wolfe@mprvaxa.UUCP (Peter Wolfe) (02/15/84)
I agree with most of your comments except that they are very specific to the Bourne shell. You brush off the C-shell as being very 'unstructured'. I beg to differ: I find that C-shell syntax is more obvious to a C programmer than Bourne shell syntax.

In practice I use the Bourne shell for small scripts that don't do a lot of complicated logic (i.e. to avoid another process) because it is faster than C-shell in execution (yes, even if C-shell has -f in the command line). I have written some C-shell scripts which are 5-7 pages long and found C-shell helpful in doing what I want in terms of filename manipulation, logic expressions etc.

I feel that yet another shell would be appropriate for the UNIX(tm) environment. This shell would allow me to do most of the things I can do in 'C' (e.g. subroutines, local variables, file i/o easily) and also be able to be 'compiled' to execute as fast as possible. It doesn't need (in my opinion at least) all the user interface stuff of the history mechanism and event specification of C-shell. (I guess I am dreaming - but why not)
--
        Peter Wolfe
        Microtel Pacific Research
        ..decvax!microsoft!ubc-vision!mprvaxa!wolfe
guy@rlgvax.UUCP (Guy Harris) (02/17/84)
A reply to several articles:

1) The

        #! /bin/csh -f

construct is not portable, not only because the kernel change to run shell files is a Berkeleyism, but also because the C shell isn't on all UNIX systems. The Bourne shell *is* on all UNIX systems worth talking about in this day and age.

2) The C shell resembles C about as much as the Bourne shell resembles Algol 68, so claims that the C shell is better than the Bourne shell because it looks more like the UNIX implementation language are bogus.

3)
> Instead of case ... in ... esac you can do case X { X) . . }
> You can also replace the for .. [ in ... ] do ... done
> with for .. [ in ... ] { ... }.
> This doesn't work for the while loop though (darn!).
> This works on 4.1bsd and Venix Bourne shells, but I don't know how
> portable it is to other versions.

It works on the System III shell, and probably will work on any V7 or post-V7 Bourne shell.

4)
> The Bourne shell resembles Algol apparently because S. R. Bourne likes it.
> The source code in C is written in the same style, with #define's for
> IF, ELSE, and so on. I find it difficult to read. E.g:

Yes, Bourne likes Algol 68. He wrote a compiler for Algol 68C, which I believe was for the Cambridge University CAP machine. I think he may have written a PDP-11 UNIX Algol 68 compiler, and an associated debugger called - surprise, surprise - Algol DeBugger, or "adb" for short... he definitely wrote "adb", as one can tell by the same heavy use of the "let's make C look like Algol 68" #defines. PDP-11 "adb" does have a "$a" command to print an "Algol 68 stack trace".

By the way, those #defines make it *very* difficult to find unmatched IF...FI pairs and the like, as the error messages are confusing due to weirdities in line numbers and the like. Then again, you have to admire someone who uses the same technique for growing a process's data space as UNIX uses for growing a process's stack space...

        Guy Harris
        {seismo,ihnp4,allegra}!rlgvax!guy
gwyn%brl-vld@sri-unix.UUCP (02/22/84)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld> The Bourne shell is not only much faster than the Cshell, it is more suitable for programming scripts for several reasons, the main one of which is the ability to redirect I/O of control loop commands. There is a form of "subroutine" in the UNIX System V Release 2 Bourne shell. The only thing that the Bourne shell still does not have that the Cshell does that is worth having is a "history" mechanism. Korn put a hack into the shell to support a generalization of the idea of command history editing; he wrote the last several commands to a file and then invoked the editor (your choice via the EDITOR environment variable) to allow you to edit the history before it was re-executed. Some people would include "job control" in the Cshell advantages but this doesn't matter if you have a nice terminal like a Teletype 5620.
gwyn%brl-vld@sri-unix.UUCP (02/26/84)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

Here are two important considerations for people writing Bourne shell scripts that may have to be run by other people who perhaps are on a Berkeley UNIX using the Cshell as a command interpreter:

(1) The first line of every Bourne shell script should be:

        #!/bin/sh

(2) Before invoking ANY system commands, set the expected command search path. This is usually:

        PATH=/bin:/usr/bin

but on BRL UNIXes, UNIX System V compatible shell scripts must use the following since /bin and /usr/bin may have incompatible commands such as "echo" and "pr":

        PATH=/usr/5bin:/bin:/usr/bin
matt%ucla-locus@sri-unix.UUCP (02/26/84)
From: Matthew J. Weinstein <matt@ucla-locus>

Two notes:

I believe that Bourne shell IS the default on a BSD system. The #! may not be recognized on non-Berkeley systems. Better that it be added locally if it's wanted...

If PATH varies among Unices, it might be better to define all of the programs you are going to use at the top of the shell script, as one does in make scripts; e.g.:

        SORT=/bin/sort          set SORT = /bin/sort
        SED=/bin/sed            set SED = /bin/sed

The owner of the target system can easily tell what has to be changed; this also makes the script writer think about it too...

Maybe there should be an ``Elements of Shell Programming Style'' ...

        - Matt
gwyn%brl-vld@sri-unix.UUCP (02/26/84)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

You missed the point.

        #!/bin/sh

will be treated as a comment by the Bourne shell, so its presence cannot possibly hurt. Further, if this is NOT the first line in a Bourne shell script that is invoked by a Cshell user, the Cshell will attempt to interpret the commands, which usually results in a portion of the contents of the script actually being executed before it bombs. By having the funny first line in your Bourne shell scripts, even if a Cshell user runs one of them it will be executed correctly since the Berkeley kernel will exec /bin/sh to handle it.

It is better to set the $PATH than to define explicit path names for standard UNIX system commands. For example, is "sort" in /bin in your system? It's in /usr/bin in others, except I want the one in /usr/5bin which has all the bugs fixed. When the /usr/5bin/sort is moved into /usr/bin I do not want to have to track down all the shell scripts and change "SORT=/usr/5bin/sort" to "SORT=/usr/bin/sort". By setting

        PATH=/usr/5bin:/bin:/usr/bin

I have GUARANTEED getting the standard UNIX System V "sort" command regardless of where it actually lives, which is in different places on our different UNIX systems.

An "Elements of Shell Programming Style" may be a good idea (in fact there are good guides already available in Bourne's and Kernighan & Pike's books), but you're not the one to write it.
matt%ucla-locus@sri-unix.UUCP (02/26/84)
From: Matthew J. Weinstein <matt@ucla-locus>

Firstly:

The Csh manual says that the standard shell will be invoked UNLESS the file begins with a # character (for command files). From the Csh man page on command files:

``... The shell opens this file, and saves its name for possible resubstitution by `$0'. Since many systems use either the standard version 6 or version 7 shells whose shell scripts are not compatible with this shell, the shell will execute such a `standard' shell if the first character of a script is not a `#', i.e. if the script does not start with a comment...''

As for #!, EXEC has been hacked on 4.x to look for #! as a special magic number; I'm not sure that Bell Unix has that (although it may); anyway, the man page for exec says:

``To aid execution of command files of various programs, if the first two characters of the executable file are '#!' then exec attempts to read a pathname from the executable file and use that program as the command files command interpreter. For example, the following command file sequence would be used to begin a csh script:

        #! /bin/csh
        # This shell script computes the checksum on /dev/foobar
        # ...
        ...

The space (or tab) following the '#!' is mandatory, and the pathname must be explicit (no paths are searched)...''

(By the way, you left out the space after #! in your last message, which is mandatory).

Finally, there is no mention of # as a comment character in my sh man page... If it's in yours, it's probably mentioned as a Csh compatibility hack.

Secondly:

The contention on names of programs stems from a difference in outlook on name binding. A few types of name binding are available to the shell programmer:

Static: A qualified pathname (one that contains a slash). This is sort of a ``you said it, you got it'' kind of execution. If that program doesn't work or isn't there, your command fails. There are two flavors of this:

        Absolute: This is a name that begins with a slash, and names a particular object in the file hierarchy. "/bin/sort" is an example of this. Note that a name of this sort is not context-sensitive. Useful if you want to make sure that you get a PARTICULAR executable.

        Relative: A partially qualified pathname. The name is RELATIVE TO your current working directory, and IS context sensitive. "./foo" is an example of one of these names. Useful if the shell script changes working directories, or if you are executing in a controlled environment. Note that the execution path mechanism is not used in this case.

Dynamic: An unqualified pathname. This is the kind of command name most shell files utter. It has no slashes, and the command is found by searching the execution path until an executable of the same name is found.

The problem is, of course, that the particular program found may not be the same kind of program as was originally intended. The user may have his own `bzork' program in his bin directory. When you execute what you think is /bin/bzork, what you really get is the user's program instead.

A possible solution is to alter the ``search path'' to guarantee that the name-program binding is performed using a specific ordered set of domains (directories). However, this may lead to unpleasant side-effects (example: a script which invokes an interactive program. The user forks a shell from that interactive program. He is, however, unable to reference his bin directory in the way he assumed he would be able to...). Clearly, when a shell script may invoke an interactive program, changing the path (or for that matter the working directory) without warning is not a good idea.

In any case, this points out the fact that the semantics of commands may vary in certain circumstances, and that the script writer should consider his choices carefully.
My suggestions about defining names should thus be considered in the light of the functionality required. I think that work on the semantics of command language bindings (in Unix) is overdue. We use a lot of ad-hoc mechanisms, and attempt to make up for this with ``programming style''. I am not sure who is doing compilable shell work, but they must have come across this sort of thing before. Does anyone have any feedback on this? - Matt
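The static and dynamic bindings described in this post can be observed directly. What follows is a minimal sketch, assuming a POSIX shell that provides `command -v` (not the V7 shell); the scratch directory name is hypothetical, and the `bzork` stub is borrowed from the post's own example. It also assumes that no real bzork exists in /bin or /usr/bin.

```shell
#!/bin/sh
# Sketch only: demo_dir and the bzork stub are made up for illustration.
demo_dir=${TMPDIR:-/tmp}/binding-demo.$$
mkdir "$demo_dir" || exit 1

# A stand-in for "the user's own bzork program".
printf '#!/bin/sh\necho "user bzork"\n' > "$demo_dir/bzork"
chmod +x "$demo_dir/bzork"

# Dynamic binding: the unqualified name is resolved through PATH, so the
# result depends on which directories are searched and in what order.
# Each PATH change is confined to its command substitution's subshell.
with_user_dir=$(PATH="$demo_dir:/bin:/usr/bin"; command -v bzork)
system_only=$(PATH=/bin:/usr/bin; command -v bzork || echo "not found")

# Static (absolute) binding: the name contains a slash, so PATH plays no part.
absolute=$("$demo_dir/bzork")

echo "dynamic, user dir first:   $with_user_dir"
echo "dynamic, system dirs only: $system_only"
echo "absolute:                  $absolute"

rm -rf "$demo_dir"
```

The same unqualified name binds to the user's program or to nothing at all depending only on the search path, which is the context sensitivity the post warns about.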
chris@umcp-cs.UUCP (03/02/84)
One more note. Apparently, someone is saying that "gee I want to run the user's ``xyzzy'' program but the /bin ``test'' program", and for that reason can't put

        #! /bin/sh
        PATH=/usr/bin:/bin

at the beginning of ``sh'' scripts. Well don't despair, there is a solution. Try

        #! /bin/sh
        lpath=/usr/bin:/bin     # or whatever your shell script needs
        xyzzy                   # use the user's xyzzy program
        PATH=$lpath if test ... # don't use the user's test program

I can't say if it works on System III or System V, but it works under 4.1. (I just tested it.) Aren't "temporary environment variables" wonderful?
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris.umcp-cs@CSNet-Relay
john@genrad.UUCP (John Nelson) (03/02/84)
> #!/bin/sh
> will be treated as a comment by the Bourne shell, so its presence
> cannot possibly hurt.

Excuse me, but while it may be true that the Bourne shell will treat that line as a comment, any csh running on a system that does not have the "#!" magic number built into the kernel exec will attempt to run that script as a csh script.

Csh first attempts to exec a command. If that fails, it examines the first character of the file. If it is a "#", then it runs it as a csh script; otherwise, it runs it as an sh script.

That line will do precisely the wrong thing on such a system (like my 68000 unisoft system III).
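The decision rule described in this post can be written down as a small model. This is only a sketch of the rule as stated here and in the csh man page excerpt quoted earlier, not csh's actual code; guess_interpreter and its "yes"/"no" second argument (whether the kernel honours "#!") are hypothetical, and it assumes a POSIX shell with functions.

```shell
#!/bin/sh
# Sketch only: guess_interpreter is a made-up helper modelling the rule
# "exec first; if that fails, look at the first character of the file".
#   $1  script file to inspect
#   $2  "yes" if the kernel's exec understands "#!", anything else if not
guess_interpreter() {
    first_line=
    IFS= read -r first_line < "$1"
    case "$first_line" in
    '#!'*)
        # With kernel support the exec succeeds and csh never looks further.
        if [ "$2" = yes ]; then echo kernel; else echo csh; fi ;;
    '#'*)
        echo csh ;;     # starts with a comment: taken for a csh script
    *)
        echo sh ;;      # anything else is handed to the standard shell
    esac
}
```

For a file beginning with "#!/bin/sh", the model answers "kernel" when given "yes" and "csh" when given "no", which is exactly the wrong outcome the post describes for systems without the kernel change.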
gwyn%brl-vld@sri-unix.UUCP (03/03/84)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld> The white space after #! is not mandatory, it is optional. If a shell script is going to run a command such as an interactive subshell where it is important that the user's original PATH be in effect, then the script should run that command with PATH temporarily restored. This is standard shell programming practice. All you info-unix subscribers who are now totally confused, go back to my original posting on the subject and ignore the ensuing debate. The advice I gave is correct and is based on the approach used at BRL to cope with a variety of different UNIX command environments.
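Running one command with the caller's PATH temporarily restored might look like the following. This is a minimal sketch, assuming a POSIX shell; user_path is a hypothetical variable name, and a child shell that prints its PATH stands in for the interactive program.

```shell
#!/bin/sh
# Sketch only: user_path is a made-up name for the saved search path.
user_path=$PATH                     # remember the caller's search path
PATH=/bin:/usr/bin; export PATH     # controlled path for the script's own commands

# Prefixing a single command with an assignment changes PATH for that
# command's environment only; the script's own PATH is left alone.
child_path=$(PATH=$user_path sh -c 'echo "$PATH"')

script_path=$PATH                   # unchanged by the line above

echo "child saw:   $child_path"
echo "script kept: $script_path"
```

The child sees the path the user started with, while every other command in the script continues to be found through the controlled path.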
smk@axiom.UUCP (Steven M. Kramer) (03/03/84)
Matt's idea of putting SORT= ... at the beginning of a shell script is excellent. May I extend that to makefiles? If I want to use a test utility, and that includes ANY program used in the makefile, I have to rewrite the makefile (and it usually takes a couple of tries to undo the quirks). If I have forgotten other types of files related to shell and make files, I mean for these enhancements to go into them also. An idea whose time has come.
--
        --steve kramer
        {allegra,genrad,ihnp4,utzoo,philabs,uw-beaver}!linus!axiom!smk (UUCP)
        linus!axiom!smk@mitre-bedford (MIL)
henry@utzoo.UUCP (Henry Spencer) (03/06/84)
Doug Gwyn observes, in part:

        Before invoking ANY system commands, set the expected command
        search path. This is usually:
                PATH=/bin:/usr/bin

Not quite right. The proper incantation, one which we take some pains to always use hereabouts, is:

        PATH=/bin:/usr/bin ; export PATH

Without that magic "export", the user's original PATH is what gets exported to commands executed from the shell file, which means that it can reappear without warning.
--
        Henry Spencer @ U of Toronto Zoology
        {allegra,ihnp4,linus,decvax}!utzoo!henry
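The effect of "export" can be illustrated with a fresh variable. This is a sketch under stated assumptions: it uses a POSIX shell, in which (as far as I know) a variable inherited from the environment stays marked for export, so the PATH surprise described above would not reproduce there; a new, hypothetical variable DEMO_VAR is used instead to show the same mechanism.

```shell
#!/bin/sh
# Sketch only: DEMO_VAR is a made-up variable name.
unset DEMO_VAR
DEMO_VAR=local-only

# Without export, the assignment is visible only inside this shell.
seen_before=$(sh -c 'echo "${DEMO_VAR-unset}"')

export DEMO_VAR

# After export, child processes receive the value in their environment.
seen_after=$(sh -c 'echo "${DEMO_VAR-unset}"')

echo "child before export: $seen_before"
echo "child after export:  $seen_after"
```

The child reports "unset" before the export and "local-only" after it, which is the gap the "; export PATH" incantation closes on shells that do not re-export an inherited variable when it is reassigned.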
gwyn%brl-vld@sri-unix.UUCP (03/07/84)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld> As I pointed out in a very lengthy summary of how csh executes shell scripts, merely not having a # as the first character of the script is NOT enough to guarantee that it will be run under /bin/sh.
lcc.bob%ucla-locus@sri-unix.UUCP (03/09/84)
From: Bob English <lcc.bob@ucla-locus> I'll probably get yelled at for this, but if csh doesn't recognize #!, then the port from BSD (where it would never see such a thing) to whatever flavor of Unix you run was incomplete. --bob--