[net.unix] Shells, features and interaction

rob@alice.UucP (Rob Pike) (11/17/85)

The recent shell discussion prompted by Arnold Robbins's announcement has
missed an important point: effective programmability in an interactive
system can obviate many supposed `features'.

Those who claim shell functions are the way to do aliasing are right, but
the only implementation that works properly for this purpose is in the
8th edition shell.  The problem is that unless functions can be exported,
they are unavailable to subshells and shell files.  In the 8th edition shell,
functions are literally exported, and passed in the environment as
shell input text.  This sounds expensive, but on most current systems,
argument and environment passing through exec(2) is so efficient that a few
Kbytes of extra environment are probably cheaper than adding a directory
to PATH.  Shells like csh or ksh that read function (or equivalent) definitions
from a file at start up have peculiar semantics, can't provide an export
facility that works properly, and are probably no more efficient.

Functions can be used for much more than providing aliases.  For the
novitiate:
	f(){
		cmd
	}
is the shell syntax for functions.  'cmd' is any text, however many lines
you'd like, exactly as it would appear in a shell file.  The only difference
is that you 'return' from a function instead of 'exit.'  After f is defined,
it is invoked as any command, and 'cmd' may refer to the function arguments
as $1, $2, $#, etc.  So aliases are obviously trivial.  Now some v8isms:
	builtin cmd ...
runs cmd directly, but looks cmd up in the list of builtin functions first.
This allows functions to be evaluated first and override builtin functions:
	cd(){
		echo my cd
		builtin cd $1
	}
Function names may be anything printable, but may not contain = or ( to
simplify parsing.  Thus:
	ls-l(){
		ls -l $*
	}
and so on.
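
For readers without the v8 shell, here is a minimal sketch of the same override. It assumes a POSIX shell, where `command` bypasses function lookup and so plays the role of the v8 `builtin` keyword. The name `ll` is illustrative only; it replaces `ls-l` because shells that restrict function names to identifiers may reject the hyphen.

```shell
# Wrapper that announces each directory change, then defers to the real cd.
# Assumption: `command` (POSIX) stands in for the v8 `builtin` keyword.
cd(){
	echo "my cd: $1"
	command cd "$1"
}

# Alias-style function; "$@" passes every argument through unchanged.
# The name ll is hypothetical.
ll(){
	ls -l "$@"
}
```

Because the function runs in the current shell, the directory change made by the wrapper persists after it returns.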

Because functions are evaluated locally, you can use them to do things
impossible in shell files.  For example, if the shell writes its input
text to a file with a known name, say $HISTORY, you can invest an arbitrary
amount of effort in the history manipulating program, but have it merely
print the resulting command.  A function can then collect the output
and eval it, so history works for any command even though the code for
supporting it is outside the shell.
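
The arrangement described above might be sketched as follows. This assumes something already appends each command line to the file named by $HISTORY. The names `lastcmd` and `redo` are hypothetical, and `lastcmd` is a function here only to keep the example self-contained; it stands in for an external history program of any complexity.

```shell
# Assumption: each command line typed is appended to the file $HISTORY.
HISTORY=${HISTORY:-$HOME/.history}

# Stand-in for the external history program (hypothetical name):
# print the most recent logged command that begins with the given prefix.
lastcmd(){
	grep -e "^$1" "$HISTORY" | tail -n 1
}

# Collect the program's output and eval it in the current shell, so a
# recalled cd or variable assignment takes effect here.
redo(){
	cmd=`lastcmd "$1"`
	eval "$cmd"
}
```

The history program only prints text. The function is what makes the recalled command run locally.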

People want to build things in to the shell for efficiency,
but every builtin is one less thing you can change (although functions clearly
mitigate this somewhat).  test is often built in to avoid the costly fork and
exec.  But most tests are doing string comparisons, and in fact the shell
semantics for string comparison is rather more powerful but has a peculiar
syntax.  What people say is:
	if test "$1" = "$2"
	then cmd
	fi
which is easy to read, but slow.  So they build in test.  More efficient
is the case statement:
	case "$1" in
	"$2") cmd
	esac
But it's weird.  Functions to the rescue:
	equal(){
		case "$1" in
		"$2") return 0 ;;
		*) return 1
		esac
	}
	if equal "$1" "$2"
	then ....

Obviously what's needed is a function library, as in any programming language.
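
A few entries from such a library might look like the sketch below. Only `equal` comes from the text above; `match` and `empty` are hypothetical additions. `match` uses the more powerful comparison that case provides, by leaving the pattern unquoted.

```shell
# equal a b: exact string comparison, with no fork or exec.
equal(){
	case "$1" in
	"$2") return 0 ;;
	*) return 1 ;;
	esac
}

# match string pattern: $2 is left unquoted, so it acts as a shell pattern.
match(){
	case "$1" in
	$2) return 0 ;;
	*) return 1 ;;
	esac
}

# empty string: true when the argument is the empty string.
empty(){
	equal "$1" ""
}
```

With these definitions, `if match "$file" '*.c'` reads as easily as the test version.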

Similar thought applied to things like expr(1) can show why expr needn't be
built in - when expr is in the inner loop it's usually doing something simple
like incrementing a variable.  Why not provide a looping primitive, and rewrite
the loop as something like:
	for i in `loop 1 100`
	do foo $i
	done
?
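
As a rough sketch, such a primitive can itself be written as a function. This assumes a shell with POSIX $(( )) arithmetic, which the shells under discussion do not have. The argument order (start, end, optional increment) is an assumption.

```shell
# loop start end [incr]: print the integers from start to end, one per line.
# Assumption: POSIX arithmetic expansion is available.
loop(){
	i=$1
	step=${3:-1}
	# A zero step would never terminate, so treat it as 1.
	[ "$step" -eq 0 ] && step=1
	if [ "$step" -gt 0 ]
	then
		while [ "$i" -le "$2" ]
		do
			echo "$i"
			i=$((i + step))
		done
	else
		while [ "$i" -ge "$2" ]
		do
			echo "$i"
			i=$((i + step))
		done
	fi
}
```

Called inside backquotes, as in the for loop above, it runs in a subshell, so its variables i and step do not leak into the caller.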

The next time someone hands you a shell with nice interactive features
or builtin commands, ask whether the implementer understands where the
power of the shell should lie.  The Bourne shell is the favorite for
shell commands, the csh for interaction.  But by addressing interaction
as a programmability problem, the interactive and programming shells
can be the same without sacrificing grace in either environment.

One final note:

text	data	bss	dec	hex
27648	1024	2568	31240	7a08	/bin/sh		8th edition shell
67584	2048	5740	75372	1266c	/bin/csh	C shell
81920	3072	9224	94216	17008	/usr/lbin/ksh	Korn shell


					Rob Pike

levy@ttrdc.UUCP (Daniel R. Levy) (11/19/85)

In article <4575@alice.UUCP>, rob@alice.UucP (Rob Pike) writes:
>The recent shell discussion prompted by Arnold Robbins's announcement has
>missed an important point: effective programmability in an interactive
>system can obviate many supposed `features'.
>
>Those who claim shell functions are the way to do aliasing are right, but
>the only implementation that works properly for this purpose is in the
>8th edition shell.  The problem is that unless functions can be exported,
>they are unavailable to subshells and shell files.  In the 8th edition shell,
>functions are literally exported, and passed in the environment as
>shell input text.  This sounds expensive, but on most current systems,
>argument and environment passing through exec(2) is so efficient that a few
>Kbytes of extra environment are probably cheaper than adding a directory
>to PATH.  Shells like csh or ksh that read function (or equivalent) definitions
>from a file at start up have peculiar semantics, can't provide an export
>facility that works properly, and are probably no more efficient.

I for one ask--why would one WANT to export a function to a shell script?
If the interactive shell from which the script were invoked were something
like csh, how would it even be possible to export the function to the
script?  And mightn't it break a shell script if it used a command which
you had made into a function and exported, but which it was expecting to
be a standard executable?  I perhaps could see the desirability to have
INTERACTIVE subshells (spawned by shell escapes from things like
vnews :-) inherit the functions to avoid having to read a startup file.
But exporting to shell scripts?  Enlighten me.
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!ihnp4!ttrdc!levy

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/19/85)

Very nice.  A refreshing change from the trend to load programs
with every conceivable feature.

I hope to post a public domain implementation of "seq" (which
Rob called "loop" in his example) soon.  Stay tuned.

clyde@ut-ngp.UUCP (Head UNIX Hacquer) (11/19/85)

Gee Rob, the 8th Edition shell sounds GREAT.  How many YEARS do we have to
wait for DeathStar (oops, I mean AT&T) to let us get ahold of it? :-)

-- 
Shouter-To-Dead-Parrots @ Univ. of Texas Computation Center; Austin, Texas  

"All life is a blur of Republicans and meat." -Zippy the Pinhead

	clyde@ngp.UTEXAS.EDU, clyde@sally.UTEXAS.EDU
	...!ihnp4!ut-ngp!clyde, ...!allegra!ut-ngp!clyde

dan@rna.UUCP (Dan Ts'o) (11/20/85)

In article <3387@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>I hope to post a public domain implementation of "seq" (which
>Rob called "loop" in his example) soon.  Stay tuned.

	Isn't "loop" or "seq" the same as GPS's "gas" (generate additive
sequence)? I do

	for i in `gas -n20`
	do
		cmd
	done

	quite often.

GPS: graphics package from System III, V.

					Cheers,
					Dan Ts'o
					Dept. Neurobiology
					Rockefeller Univ.
					1230 York Ave.
					NY, NY 10021
					212-570-7671
					...cmcl2!rna!dan
					rna!dan@cmcl2.arpa

mwm@ucbopal.BERKELEY.EDU (Mike (I'll be mellow when I'm dead) Meyer) (11/21/85)

In article <3387@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>I hope to post a public domain implementation of "seq" (which
>Rob called "loop" in his example) soon.  Stay tuned.

On most 4.2 systems, there's a tool called 'jot' which is what seq/loop is
all about. Source lives in /usr/src/contrib/tools. These were written at the
UCB Comp Center (*not* the people responsible for 4BSD - thinking that just
makes both groups mad) and are considered to be in the public domain.

I've been given permission by the author to give them away, most notably to
the GNU project. If you want a copy, send me mail. If enough people ask,
I'll post them to mod.sources. Probably get around to sending them to the
GNU project at the same time.

	<mike

conor@glacier.ARPA (Conor Rafferty) (11/24/85)

>text	data	bss	dec	hex
>27648	1024	2568	31240	7a08	/bin/sh		8th edition shell
>67584	2048	5740	75372	1266c	/bin/csh	C shell
>81920	3072	9224	94216	17008	/usr/lbin/ksh	Korn shell

Aaah, but how about counting the number of occurrences in /etc/passwd :-)

Admittedly this may not work the way I expect because most people on 
alice can probably do blit-hacking on the command line. But on a dumb
terminal, I'll trade you 60kb for history and command-editing any time.
I'd even go so far as to put line editing in the tty driver, canonical
mode. [There goes my reputation...]

conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa

"My time is more important than your principles!" -- Brian Reid

levy@ttrdc.UUCP (Daniel R. Levy) (11/25/85)

In article <1631@glacier.ARPA>, conor@glacier.ARPA (Conor Rafferty) writes:
>
>Admittedly this may not work the way I expect because most people on
>alice can probably do blit-hacking on the command line. But on a dumb
>terminal, I'll trade you 60kb for history and command-editing any time.
>I'd even go so far as to put line editing in the tty driver, canonical
>mode. [There goes my reputation...]
>
>conor rafferty == decwrl!glacier!conor == conor@su-glacier.arpa
>

But if you do, make sure you don't repeat DEC's mistake with VMS 4.1 (you
can crash the system by typing the wrong [or right, if you are a hacker :-)]
gibberish at it).  Make it BULLETPROOF.
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!ihnp4!ttrdc!levy

peter@graffiti.UUCP (Peter da Silva) (11/27/85)

>text	data	bss	dec	hex
>27648	1024	2568	31240	7a08	/bin/sh		8th edition shell
>67584	2048	5740	75372	1266c	/bin/csh	C shell
>81920	3072	9224	94216	17008	/usr/lbin/ksh	Korn shell

CSH doesn't have to be this big. It can be made to run on a small-number
PDP-11 with only 64K bytes of process address space. Personally I think
these shells could all do with some pruning... but then I'm a confirmed
PDP-11 fan.
-- 
Name: Peter da Silva
Graphic: `-_-'
UUCP: ...!shell!{graffiti,baylor}!peter
IAEF: ...!kitty!baylor!peter

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/28/85)

In article <3387@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>I hope to post a public domain implementation of "seq" (which
>Rob called "loop" in his example) soon.  Stay tuned.

You mean this?

/* #include <local.h> */
#include <stdio.h>	/* printf */
#include <stdlib.h>	/* atoi */

/*********************************************************************\
**
** loop -- do a quick FORTRAN-style (ech) numeric loop
**
** Syntax:
**	loop [ start [ end [ incr ] ] ]
**
** Copyright and Disclaimers:
**	Who cares?  It's past midnight.  Have a ball.
**
** Description:
**	This is a program that really counts.
**
#ifdef	SCCS
** Last modified %G% %U%.  Last retrieved %H% %T%.
**
# else
** $Log:$
#endif	SCCS
** Author:
**	Joseph S. D. Yao
**	Engineering and Information Systems Division
**	Hadron, Inc.
**	9990 Lee Highway
**	Fairfax VA  22030
**	(703) 359-6163
**
** Routines:
**	main(argc, argv, envp)
**
\*********************************************************************/

#ifndef	lint
# ifdef SCCS
  static char SCCS_id[] = "%W%";
#  else
  static char RCS_id[] =
	"@(#)$Header:$";
# endif	/* SCCS */
#endif	/* lint */

int
main(int argc, char **argv, char **envp)
{
	register int var, end, incr;

#ifdef	lint
	argv = envp;
#endif	lint
	var = end = incr = 1;

	/* Arg 1: start value for var. */
	if (--argc > 0)
		var = atoi(*++argv);

	/* Arg 2: end value for var. */
	if (--argc > 0)
		end = atoi(*++argv);

	/* Arg 3: increment for var. */
	if (--argc > 0)
		incr = atoi(*++argv);

	/* A zero increment could take a while. */
	if (incr == 0)
		incr = 1;

	/*
	** Different tests are used if incr < 0 or > 0.
	** This coding is for speed, not program size.
	*/
	if (incr < 0) {
		while (var >= end) {
			printf("%d\n", var);
			var += incr;
		}
	} else {
		while (var <= end) {
			printf("%d\n", var);
			var += incr;
		}
	}

	return(0);
}
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/29/85)

> >I hope to post a public domain implementation of "seq" (which
> >Rob called "loop" in his example) soon.  Stay tuned.
> 
> You mean this?

Nope.

dgk@ulysses.UUCP (David Korn) (12/04/85)

> One final note:
> 
> text	data	bss	dec	hex
> 27648	1024	2568	31240	7a08	/bin/sh		8th edition shell
> 67584	2048	5740	75372	1266c	/bin/csh	C shell
> 81920	3072	9224	94216	17008	/usr/lbin/ksh	Korn shell
> 
> 					Rob Pike

I will not delineate here the reasons that ksh is the most widely used
shell within AT&T even though it is not officially supported.

Choosing a shell based on size of the text is like choosing a terminal
based solely on its weight.  The primary reason that the size of ksh
is large is because of the editing options.  Aliases take only about 1K
of text space in ksh.

Functions are not a complete replacement for aliases any more than
window managers provide a complete replacement for job control.  (I use
job control with my Apollo and 5620 just as often as I did with my
HP-2621.)

Aliases allow you to do things that you can't do with functions.  For
example, using aliases you can write a command which passes its arguments
without file name generation.  I can write a command, remote, such that
	remote x ls -l /bin/*
will expand the file name pattern on the remote machine.  With functions
this is impossible since file name generation occurs before the function
is invoked.  A complete MS-DOS script emulator was written in ksh and
aliases were needed in several places that functions couldn't be used.
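
The ordering problem can be seen in a small sketch. It assumes a POSIX shell; the names `nargs` and `globdemo` are hypothetical. It shows only that the pattern is expanded before the function body runs. It is not the ksh alias mechanism itself.

```shell
# nargs: report how many arguments the function actually received.
nargs(){
	echo $#
}

# globdemo: call nargs three ways in a scratch directory of three files.
globdemo(){
	dir=${TMPDIR:-/tmp}/glob.$$
	mkdir "$dir" || return 1
	(
		cd "$dir" || exit 1
		: > a.c; : > b.c; : > c.c
		nargs *.c		# expanded before the call: prints 3
		nargs '*.c'		# quoted, so the pattern arrives intact: prints 1
		set -f; nargs *.c	# expansion switched off first: prints 1
	)
	rm -r "$dir"
}
```

By the time the function body runs, the pattern is already gone unless the caller quoted it or turned expansion off beforehand.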

The System V implementation of functions has the following deficiencies
most of which exist in the 8th edition Bourne shell:

1.	No local variables
2.	name=value parameter lists remain after function completes.
3.	Cannot trap return from functions for cleanup of resources.
4.	Cannot separately trace function executions.
5.	Function names conflict with shell variable names.
6.	You can't write a function to replace a built in command. (This
	has been fixed in 8th edition)
7.	Function names must be alpha-numeric.

If the collection of functions is large (which is common), exporting by
deparsing functions and passing them through the environment is sure to
be more expensive in any current version of UN*X than reading a startup
file, as in ksh.  Even if the copying is speeded up, as it has been for
8th edition, the environment is generated for each command, and copied
to each created process.  If we assume that about 10% of processes created
by the shell are shells, then the functions are deparsed and copied 10
times for every shell invocation.  Ksh reads a file for startup and uses
block buffering.  I have seen startup files of several thousand bytes so
at a minimum the 5120-byte limit on environment sizes would have to be
increased substantially.

For the sake of simplicity, the Bourne shell is riddled with little things
that confuse the user.  Consider the following list of annoyances of the
'simple' Bourne shell:
1.	Can't kill any processes if you have reached your process limit.
2.	sh script doesn't perform path search for script.
3.	You can not < abc* as short hand for < abcdefghijkl.
4.	You can write ${9} but not ${10}.
5.	x=`\$HOME` causes $HOME to be expanded
6.	Unmatched quotes give EOF unexpected, which is not very helpful.
7.	You can not read a line ending in \ without reading two lines.
8.	PATH=/bin:/usr/bin: will not search current directory last.
9.	Cannot set traps by name.
10.	name=value command has different semantics for built-ins.
11.	No obvious way to find the parent pid of the shell.
12.	Gaping security holes which get worse with exported functions.
13.	Pipelines cannot be timed.
14.	| and ^ are used as synonyms.

One final note: (in grams)

display	mouse	keyboard dec	hex
 4989	0	1134	 6123	17eb	ti745	  Silent 700
16329	0	1588	17917	45fd	hp2621p	  Hewlett Packard
26762	255	2268	29285	7265	dmd5620   Blit

David Korn
ulysses!dgk

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/05/85)

Oh boy!  Shell wars!

> The System V implementation of functions has the following deficiencies
> most of which exist in the 8th edition Bourne shell:

Well, Dave, I always thought you were the one who added this feature
to the System V shell for Release 2.0; guess I was mistaken.  Whoever
did it, it would be interesting to hear why the particular design was
arrived at and whether it will be improved in future releases.

> For the sake of simplicity, the Bourne shell is riddled with little things
> that confuse the user.  Consider the following list of annoyances of the
> 'simple' Bourne shell:

> 2.	sh script doesn't perform path search for script.

Why should it?  I would be very upset if it did, since my cwd is last
in my $PATH for good and sufficient reasons, and I often am testing
new versions of scripts that live in public places.

> 8.	PATH=/bin:/usr/bin: will not search current directory last.

That's simply a bug.  I posted a fix for this some time ago.
(The fixed version is even simpler and smaller.)

> 14.	| and ^ are used as synonyms.

That was for compatibility, to avoid breaking old shell scripts.
I admit that at this point in time it is more just a nuisance.

The SVR2 shell is on the verge of becoming too big for 64Kb address
space machines such as the PDP-11.  Some of us think it would be a
shame for UNIX to be unimplementable on small systems due to the
shell being too big.  Perhaps a subset/superset approach (like sh/ksh)
would be best.

maurice@nmtvax.UUCP (12/10/85)

>>text	data	bss	dec	hex
>>27648	1024	2568	31240	7a08	/bin/sh		8th edition shell
>>67584	2048	5740	75372	1266c	/bin/csh	C shell
>>81920	3072	9224	94216	17008	/usr/lbin/ksh	Korn shell
>
>CSH doesn't have to be this big. It can be made to run on a small-number
>PDP-11 with only 64K bytes of process address space. Personally I think
>these shells could all do with some pruning... but then I'm a confirmed
>PDP-11 fan.

  You must be forgetting about overlays then. As it is right now, csh
*needs* the use of overlays (the 2.9bsd  version) to run. From doing
a check on it, a size yields this:

   text    data   bss    dec     octal
   16128 + 1240 + 1952 = 19320  45570
   56000 total text, overlays: (16384,15936,7552)

Looks like less than 64? You still need to put the stack in there as well,
and that gets the last 8k segment of 64k (depends on the pdp type, this
is from a nonseparate i/d pdp 23+).

It may be big to you (csh) , but I like the functionality it gives over sh.

   Roger M. Levasseur
   New Mexico Tech

davy@pur-ee.UUCP (Dave Curry) (12/12/85)

In article <897@nmtvax.UUCP> maurice@nmtvax.UUCP (Roger M. Levasseur) writes:
>  You must be forgetting about overlays then. As it is right now, csh
>*needs* the use of overlays (the 2.9bsd  version) to run. From doing
>a check on it, a size yields this:
>

This is a lie.  2.9BSD csh may need overlays, but we've got two versions
running on our Version 7 PDP-11's with no overlays.  Observe:

	41024+1104+1498 = 43626b = 0125152b

	49792+7310+3296 = 60398b = 0165756b

The first is a not very complete version, the second contains everything
except job control.  Believe it or not overlays aren't a necessity in
this world (although they do make things convenient, they make them slow
too).

>Looks like less than 64? You still need to put the stack in there as well,
>and that gets the last 8k segment of 64k (depends on the pdp type, this
>is from a nonseparate i/d pdp 23+).
>

Yup, looks like less than 64k to me.  This is a split i/d 11/70.

--Dave Curry