[comp.arch] Large programs

edw@ius1.cs.cmu.edu (Eddie Wyatt) (09/29/87)

  I was reading "The UNIX Time-Sharing System" by Dennis Ritchie and Ken
Thompson (1978) for a qual, and I came across something I found humorous
and pertinent to the discussion about large programs.

"In the absence of the ability to redirect output and input, a still
clusmsier method would have been to require the 'ls' command to accept user
request to paginate its output, to print in multi-column format, and
to arrange that its output be delivered off-line. Actually it would be
surprising, and in fact unwise for efficiency reasons, to expect
authors of commands such as 'ls' to provide such a wide variety of output
options."

   It seems very funny that they use 'ls' as an example, since that
command is now so burdened with options, much of whose functionality
could be provided by piping its output into other UNIX utilities.
It seems that someone lost sight of the original plan.

-- 

					Eddie Wyatt

e-mail: edw@ius1.cs.cmu.edu

roy@phri.UUCP (09/29/87)

In article <1046@ius1.cs.cmu.edu> edw@ius1.cs.cmu.edu (Eddie Wyatt) writes:
>    It seems very funny that they use 'ls' as an example, since that
> command is now so burdened with options, much of whose functionality
> could be provided by piping its output into other UNIX utilities.
> It seems that someone lost sight of the original plan.

	Once again, it seems that two comp.unix.wizards discussions have
converged on a common point.  In one, we have people arguing about how
much extra baggage ls should carry that could instead be handled by piping
through a formatter; in the other, we have people arguing about RISC vs. CISC
and whether to make integer divide an instruction or a subroutine.

	It's really the same argument.  You start with a simple set of tool
modules which you can plug together in various ways to do whatever you
want.  Then, you watch people for a long time and try to spot patterns in
how they plug the modules together.  If you see that almost every
invocation of "ls" is piped into "pr -4" to get multicolumn output, you
start to think it might be worthwhile to just build it into ls and save a
fork/exec every time.  Same argument for hardware divide instructions.
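
	(For the record, here is roughly what a shell must do behind the
scenes to run "ls | pr -4" -- a sketch of mine, not any real shell's code:
one pipe, two forks, and two execs, where a multi-column ls would cost one
fork and one exec.)

#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	int fd[2];

	if (pipe(fd) < 0)
		return 1;
	if (fork() == 0) {		/* child 1: ls, stdout into the pipe */
		dup2(fd[1], 1);
		close(fd[0]);
		close(fd[1]);
		execlp("ls", "ls", (char *)0);
		_exit(127);
	}
	if (fork() == 0) {		/* child 2: pr -4, stdin from the pipe */
		dup2(fd[0], 0);
		close(fd[0]);
		close(fd[1]);
		execlp("pr", "pr", "-4", (char *)0);
		_exit(127);
	}
	close(fd[0]);			/* the parent uses neither end */
	close(fd[1]);
	while (wait(NULL) > 0)		/* reap both children */
		;
	return 0;
}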

	Of course, what I've just described is creeping featurism, the
philosophy-non-grata of today's RISC-oriented society.  CF hit hardware
design like a ton of bricks with things like the Vax and the 68020, and the
industry (over?) reacted to the plague with Clipper, MIPS, SPARC, etc.  Are
we to see the same reaction in Unix?  Is that what GNU and Mach are all
about?  It is interesting to note that SUN, while going whole-hog on software
complexity (YP, suntools, etc.), has also embraced RISC as a hardware design
paradigm.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

mc68020@gilsys.UUCP (Thomas J Keller) (10/03/87)

In article <1046@ius1.cs.cmu.edu>, edw@ius1.cs.cmu.edu (Eddie Wyatt) writes:
> 
> "In the absence of the ability to redirect output and input, 
>	[ stuff about why ls shouldn't have lots of options ]
> authors of commands such as 'ls' to provide such a wide variety of output
> options."
> 
>    It seems very funny that they use 'ls' as an example, since that
> command is now so burdened with options, much of whose functionality
> could be provided by piping its output into other UNIX utilities.
> It seems that someone lost sight of the original plan.

   Okay, now I am the first to admit that I am a relative neophyte to UNIX and
its philosophy, but it seems to me that a crucial point is being missed here.

   I read quite frequently about how programs should be kept small, simple,
single-purpose, and then tied together with pipes to perform more complex
tasks.  This is all well and good from one perspective.  But it seems to me
that it ignores a perspective which is highly important (not altogether
surprising, as UNIX has a well established tradition of ignoring this aspect
of computing), specifically, the user interface.

   1)  entering a command which uses three to seven different small programs,
all piped together, is a *PAIN* in the arse!  In many cases, a single command
is much more desirable, certainly less prone to errors, and always easier and
faster to use.

   2)  speaking of speed, we all seem to have forgotten that each one of those
lovely small programs in the chain has to be loaded from disk.  Clearly, the
overhead necessary to fork & spawn multiple processes, which in turn load
multiple program texts into memory, is **MUCH** greater than spawning and 
loading a single program!  Waiting time is important too, you know?

   I use the power of I/O redirection in UNIX whenever it makes sense to
do so, and I find it extremely useful.  I would suggest, however, that
monomaniacal adherence to a so-called "UNIX Philosophy" which for the most part
blatantly ignores the needs and convenience of the USERS is an error.  Sure,
it's FUN to be a wizard, and know how to invoke arcane sequences which 
accomplish what are really fairly simple tasks, and to have unsophisticated
users in awe of your prowess.  Fun and very satisfying.  But not very effective,
and for my money, highly counter-productive.  

   There is no reason that UNIX should remain a mysterious and arcane system
which typical users are fearful to approach, yet this is the case.  Continuing
promulgation of the "UNIX Philosophy", as it currently exists, can only ensure
that fewer people will learn and use UNIX.  It is time for us to get our egos
and our heads out of the clouds, and make UNIX a reasonable, effective
environment for everyone, not just the wizards.

   [stepping down off soapbox, donning asbestos suit (don't tell the EPA!)]


-- 
Tom Keller 
VOICE  : + 1 707 575 9493
UUCP   : {ihnp4,ames,sun,amdahl,lll-crg,pyramid}!ptsfa!gilsys!mc68020

guy%gorodish@Sun.COM (Guy Harris) (10/06/87)

>    1)  entering a command which uses three to seven different small programs,
> all piped together, is a *PAIN* in the arse!  In many cases, a single command
> is much more desirable, certainly less prone to errors, and always easier
> and faster to use.

Which means that any such commonly-used sequence should be wrapped up in,
e.g., a shell script or an alias.  Unfortunately, many such commonly-used
sequences aren't so bundled, e.g. the "ls | <multi-column filter>" sequence so
often suggested as preferable to having "ls" do the job.  (I'm curious how
general-purpose such a multi-column filter would be if it were to give you
*all* the capabilities of the current multi-column "ls"; i.e., would something
such as "ls * | <multi_column_filter>" in a directory with multiple
subdirectories be able to give a listing of the form

	directory1:
	file1.1		file3.1
	file2.1		file4.1

	directory2:
	file1.2		file3.2
	file2.2		file4.2

If the filter couldn't do that, I wouldn't find it acceptable.  If it could do
*more* than that, e.g. converting "ls /foo/*.c /bar/*.c | <multi-column
filter>" into

	foo:
	alpha.c		gamma.c
	beta.c

	bar:
	delta.c

I'd find it wonderful.)
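
The stanza-and-header part, at least, is mechanical.  Here is a minimal
sketch of that much -- call it mcol.c, mine and purely for illustration:
fixed column count, and none of the /foo -> "foo:" rewriting wished for
above.

/* mcol.c -- columnate "ls"-style input, one name per line.  Blank lines
 * and lines ending in ':' (the per-directory headers "ls *" prints) are
 * passed through; everything else is printed NCOLS across, column-major
 * the way ls does it (down, then across).
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NCOLS	2
#define MAXNAMES 10000

static char *names[MAXNAMES];
static int nnames;

static void flush(void)
{
	int rows = (nnames + NCOLS - 1) / NCOLS;
	int r, c, i;

	for (r = 0; r < rows; r++) {
		for (c = 0; c < NCOLS; c++) {
			i = c * rows + r;	/* down, then across */
			if (i < nnames)
				printf(c ? "\t%s" : "%s", names[i]);
		}
		putchar('\n');
	}
	for (i = 0; i < nnames; i++)
		free(names[i]);
	nnames = 0;
}

int main(void)
{
	char line[1024];
	size_t len;

	while (fgets(line, sizeof line, stdin) != NULL) {
		len = strlen(line);
		if (len > 0 && line[len - 1] == '\n')
			line[--len] = '\0';
		if (len == 0 || line[len - 1] == ':') {
			flush();		/* end of a stanza */
			printf("%s\n", line);	/* echo blank line or header */
		} else if (nnames < MAXNAMES)
			names[nnames++] = strdup(line);
	}
	flush();
	return 0;
}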
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

crowl@cs.rochester.edu (Lawrence Crowl) (10/06/87)

In article <1130@gilsys.UUCP> mc68020@gilsys.UUCP (Thomas J Keller) writes:
]   I read quite frequently about how programs should be kept small, simple,
]single-purpose, and then tied together with pipes to perform more complex
]tasks.  This is all well and good from one perspective.  But it seems to me
]that it ignores ... the user interface.

I think you should be careful to distinguish between ignoring the user
interface and choosing a user interface you feel is inappropriate.

]   1)  entering a command which uses three to seven different small programs,
]all piped together, is a *PAIN* in the arse!  In many cases, a single command
]is much more desirable, certainly less prone to errors, and always easier and
]faster to use.

This problem is easily solved with a shell script.  This gets you a single
command and the convenience of not having to build all the filters into
the program.

]   2)  speaking of speed, we all seem to have forgotten that each one of those
]lovely small programs in the chain has to be loaded from disk.  Clearly, the
]overhead necessary to fork & spawn multiple processes, which in turn load
]multiple program texts into memory, is **MUCH** greater than spawning and 
]loading a single program!  Waiting time is important too, you know?

You forgot an important speed difference.  In the pipe approach, each program
in the pipe does a lot of file I/O and string-to-data-to-string conversions.
A system which operates on the data values themselves without the intermediate
file representation can be much more efficient.

]   I would suggest, however, that monomaniacal adherence to a so-called
]"UNIX Philosophy" which for the most part blatantly ignores the needs and
]convenience of the USERS is an error.  Sure, it's FUN to be a wizard, and know
]how to invoke arcane sequences which accomplish what are really fairly simple
]tasks, and to have unsophisticated users in awe of your prowess.  Fun and very
]satisfying.  But not very effective, and for my money, highly
]counter-productive.  

But the intended users of Unix are (or were initially) wizards!  They were
assumed to be doing weird things, with a consistent need for rapid "hack"
solutions that a more structured environment might inhibit.

]   There is no reason that UNIX should remain a mysterious and arcane system
]which typical users are fearful to approach, yet this is the case.  Continuing
]promulgation of the "UNIX Philosophy", as it currently exists, can only ensure
]that fewer people will learn and use UNIX.  It is time for us to get our egos
]and our heads out of the clouds, and make UNIX a reasonable, effective
]environment for everyone, not just the wizards.

If you want to change the basic design premise of the system, fine.  But don't
get mad because someone else wants to maintain the original design premise.  I
believe there is a good compromise out there, but it is not obvious.
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

edw@ius1.cs.cmu.edu (Eddie Wyatt) (10/06/87)

>    I read quite frequently about how programs should be kept small, simple,
> single-purpose, and then tied together with pipes to perform more complex
> tasks.  This is all well and good from one perspective.  But it seems to me
> that it ignores a perspective which is highly important (not altogether
> surprising, as UNIX has a well established tradition of ignoring this aspect
> of computing), specifically, the user interface.
> 
>    1)  entering a command which uses three to seven different small programs,
> all piped together, is a *PAIN* in the arse!  In many cases, a single command
> is much more desirable, certainly less prone to errors, and always easier and
> faster to use.

   Is it??  Which options to "ls" sort by time last modified, by time
created, print in single columns, in multiple columns...?

   Having so large an interface to each command makes it hard just to
remember which damn switches to set to get things done.  So in my opinion,
piping output around is no more complex than the "switch" approach.
That alone does not justify the modular approach over the monolithic one,
though.  You gain by using pipes in that

	1) Once you know how to perform some operation on some data
	   (like sorting the output of ls by file size) you can extend it
	   to any command (like sorting the output of df by size).

	2) From the implementation standpoint, modularity can reduce the
	   amount of duplicated effort. -- Does ls bother calling
	   sort to order its output, or did someone implement yet
	   another sort in the ls code??

	3) Uniformity is achieved.  Does the -v switch mean the same
	   thing for ls as it does for cat??  Probably not.  (Though I have
	   to admit some attempt at uniformity in switches is made: -i for
	   cp, rm, and mv does basically the same thing.)


> 
>    2)  speaking of speed, we all seem to have forgotten that each one of those
> lovely small programs in the chain has to be loaded from disk.  Clearly, the
> overhead necessary to fork & spawn multiple processes, which in turn load
> multiple program texts into memory, is **MUCH** greater than spawning and 
> loading a single program!  Waiting time is important too, you know?

   Admittedly, speed of execution is one of the prices you pay for taking
the modular approach, but things aren't all that bad.  Piped processes
get executed concurrently.  If you had a parallel processor, who knows,
maybe each program could be executed on a different processor.  The
pipes could provide a coarse-grain breakdown of the computing needed. 8-}

-- 

					Eddie Wyatt

e-mail: edw@ius1.cs.cmu.edu

howard@cpocd2.UUCP (Howard A. Landman) (10/17/87)

In article <2946@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <1130@gilsys.UUCP> mc68020@gilsys.UUCP (Thomas J Keller) writes:
>]   2)  speaking of speed, we all seem to have forgotten that each one of those
>]lovely small programs in the chain has to be loaded from disk.  Clearly, the
>]overhead necessary to fork & spawn multiple processes, which in turn load
>]multiple program texts into memory, is **MUCH** greater than spawning and 
>]loading a single program!  Waiting time is important too, you know?
>
>You forgot an important speed difference.  In the pipe approach, each program
>in the pipe does a lot of file I/O and string-to-data-to-string conversions.

???  A pipe need not do any file I/O at all!  The data is buffered in memory.
One of the advantages of pipes is that they still work when your file system
is full, whereas writing intermediate files (the normal alternative under
many operating systems) won't.
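
If you doubt it, here is a two-minute sketch of mine: nothing below ever
names, creates, or opens a file, yet the bytes come back out of the
kernel's pipe buffer just fine.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd[2];
	char buf[32];
	int n;

	if (pipe(fd) < 0)
		return 1;
	write(fd[1], "through the kernel", 18);	/* no named file involved */
	n = read(fd[0], buf, sizeof buf);
	printf("got %d bytes: %.*s\n", n, n, buf);
	return 0;
}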

Also, while the pipe transmits a byte-stream, conversions are not necessary.
Most of the existing UNIX utilities operate on text, but it is possible to
pass any datatype through a pipe as long as the receiving program is expecting
it.  Try using fwrite() instead of printf() sometime inside a filter program;
you'll be *amazed* at the performance difference!  The drawback is, this won't
work if the data crosses the boundary between systems with different byte or
halfword ordering conventions, whereas text will work just fine.  It's an
issue of portability versus speed.
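
To make that concrete, here is one filter body written both ways, purely
as a sketch (the names and the "-t" flag are my own invention).  The
fwrite() path ships raw doubles through the pipe; the scanf()/printf()
path ships portable text, at a large cost in conversions.

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	int text = (argc > 1 && strcmp(argv[1], "-t") == 0);
	double x;

	if (text) {			/* portable text stream */
		while (scanf("%lf", &x) == 1)
			printf("%g\n", x * 2.0);
	} else {			/* raw bytes: fast, but machine-dependent */
		while (fread(&x, sizeof x, 1, stdin) == 1) {
			x *= 2.0;
			fwrite(&x, sizeof x, 1, stdout);
		}
	}
	return 0;
}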

>A system which operates on the data values themselves without the intermediate
>file representation can be much more efficient.

There is no "intermediate file representation", unless by "file" you mean
"byte stream".  I don't find it generally useful to confuse these terms.

-- 
	Howard A. Landman
	{oliveb,hplabs}!intelca!mipos3!cpocd2!howard	<- works
	howard%cpocd2%sc.intel.com@RELAY.CS.NET		<- recently flaky
	"Unpick a ninny - recall Mecham"

ron@topaz.rutgers.edu (Ron Natalie) (10/21/87)

Excuse me, but pipes do file I/O on some systems.  Even neglecting
Mini-UNIX (which doesn't have pipes at all and fakes them with real files),
real UNIX pipes use disk I/O.  In non-BSD implementations, an inode is
allocated and disk blocks are allocated.  Hopefully these stay in the
buffer cache rather than needing to be written to disk, but if necessary
they will get written out.  Back in the days before FSCK, it was usually
necessary to clri some of these pipe turds left behind by a crash (they
have neither directory entries nor a link count).

The System V R2 V3 on our 3B20 still does pipes this way.
Berkeley UNIX implements pipes as network sockets.  The data
is stored in mbufs; I suppose that, as virtual memory, these can get
paged out, incurring disk I/O as well.

-Ron