edw@ius1.cs.cmu.edu (Eddie Wyatt) (09/29/87)
  I was reading "The UNIX Time-Sharing System" by Dennis Ritchie and Ken
Thompson, 1978, for a qual, and I came across something I found to be
humorous and pertinent to the discussion about large programs:

	"In the absence of the ability to redirect output and input, a
	still clumsier method would have been to require the 'ls' command
	to accept user requests to paginate its output, to print in
	multi-column format, and to arrange that its output be delivered
	off-line.  Actually it would be surprising, and in fact unwise for
	efficiency reasons, to expect authors of commands such as 'ls' to
	provide such a wide variety of output options."

It seems very funny that they used 'ls' as an example, since that command is
now so burdened with options, the functionality of which could be provided
by piping its output into other UNIX utilities.  It seems that someone lost
sight of the original plan.
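For instance, two of the options in question fall straight out of the
standard filters (a sketch -- pr's exact flags vary a bit from system to
system):

	ls | pr -4 -t		# multi-column output, four across
	ls | pr			# paginated output

--
Eddie Wyatt

e-mail: edw@ius1.cs.cmu.edu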
roy@phri.UUCP (09/29/87)
In article <1046@ius1.cs.cmu.edu> edw@ius1.cs.cmu.edu (Eddie Wyatt) writes:
> It seems very funny that they used 'ls' as an example, since that
> command is now so burdened with options, the functionality of which
> could be provided by piping its output into other UNIX utilities.
> It seems that someone lost sight of the original plan.

	Once again, it seems that two comp.unix.wizards discussions have
converged to a common point.  In the one, we have people arguing about how
much extra baggage ls should have which could be done with piping through a
formatter, and on the other hand we have people arguing about RISC vs. CISC
and whether to make integer divide an instruction or a subroutine.

	It's really the same argument.  You start with a simple set of tool
modules which you can plug together in various ways to do whatever you want.
Then you watch people for a long time and try to spot patterns in how they
plug the modules together.  If you see that almost every invocation of "ls"
is piped into "pr -4" to get multi-column output, you start to think it
might be worthwhile to just build that into ls and save a fork/exec every
time.  The same argument holds for hardware divide instructions.

	Of course, what I've just described is creeping featurism, the
philosophy-non-grata of today's RISC-oriented society.  CF hit hardware
design like a ton of bricks with things like the Vax and the 68020, and the
industry (over?)reacted to the plague with Clipper, MIPS, SPARC, etc.  Are
we to see the same reaction in Unix?  Is that what GNU and Mach are all
about?  It is interesting to note that Sun, while going whole-hog on
software complexity (YP, suntools, etc.), has also embraced RISC as a
hardware design paradigm.
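(Not that the pattern is hard to capture at the user level; in csh an
illustrative alias such as

	alias lm 'ls \!* | pr -4 -t'

buys back the typing, though not the fork/execs.)
--
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016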
mc68020@gilsys.UUCP (Thomas J Keller) (10/03/87)
In article <1046@ius1.cs.cmu.edu>, edw@ius1.cs.cmu.edu (Eddie Wyatt) writes:
>	"In the absence of the ability to redirect output and input,
>	[ stuff about why ls shouldn't have lots of options ]
>	authors of commands such as 'ls' to provide such a wide variety
>	of output options."
>
> It seems very funny that they used 'ls' as an example, since that
> command is now so burdened with options, the functionality of which
> could be provided by piping its output into other UNIX utilities.
> It seems that someone lost sight of the original plan.

   Okay, now I am the first to admit that I am a relative neophyte to UNIX
and its philosophy, but it seems to me that a crucial point is being missed
here.  I read quite frequently about how programs should be kept small,
simple, single-purpose, and then tied together with pipes to perform more
complex tasks.  This is all well and good from one perspective.  But it
seems to me that it ignores a perspective which is highly important (not
altogether surprising, as UNIX has a well-established tradition of ignoring
this aspect of computing), specifically, the user interface.

   1) entering a command which uses three to seven different small programs,
all piped together, is a *PAIN* in the arse!  In many cases, a single
command is much more desirable, certainly less prone to errors, and always
easier and faster to use.

   2) speaking of speed, we all seem to have forgotten that each one of
those lovely small programs in the chain has to be loaded from disk.
Clearly, the overhead necessary to fork & spawn multiple processes, which in
turn load multiple program text into memory, is **MUCH** greater than
spawning and loading a single program!  Waiting time is important too, you
know?

   I use the power of I/O re-direction in UNIX whenever it makes sense to do
so, and I find it extremely useful.  I would suggest, however, that
mono-maniacal adherence to a so-called "UNIX Philosophy" which for the most
part blatantly ignores the needs and convenience of the USERS is an error.
Sure, it's FUN to be a wizard, and know how to invoke arcane sequences which
accomplish what are really fairly simple tasks, and to have unsophisticated
users in awe of your prowess.  Fun and very satisfying.  But not very
effective, and for my money, highly counter-productive.

   There is no reason that UNIX should remain a mysterious and arcane system
which typical users are fearful to approach, yet this is the case.
Continuing promulgation of the "UNIX Philosophy", as it currently exists,
can only ensure that fewer people will learn and use UNIX.  It is time for
us to get our egos and our heads out of the clouds, and make UNIX a
reasonable, effective environment for everyone, not just the wizards.

[stepping down off soapbox, donning asbestos suit (don't tell the EPA!)]
--
Tom Keller
VOICE : + 1 707 575 9493
UUCP  : {ihnp4,ames,sun,amdahl,lll-crg,pyramid}!ptsfa!gilsys!mc68020
guy%gorodish@Sun.COM (Guy Harris) (10/06/87)
> 1) entering a command which uses three to seven different small programs,
> all piped together, is a *PAIN* in the arse!  In many cases, a single
> command is much more desirable, certainly less prone to errors, and always
> easier and faster to use.

Which means that any commonly-used such sequence should be wrapped up in,
e.g., a shell script or an alias.  Unfortunately, many such commonly-used
sequences aren't so bundled, e.g. the "ls | <multi-column filter>" sequence
so often suggested as preferable to having "ls" do the job.

(I'm curious how general-purpose such a multi-column filter would be if it
were to give you *all* the capabilities of the current multi-column "ls";
i.e., were something such as "ls * | <multi-column filter>" in a directory
with multiple subdirectories able to give a listing of the form

	directory1:
	file1.1		file3.1
	file2.1		file4.1

	directory2:
	file1.2		file3.2
	file2.2		file4.2

If the filter couldn't do that, I wouldn't find it acceptable.  If it could
do *more* than that, e.g. converting "ls /foo/*.c /bar/*.c | <multi-column
filter>" into

	foo:
	alpha.c		gamma.c
	beta.c

	bar:
	delta.c

I'd find it wonderful.)
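(The basic case, at least, needn't be big.  Here is an untested sketch; it
wants the "new" awk, treats any line ending in ':' as a directory header,
and hard-wires four columns:

	#!/bin/sh
	# mcol -- columnate each group of names in ls-style output,
	# passing "directory:" headers and blank lines through untouched.
	awk '
	/:$/ || /^$/ {			# header or group separator:
		if (n > 0) dump()	# flush any pending names,
		print			# then echo the line itself
		next
	}
	{ name[n++] = $0 }		# ordinary line: stash the file name
	END { if (n > 0) dump() }
	function dump(	rows, i, j, line) {
		rows = int((n + 3) / 4)	# fill 4 columns top to bottom
		for (i = 0; i < rows; i++) {
			line = ""
			for (j = i; j < n; j += rows)
				line = line sprintf("%-18s", name[j])
			print line
		}
		n = 0
	}'

The fancier /foo/*.c form would have to split the pathnames and invent the
headers itself, which is where it stops being small.)

	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com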
crowl@cs.rochester.edu (Lawrence Crowl) (10/06/87)
In article <1130@gilsys.UUCP> mc68020@gilsys.UUCP (Thomas J Keller) writes:
] I read quite frequently about how programs should be kept small, simple,
]single-purpose, and then tied together with pipes to perform more complex
]tasks. This is all well and good from one perspective. But it seems to me
]that it ignores ... the user interface.
I think you should be careful to distinguish between ignoring the user
interface and choosing a user interface you feel is inappropriate.
] 1) entering a command which uses three to seven different small programs,
]all piped together, is a *PAIN* in the arse! In many cases, a single command
]is much more desirable, certainly less prone to errors, and always easier and
]faster to use.
This problem is easily solved with a shell script.  This gets you a single
command without having to build all the filters into one program.
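For example, a minimal wrapper (the name and the pr flags are illustrative,
not standard):

	#!/bin/sh
	# lc -- four-column directory listing via the pr filter
	ls ${1+"$@"} | pr -4 -t

Drop it in your bin and the whole pipeline is again a single command.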
] 2) speaking of speed, we all seem to have forgotten that each one of those
]lovely small programs in the chain has to be loaded from disk. Clearly, the
]overhead necessary to fork & spawn multiple processes, which in turn load
]multiple program text into memory, is **MUCH** greater than spawning and
]loading a single program! Waiting time is important too, you know?
You forgot an important speed difference. In the pipe approach, each program
in the pipe does a lot of file I/O and string-to-data-to-string conversions.
A system which operates on the data values themselves without the intermediate
file representation can be much more efficient.
] I would suggest, however, that mono-maniacal adherence to a so-called
]"UNIX Philosophy" which for the most part blatantly ignores the needs and
]convenience of the USERS is an error. Sure, it's FUN to be a wizard, and know
]how to invoke arcane sequences which accomplish what are really fairly simple
]tasks, and to have unsophisticated users in awe of your prowess. Fun and very
]satisfying. But not very effective, and for my money, highly
]counter-productive.
But the intended users of Unix are (or were initially) wizards! They were
assumed to be doing weird things with a consistent need for rapid "hack"
solutions that a more structured environment might inhibit.
] There is no reason that UNIX should remain a mysterious and arcane system
]which typical users are fearful to approach, yet this is the case. Continuing
]promulgation of the "UNIX Philosophy", as it currently exists, can only ensure
]that fewer people will learn and use UNIX. It is time for us to get our egos
]and our heads out of the clouds, and make UNIX a reasonable, effective
]environment for everyone, not just the wizards.
If you want to change the basic design premise of the system, fine. But don't
get mad because someone else wants to maintain the original design premise. I
believe there is a good compromise out there, but it is not obvious.
--
Lawrence Crowl 716-275-9499 University of Rochester
crowl@cs.rochester.edu Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl Rochester, New York, 14627
edw@ius1.cs.cmu.edu (Eddie Wyatt) (10/06/87)
> I read quite frequently about how programs should be kept small, simple,
> single-purpose, and then tied together with pipes to perform more complex
> tasks.  This is all well and good from one perspective.  But it seems to
> me that it ignores a perspective which is highly important (not
> altogether surprising, as UNIX has a well-established tradition of
> ignoring this aspect of computing), specifically, the user interface.
>
> 1) entering a command which uses three to seven different small programs,
> all piped together, is a *PAIN* in the arse!  In many cases, a single
> command is much more desirable, certainly less prone to errors, and
> always easier and faster to use.

  Is it??  Which options to "ls" sort by time last modified, by time
created, print in single columns, in multiple columns.....?  Having the
interface to each command be so large makes it hard just to remember which
damn switches to set to get things done.  So in my opinion, piping output
around is no more complex than the "switch" approach.

  That fact alone does not justify the modular approach over the monolithic
one, though.  You gain by using pipes in that:

  1) Once you know how to perform some operation on some data (like sorting
the output of ls by file size), you can extend it to any command (like
sorting the output of df by size -- see the example at the end of this
article).

  2) From the implementation standpoint, modularity can reduce the amount
of duplicated effort -- does ls bother calling sort to sort its output, or
did someone implement yet another sort in the ls code??

  3) Uniformity is achieved.  Does the -v switch for ls do the same thing
for cat??  Probably not.  (Though I have to admit some attempt at
uniformity in switches is made: -i for cp, rm and mv does basically the
same thing.)

> 2) speaking of speed, we all seem to have forgotten that each one of
> those lovely small programs in the chain has to be loaded from disk.
> Clearly, the overhead necessary to fork & spawn multiple processes, which
> in turn load multiple program text into memory, is **MUCH** greater than
> spawning and loading a single program!  Waiting time is important too,
> you know?

  Admittedly, speed of execution is one of the prices you pay for taking
the modular approach, but things aren't all that bad.  Piped processes get
executed concurrently.  If you had a parallel processor, who knows, maybe
each program could be executed on a different processor.  The pipes could
provide a coarse-grain breakdown of the computing needed.  8-}
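  To make point 1 concrete, a sketch (the field numbers depend on what your
ls and df actually print, so count columns before trusting it):

	ls -l | sort -n +3	# numeric sort on the size field (4th here;
				# make it +4 if your ls -l shows a group)
	df | sort -n +1		# the same trick sorts file systems by size

--
Eddie Wyatt

e-mail: edw@ius1.cs.cmu.edu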
howard@cpocd2.UUCP (Howard A. Landman) (10/17/87)
In article <2946@sol.ARPA> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <1130@gilsys.UUCP> mc68020@gilsys.UUCP (Thomas J Keller) writes:
>] 2) speaking of speed, we all seem to have forgotten that each one of
>]those lovely small programs in the chain has to be loaded from disk.
>]Clearly, the overhead necessary to fork & spawn multiple processes, which
>]in turn load multiple program text into memory, is **MUCH** greater than
>]spawning and loading a single program!  Waiting time is important too,
>]you know?
>
>You forgot an important speed difference.  In the pipe approach, each
>program in the pipe does a lot of file I/O and string-to-data-to-string
>conversions.

???  A pipe need not do any file I/O at all!  The data is buffered in
memory.  One of the advantages of pipes is that they still work when your
file system is full, whereas writing intermediate files (the normal
alternative under many operating systems) won't.

Also, while the pipe transmits a byte stream, conversions are not
necessary.  Most of the existing UNIX utilities operate on text, but it is
possible to pass any datatype through a pipe as long as the receiving
program is expecting it.  Try using fwrite() instead of printf() sometime
inside a filter program; you'll be *amazed* at the performance difference!
The drawback is that this won't work if the data crosses the boundary
between systems with different byte or halfword ordering conventions,
whereas text will work just fine.  It's an issue of portability versus
speed.

>A system which operates on the data values themselves without the
>intermediate file representation can be much more efficient.

There is no "intermediate file representation", unless by "file" you mean
"byte stream".  I don't find it generally useful to confuse these terms.
--
	Howard A. Landman
	{oliveb,hplabs}!intelca!mipos3!cpocd2!howard	<- works
	howard%cpocd2%sc.intel.com@RELAY.CS.NET		<- recently flaky
	"Unpick a ninny - recall Mecham"
ron@topaz.rutgers.edu (Ron Natalie) (10/21/87)
Excuse me, but pipes *do* do file I/O on some systems.  Neglecting MiniUNIX
(which doesn't have real pipes and implements them with actual files), real
UNIX pipes use disk I/O.  In non-BSD implementations, an inode and disk
blocks are allocated for the pipe.  Hopefully these stay in the buffer
cache rather than needing to be written to disk, but if necessary they will
get written out.  Back in the days before FSCK, it was usually necessary to
clri some of these pipe turds left over after a crash (they have neither
directory entries nor a link count).  System V R2V3 on our 3B20 still does
pipes this way.

Berkeley UNIX implements pipes as network sockets.  The data is stored in
mbufs; I suppose as virtual memory these can get paged out, incurring disk
I/O as well.

-Ron