jfh@rpp386.cactus.org (John F Haugh II) (01/15/91)
In article <1991Jan14.202053.20054@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>Add up the costs of writing the C code, the costs of maintaining it, the
>costs of enhancing it to add the next round of silly "features", and so
>forth.  Now consider how many times per second "who" gets run, and how
>critical its performance is -- i.e., not very many and not very.  The
>optimization is not worth the price.

Studies have shown that the cost of maintaining code is largely independent
of the language and depends primarily on the amount of code.  A three-line
change to a C file should be just as easy to maintain as a three-line shell
script.

As for performance, benchmarks will indicate that fork() is an extremely
expensive system call.  When I was at Pinnacle Systems, one of my jobs was
to perform benchmarks on competitors' equipment.  The average system that
I evaluated fork()'d fewer than 100 times per second, with many well below
30 or 40.  A certain three-letter company's product fork()'d about 20 times
per second.

Script is typescript, started Tue Jan 15 08:28:02 1991
rpp386-> cat fork.c
main ()
{
	int	i;

	for (i = 0;i < 1000;i++)
		if (fork ())
			while (wait (0) != -1)
				;
		else
			exit (0);
}
rpp386-> cc -o fork fork.c -O -s
fork.c
rpp386-> timex ./fork

execution complete, exit code = 1
real        14.53
user         0.83
sys         12.72
rpp386-> calc
1000 / 14.53
	68.82312456985548
rpp386-> exit
Script done Tue Jan 15 08:28:53 1991

>>... fork() and exec() are neither free nor even cheap - shell
>>scripts are just not the right answer.
>
>Odd; we've found that shell scripts are the right answer for an enormous
>range of applications.  "ls" is a shell script on utzoo (it invokes the
>Sun ls with the -1 option).  The question is not whether fork() and exec()
>are free -- obviously not -- or cheap -- thanks to manufacturer stupidity
>and greed, often they aren't -- but whether they are cheap *enough*.  The
>answer is usually "yes".
If you have more than 68 users on your system and it only forks 68 times
per second, you are likely to find that shell scripts run pretty damned
slow.  The command

	who | cut -d' ' -f1 | sort | pr -6 -l1

will stop being fast *enough* the first time some collection of people
start kicking off the shell scripts that have replaced the other
commands on the system.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                    Domain: jfh@rpp386.cactus.org
"While you are here, your wives and girlfriends are dating handsome American
 movie and TV stars.  Stars like Tom Selleck, Bruce Willis, and Bart Simpson."
les@chinet.chi.il.us (Leslie Mikesell) (01/16/91)
In article <18946@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
>If you have more than 68 users on your system and it only forks 68 times
>per second, you are likely to find that shell scripts run pretty damned
>slow.

After identifying the problem, why do you suggest throwing programming
effort at the symptoms rather than the cause?  There's no reason that
fork() needs to be slow, especially when called from a small executable
program.  There's even less reason for fork()/exec() combinations to be
slow, since you know that most of the effort of the fork() is going to be
wasted, but that's another topic...

>who | cut -d' ' -f1 | sort | pr -6 -l1
>will stop being fast *enough* the first time some collection of people
>start kicking off the other shell scripts the have replaced the other
>commands on the system.

Likewise, if you build sorting and output formatting into every
executable, the system will slow down from loading all the extra code.
And with less effort you could have designed a reasonable memory
management scheme instead (or a shell where cut, sort and pr are
built-ins, if you like the kitchen-sink style).

Les Mikesell
  les@chinet.chi.il.us
henry@zoo.toronto.edu (Henry Spencer) (01/17/91)
In article <18946@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
>Studies have shown that the cost of maintaining code is largely independent
>of the language and depends primarily on the amount of code.  A three line
>change to a C file should be just as easy to maintain as a three line shell
>script.

(a) These are not three-line changes.

(b) Three lines of C *in the context of a much larger program* are a great
    deal harder to maintain than a three-line shell script which
    interacts with nothing but itself.

>As for performance, benchmarks will indicate the fork() is an extremely
>expensive system call...

The question is not whether it costs a lot, but whether it is worth the
cost.

>If you have more than 68 users on your system and it only forks 68 times
>per second, you are likely to find that shell scripts run pretty damned
>slow...

Only if your users can type at least one command a second, which is rare.
I have no idea what the fork rate on our system is -- although being a Sun,
it's probably damnably slow -- but it's enough for our not-notably-patient
user community.  A user who's sitting in a text editor or waiting for a
sort to finish doesn't care about the fork rate.  A user who's running
"who" or a variant thereon usually doesn't either.

I don't advocate callous disregard for efficiency -- that way lies GNU
Emacs and other excesses -- but a sense of perspective is needed.  Hacking
C code to avoid writing a one-line shell script is a gross waste of time
and money unless that program is truly critical to system performance.
-- 
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/18/91)
In article <18946@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
> As for performance, benchmarks will indicate the fork() is an extremely
> expensive system call.  When I was a Pinnacle Systems, one of my jobs was
> to perform benchmarks on competitors equipment.  The average system that
> I evaluated fork()'d less than 100 times per second, with many well below
> 30 or 40.  A certain 3 letter company's product fork'd about 20 times
> per second.

Your test is extremely misleading.  On one system I manage, the page size
is 64K, and practically every executable is at least two pages.  Guess
what?  fork() takes a *noticeable* amount of real time, simply because
the machine spends so long loading from disk.  But with several users
running programs from several different disks, everything overlaps
nicely and the total time spent is relatively small.  I suspect the same
is true of the systems you test---every fork() takes 30 or 50 ms, but
many of them can overlap at once.

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/18/91)
In article <1991Jan16.175908.3338@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> In article <18946@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
> >Studies have shown that the cost of maintaining code is largely independent
> >of the language and depends primarily on the amount of code.  A three line
> >change to a C file should be just as easy to maintain as a three line shell
> >script.
> (a) These are not three-line changes.

You're absolutely right: adding -q to my clone of who takes four lines.
One to declare a flag, one to snarf the option, one to test the flag,
and one to process the output properly.  I don't think it's possible to
make the change in only three lines.

(I'm not saying this is the right way to go.  There should be a library
routine for sweeping through utmp, and all these programs should take at
most ten lines of real code.  Then separate programs for who, users,
who am i, etc. all make sense.)

> (b) Three lines of C *in the context of a much larger program* are a great
>     deal harder to maintain than a three-line shell script which
>     interacts with nothing but itself.

We are not talking about a much larger program.

> I don't advocate callous disregard for efficiency -- that way lies GNU
> Emacs and other excesses -- but a sense of perspective is needed.  Hacking
> C code to avoid writing a one-line shell script is a gross waste of time
> and money unless that program is truly critical to system performance.

That depends on your user community.  In general, code that will be
distributed to thousands of sites should be written efficiently.

---Dan
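[The library routine for sweeping through utmp that Dan proposes might be sketched roughly as follows.  The record layout below is a made-up stand-in for struct utmp, whose fields vary from system to system, so this sketch filters an in-memory array rather than calling getutent(); sweep_utmp and fake_utmp are illustrative names, not any real system's API.]

```c
/* Hypothetical record type standing in for a struct utmp entry; the
 * real <utmp.h> layout varies by system.  The sweep applies fn once
 * per logged-in user and returns how many entries matched, so "who",
 * "users", "who -q", etc. reduce to a few lines around one call. */
struct fake_utmp {
	char	ut_name[9];	/* login name, NUL-terminated      */
	int	ut_type;	/* 7 == USER_PROCESS on many SysVs */
};

int sweep_utmp (rec, n, fn)
const struct fake_utmp *rec;
int n;
void (*fn)();
{
	int	i, count = 0;

	for (i = 0; i < n; i++) {
		if (rec[i].ut_type != 7 || rec[i].ut_name[0] == '\0')
			continue;	/* skip non-user slots */
		if (fn != 0)
			(*fn)(rec[i].ut_name);
		count++;
	}
	return count;
}
```

With such a routine, a ten-line "users" or "who -q" only has to supply the per-name callback and print the final count.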
henry@zoo.toronto.edu (Henry Spencer) (01/19/91)
In article <1396:Jan1811:54:2091@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> ... a sense of perspective is needed.  Hacking
>> C code to avoid writing a one-line shell script is a gross waste of time
>> and money unless that program is truly critical to system performance.
>
>That depends on your user community.  In general, code that will be
>distributed to thousands of sites should be written efficiently.

Oh, I quite agree.  Please note that I'm co-author of a major piece of
code -- C News -- that is distributed to, and in use at, thousands of
sites.  It relies quite heavily on shell scripts.  The customers are
generally extremely pleased with its performance.

People who claim that shell scripts can't be efficient don't know what
they're talking about.
-- 
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry
jfh@rpp386.cactus.org (John F Haugh II) (01/19/91)
In article <1249:Jan1811:37:3991@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>Your test is extremely misleading.  On one system I manage, the page size
>is 64K, and practically every executable is at least two pages.  Guess
>what?  fork() takes a *noticeable* amount of real time, simply because
>the machine spends so long loading from disk.  But with several users
>running programs from several different disks, everything overlaps
>nicely and the total time spent is relatively small.  I suspect the same
>is true of the systems you test---every fork() takes 30 or 50 ms, but
>many of them can overlap at once.

Had you looked at the test data you would see that the CPU time is
actually spent in the kernel executing code.  The load during this test
is 100% CPU utilization - there was no time available to overlap CPU
and I/O cycles.

In the case you gave of a large page size architecture, fork() should
still take very little time, since there need not be any physical I/O
unless the available memory is consumed - fork() does not load a new
image from disk, but rather copies the existing user structure, makes a
few modifications and dopes around with page tables or the data pages
themselves.  An exec() test would illustrate any weakness in the exec()
code caused by large page sizes.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                    Domain: jfh@rpp386.cactus.org
"While you are here, your wives and girlfriends are dating handsome American
 movie and TV stars.  Stars like Tom Selleck, Bruce Willis, and Bart Simpson."
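[A minimal sketch of the point at issue: fork() duplicates the running image, and nothing new is loaded from disk unless the child goes on to call exec().  The child below exits with a known status without ever exec()ing; fork_status is an illustrative helper name, not part of any system API.]

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, let the child exit with status 7 without calling exec(),
 * and return that status from the parent.  The child runs purely
 * from the duplicated image - no new program is loaded. */
int fork_status ()
{
	pid_t	pid;
	int	st;

	pid = fork ();
	if (pid == 0)
		_exit (7);	/* child: no exec(), no image load */
	if (pid < 0 || wait (&st) != pid)
		return -1;	/* fork or wait failed */
	return WIFEXITED (st) ? WEXITSTATUS (st) : -1;
}
```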
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/20/91)
In article <1991Jan18.162833.11061@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> In article <1396:Jan1811:54:2091@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >> ... a sense of perspective is needed.  Hacking
> >> C code to avoid writing a one-line shell script is a gross waste of time
> >> and money unless that program is truly critical to system performance.
> >That depends on your user community.  In general, code that will be
> >distributed to thousands of sites should be written efficiently.
> Oh, I quite agree.  Please note that I'm co-author of a major piece of
> code -- C News -- that is distributed to, and in use at, thousands of
> sites.  It relies quite heavily on shell scripts.  The customers are
> generally extremely pleased with its performance.

Well, yes, but that's because you make intelligent decisions about which
code to write in C so that the shell part isn't a bottleneck.  Writing
Berkeley's utmp checkers in sh rather than C is pointless.

> People who claim that shell scripts can't be efficient don't know what
> they're talking about.

(Did I say shell scripts couldn't be efficient?)  Efficiency is
relative.  Now that I have a lot of practice and some good libraries, I
can convert shell code into C code at between 15 and 30 seconds per
line, depending on how sane the shell code is; I often get a speedup of
between 5 and 50 times.  This is well worth it for any program that is
often executed and rarely changed.  ``who'' and ``users'' are examples
of such programs.  I don't get the speedup when the shell script simply
invokes some long-running Cnews programs, but ``who'' is not my idea of
a long-running program.

---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/20/91)
In article <18959@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
> Had you looked at the test data you would see that the CPU time is
> actually spent in the kernel executing code.

Apparently the system is idle in your tests; many if not most schedulers
will add disk wait time to the system time of the current process.

> In the case you gave of a large page size architecture, fork() should
> still take very little time since there need not be any physical I/O
> unless the available memory is consumed

Available (physical) memory is quite often consumed, and the average
executable is swapped out.  At NYU we actually use computers for
computing things, ya know?

---Dan
jfh@rpp386.cactus.org (John F Haugh II) (01/20/91)
In article <1991Jan16.175908.3338@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>Only if your users can type at least one command a second, which is rare.
>I have no idea what the fork rate on our system is -- although being a Sun,
>it's probably damnably slow -- but it's enough for our not-notably-patient
>user community.  A user who's sitting in a text editor or waiting for a
>sort to finish doesn't care about the fork rate.  A user who's running
>"who" or a variant thereon usually doesn't either.

Actually it is quite easy for a user to exceed one command per second.
This is done most easily by writing shell scripts, since each execution
of the shell script will involve many more commands being executed.  The
overhead of fork/exec in those cases is completely wasted.

There are some obvious places where scripts are warranted - such as where
the execution time of the commands being executed dominates the execution
time of the shell.  In the case of short lived commands, such as "who |
wc -l" it may be questionable - the command doesn't run very long in
either case.  A more ridiculous example might be where "who -q" is
implemented somewhat faithfully as a shell script.
--
if [ $# != 0 ]; then
	INPUT=$1
else
	INPUT=/etc/utmp
fi
USERS=`who $INPUT | cut -d' ' -f1`
LINE=0
for USER in $USERS; do
	LENGTH=`expr $USER : '.*'`
	if [ `expr $LENGTH + $LINE + 1` -gt 80 ]; then
		LINE=0
		echo
	fi
	LINE=`expr $LINE + $LENGTH + 1`
	echo $USER '\c'
done
echo
COUNT=`who $INPUT | wc -l | sed -e 's/ *//g'`
echo '# users='$COUNT
--
Yes, it is an absolutely absurd example and is intended only to
illustrate how incredibly SLOW shell scripts can be compared to the
options when implemented in C code.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                    Domain: jfh@rpp386.cactus.org
"While you are here, your wives and girlfriends are dating handsome American
 movie and TV stars.  Stars like Tom Selleck, Bruce Willis, and Bart Simpson."
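[For comparison, the wrapping loop that costs the script above two `expr` processes per user is just a little in-process arithmetic in C.  This sketch shows only the formatting logic; reading the names from utmp is omitted, and print_names is an illustrative helper name, not part of the real who.]

```c
#include <stdio.h>
#include <string.h>

/* Print the names blank-separated, starting a new line before a name
 * would push the width past 80 columns - the same decision the shell
 * script makes with two `expr` invocations per user.  Returns the
 * number of output lines produced. */
int print_names (name, n)
const char **name;
int n;
{
	int	i, len, col = 0, lines = (n > 0);

	for (i = 0; i < n; i++) {
		len = strlen (name[i]) + 1;	/* name plus trailing blank */
		if (col + len > 80 && col > 0) {
			putchar ('\n');		/* wrap before this name */
			col = 0;
			lines++;
		}
		printf ("%s ", name[i]);
		col += len;
	}
	if (n > 0)
		putchar ('\n');
	return lines;
}
```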
jfh@rpp386.cactus.org (John F Haugh II) (01/20/91)
In article <19394:Jan1917:08:2691@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <18959@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
>> Had you looked at the test data you would see that the CPU time is
>> actually spent in the kernel executing code.
>
>Apparently the system is idle in your tests; many if not most schedulers
>will add disk wait time to the system time of the current process.

There was no disk wait time, nor did there need to be.  You are missing
something very crucial here - if the system were idle and I hadn't
consumed all available disk buffers, then there would be no physical I/O
anyway, since the blocks come straight out of the buffer pool.

On the other hand, if the scheduler DID charge for disk-wait time, you
would have the sum of all executing commands' system time higher than
the total wall clock time, and that is just plain wrong.  Normally the
"current process" is the system itself or the "idle" process.  But in
most cases I've seen since v7, there is no accounting for the time - no
process sees the ticks in its accounting structure.  You can verify this
with the "crash" command.

Finally, there was no need for disk I/O at all, since I had not run out
of physical memory and did not need to page out or read from the file
system - I executed no new commands in the original fork/(exit|wait)
scenario.  Besides, I'm sitting right next to the drive, which makes an
incredible racket, so I would have heard it go clickity-click-clack if
there were any I/O.

A counter-proof would be to execute multiple copies of that test at the
same time.  If the total time remained constant for increasing numbers
of simultaneous executions of the test case, you would be proven
correct.  However, I have done exactly this test at past jobs and it has
demonstrated that the fork() call does not overlap time well.  It really
is just a very CPU-intensive call.
People will point out that it is very wasteful, and I agree - I also
agree that fork() should be fixed rather than avoided, but given that
the vendors REFUSE to make fork() faster, I don't see any real option
but to avoid it.
--
Script is typescript, started Sat Jan 19 13:51:45 1991
rpp386-> cat fork.c
main ()
{
	int	i;

	for (i = 0;i < 100;i++)
		if (fork ())
			wait (0);
		else
			_exit (0);
}
rpp386-> cc -o fork fork.c
fork.c
rpp386-> timex ./fork

execution complete, exit code = 1
real         1.31
user         0.00
sys          1.23
rpp386-> cat fork.sh
./fork &
./fork &
./fork &
./fork &
./fork &
wait
rpp386-> timex /bin/sh fork.sh

execution complete
real         6.93
user         0.05
sys          6.36
rpp386->
Script done Sat Jan 19 13:52:24 1991
--
Doing the math,

% calc
6.93 / 5
	1.386
% calc
6.36 / 5
	1.272

Which you can see is pretty damned close to 1.31 and 1.23 in the first
case.
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                    Domain: jfh@rpp386.cactus.org
"While you are here, your wives and girlfriends are dating handsome American
 movie and TV stars.  Stars like Tom Selleck, Bruce Willis, and Bart Simpson."
jmaynard@thesis1.hsch.utexas.edu (Jay Maynard) (01/21/91)
In article <1396:Jan1811:54:2091@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1991Jan16.175908.3338@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>> I don't advocate callous disregard for efficiency -- that way lies GNU
>> Emacs and other excesses -- but a sense of perspective is needed.  Hacking
>> C code to avoid writing a one-line shell script is a gross waste of time
>> and money unless that program is truly critical to system performance.

(Henry, that first sentence would normally be .sigfile material.  Bravo!)

>That depends on your user community.  In general, code that will be
>distributed to thousands of sites should be written efficiently.

For heaven's sake, why?  Like Henry, I feel that efficiency is important
- and C News says all that needs saying about Henry's idea of efficient
(thanks!) - but consider the poor slob who's distributed those thousands
of copies, and then has to maintain it.  Why should he increase his
support burden and maintenance headaches just to save a tenth of a
second on a function that's executed twice a day?  There's a tradeoff
there, and effort towards efficiency is best concentrated on those parts
of the system where it makes a significant difference.
-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jmaynard@thesis1.hsch.utexas.edu  | adequately be explained by stupidity.
"Today is different from yesterday." -- State Department spokesman Margaret
Tutwiler, 17 Jan 91, explaining why they won't negotiate with Saddam Hussein
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (01/21/91)
In article <4581@lib.tmc.edu> jmaynard@thesis1.hsch.utexas.edu (Jay Maynard) writes:
> In article <1396:Jan1811:54:2091@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <1991Jan16.175908.3338@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> >> I don't advocate callous disregard for efficiency -- that way lies GNU
> >> Emacs and other excesses -- but a sense of perspective is needed.  Hacking
> >> C code to avoid writing a one-line shell script is a gross waste of time
> >> and money unless that program is truly critical to system performance.
> (Henry, that first sentence would normally be .sigfile material.  Bravo!)

Hmmm.  Up to the second dash, or the whole sentence?  It'll only fit up
to ``Emacs'' on one line (or two half-lines).

> >That depends on your user community.  In general, code that will be
> >distributed to thousands of sites should be written efficiently.
> For heaven's sake, why?

I just gave several colleagues a good laugh by taking that response out
of context.  ``What is he trying to do, sell code to the government?''

The answer is ``To save as much computer time around the world as
possible.''  If you waste a second on a program run ten times every day
at thousands of sites, you waste thousands of hours of computer time
every year.  Is that really worth the five minutes that it takes to
write ``users'' in C instead of sh?  You're going to waste five minutes
writing the documentation anyway.

> There's a tradeoff there, and the most effort towards
> efficiency is best concentrated on those parts of the system where it makes a
> significant difference.

I agree entirely.  Read what I said again: Code that will be distributed
to thousands of sites should be written efficiently.  If a shell script
is (for all practical purposes) just as efficient as a C program, then
there's no problem.  But utmp checkers are so easy to write in any
language that there's no point using anything but C.

---Dan
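[To make the "easy to write in C" claim concrete: the only interesting logic in a C ``users'' is sorting the login names, which the standard library's qsort() handles.  A minimal sketch, with the utmp read itself omitted; sort_users and by_name are illustrative names.]

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: each element is a char *, so the arguments
 * point at char * slots in the array. */
static int by_name (a, b)
const void *a, *b;
{
	return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Sort the login names in place; ``users'' then just prints them
 * blank-separated.  The names would come from a pass over utmp. */
void sort_users (name, n)
char **name;
int n;
{
	qsort ((void *) name, (size_t) n, sizeof *name, by_name);
}
```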
ian@sibyl.eleceng.ua.OZ (Ian Dall) (01/22/91)
In article <18946@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
>Studies have shown that the cost of maintaining code is largely independent
>of the language and depends primarily on the amount of code.

That confirms my experience.

>A three line
>change to a C file should be just as easy to maintain as a three line shell
>script.

I don't believe that.  The size of the C program you are maintaining is
the whole program, not just the 3 lines you added!  The trouble is, when
making the "three line change" to a big program, you have to comprehend
(almost) the whole program.  It's like the ubiquitous one-line bug fix.
The hard bit isn't typing in the line, it is working out what change to
what line is required.

Secondly, I think it is rare that you could translate a 3-line shell
script to 3 lines of C code.
-- 
Ian Dall     life (n). A sexually transmitted disease which afflicts
             some people more severely than others.
ACSnet: ian@sibyl.eleceng.ua.oz     internet: ian@sibyl.eleceng.ua.oz.au
martin@mwtech.UUCP (Martin Weitzel) (02/03/91)
In article <18964@rpp386.cactus.org> jfh@rpp386.cactus.org (John F Haugh II) writes:
[...]
>There are some obvious places where scripts are warranted - such as where
>the execution time of the commands being executed dominates the execution
>time of the shell.  In the case of short lived commands, such as "who |
>wc -l" it may be questionable - the command doesn't run very long in
>either case.  A more ridiculous example might be where "who -q" is
>implemented somewhat faithfully as a shell script.
 ^^^^^^^^^^^^^^^^^^^

(Hmm, as English is not my native language I'm not quite sure, but I
suppose this means the following example was deliberately chosen not to
show good performance - i.e. the poster is aware of the complicated way
the script does its work.)

>if [ $# != 0 ]; then
>	INPUT=$1
>else
>	INPUT=/etc/utmp
>fi
>USERS=`who $INPUT | cut -d' ' -f1`
>LINE=0
>for USER in $USERS; do
>	LENGTH=`expr $USER : '.*'`
>	if [ `expr $LENGTH + $LINE + 1` -gt 80 ]; then
>		LINE=0
>		echo
>	fi
>	LINE=`expr $LINE + $LENGTH + 1`
>	echo $USER '\c'
>done
>echo
>COUNT=`who $INPUT | wc -l | sed -e 's/ *//g'`
>echo '# users='$COUNT

IMHO one problem is - as shown here - that often programming techniques
useful in other languages are simply transliterated into the shell.  The
experienced shell programmer will generally try to avoid `expr ...` (and
if he or she already programmed under V7, also `test ...`), especially
in the body of some often-executed loop.

An improvement to the above script avoids two `expr ...` within the body
of the loop and reduces usr-time to 40% and sys-time to 30% of the
original example.  (Timings are taken on my 386-box, and I'm aware of
the difficulties of timing a shell script containing pipes; as the most
complex part is at the end of the pipe, this shouldn't matter too much.)
case $# in
0)	INPUT=/etc/utmp;;
1)	INPUT=$1;;
esac
NL=''
who $INPUT | pr -tn |
( while read num usr rest
  do
	if [ `expr "$OUTLINE$usr " : '.*'` -gt 80 ]
	then
		echo "$NL$OUTLINE"'\c'
		NL='\n'
		OUTLINE="$usr "
	else
		OUTLINE="$OUTLINE$usr "
	fi
	count=$num
  done
  echo "$NL$OUTLINE"
  echo "# users=$count"
)

I think the above algorithm is not less obvious than the original
example.  The remaining `expr ...` in the body of the loop could also be
changed into

	case "$OUTLINE$usr" in
	????????<80 question marks total>?????????*)
		echo "$NL$OUTLINE"'\c'
		NL='\n'
		OUTLINE="$usr "
		;;
	*)	OUTLINE="$OUTLINE$usr "
		;;
	esac

but I'd only choose that if I were hunting for performance, because it
looks a bit obscure.  (BTW: This last optimization reduces usr-time to
25% and sys-time to 5% of the original example.)

Conclusion: Most languages have more and less efficient ways to do one
and the same thing.  Doing it the way you achieve good performance in
one language may not fit well to some other language, and vice versa.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83