thad@cup.portal.com (Thad P Floryan) (11/27/90)
Yet another chapter in the saga of the ongoing "Don't shoe-shine MY data!"

While investigating why the tape backup operation on the 3B1 is so s-l-o-w, even with double-buffering techniques, I finally pinpointed what appears to be the cause: PIPES.

Pipes are used to transfer data to "tapecpio" in all the supplied shell scripts, and pipes are typically used to pass data from a "find" (e.g. "find * -print | cpio -oc > whatever"). "Piping" was the ONLY thing in common with all my testing, so I decided to instrument some pipe runs and see what gives. Seems the 3B1 pipes leak bits out into the Great Bit Bucket or sumtin'.

This is the first time I've ever had something "bad" to say about the 3B1. And this "problem" affects more than just backups, it affects ANYTHING using pipes, so this should be of interest to you no matter what system you're using.

Specifically: the BEST performance observed is approx. 35 KBytes/Second between two processes which are piped together. Adding more "drains" to the "pipe" worsens performance.

I tested 4 UNIXPC systems, ranging from 4MB RAM/85MB HD to 1MB RAM/10MB HD, and the results are all in the same ballpark: 35-36 KBytes per second.

Perhaps there's something I'm just not seeing, or perhaps some "ktune" params are not obvious.

I'm working on the assumption that "pipes" are a performance bottleneck on the UNIXPC and so I went and grabbed some tape utils from site wsmr-simtel20.army.mil to see if a non-piped tape backup/restore program can improve performance. This will take some time to check out, so in the meantime here are two things I'm asking:

1) Enclosed are my test programs, a Makefile, and a shell-script to exercise the tests. Try them on your system. If the results are substantially different, please post them along with your present "ktune" parameters (you get these by: "su; ktune -d"). By "substantially different" results I mean you're getting 200 KBytes/Sec or something else radically different from my results (below).
2) If you know of ways to improve pipe performance, please post them.

I don't recall any discussions of this "problem" mentioned in this newsgroup before, so maybe I've opened a new "can-of-worms" here; wouldn't be the first time and definitely won't be the last! :-)

Enclosed with this posting is a "shar" of my test suite. You may need to change the "gcc" in the Makefile to be "cc", but I tried both with no change in the observed performance. If nothing else, you may find the timing code in "recv.c" interesting.

To run the tests, do either:

    $ ./test.sh
(OR)
    $ nohup ./test.sh &

That second form places its output in a file named "nohup.out". In all cases, the output will look something like:

    $ ./test.sh

    send <n> | recv

    100000 characters received in 2.783 seconds for 35928 CPS
    200000 characters received in 5.833 seconds for 34285 CPS
    300000 characters received in 8.350 seconds for 35928 CPS
    400000 characters received in 11.933 seconds for 33519 CPS
    500000 characters received in 14.100 seconds for 35460 CPS
    1000000 characters received in 28.200 seconds for 35460 CPS

    send <n> | pass | recv

    100000 characters received in 5.566 seconds for 17964 CPS
    200000 characters received in 10.333 seconds for 19354 CPS
    300000 characters received in 16.200 seconds for 18518 CPS
    400000 characters received in 21.200 seconds for 18867 CPS
    500000 characters received in 26.733 seconds for 18703 CPS
    1000000 characters received in 53.050 seconds for 18850 CPS

If you see any flaws in my testing techniques, I'd appreciate knowing about them, too. But I've checked this out quite thoroughly and I'm convinced that what I'm seeing with the results (above) is the actual piping throughput.
The "ktune" parameters on my systems are (the comments are my annotations):

    # ktune -d
    nbuf    100     #number of system buffers for block devices
    ninode  400     #number of memory-resident inodes at one time
    nfile   300     #number of files open on system at one time
    nproc   100     #number of processes existing at one time
    ntext   75      #number of text structures allocated in kernel
    nclist  150     #number of clist buffers available
    npbuf   16      #number of buffer headers in the raw I/O pool
    ncall   32      #number of callouts allowed in the kernel
    nttyhog 1024    #number of chars in tty buffers before implicit flush

Some other systems I've already tested with the same suite include (with the results for 1,000,000 chars in both tests rounded to nearest 1000):

    HP-9000/840 (Spectrum RISC), HP-UX 3.01, 240000 CPS and 120000 CPS
    HP-9000/350 (Motorola 68030), HP-UX 7.0, 156000 CPS and 85000 CPS

Thad

Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]

---- Cut Here and unpack ----
#!/bin/sh
# This is a shell archive (shar 3.32)
# made 11/27/1990 05:18 UTC by thad@thadlabs
# Source directory /u/thad/Filecabinet/WORK/pipe-test
#
# existing files WILL be overwritten
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#    485 -rw-r--r-- Makefile
#    247 -rw-r--r-- pass.c
#    824 -rw-r--r-- recv.c
#    332 -rw-r--r-- send.c
#    411 -rwxr-xr-x test.sh
#
if touch 2>&1 | fgrep 'amc' > /dev/null
 then TOUCH=touch
 else TOUCH=true
fi
# ============= Makefile ==============
echo "x - extracting Makefile (Text)"
sed 's/^X//' << 'SHAR_EOF' > Makefile &&
X# 3B1 makefile for pipe speed testing
X#
XCC = gcc
XCFLAGS = -O
XLDFLAGS = -s
XLIBS = /lib/crt0s.o /lib/shlib.ifile
XNAME1 = send
XOBJS1 = send.o
XNAME2 = recv
XOBJS2 = recv.o
XNAME3 = pass
XOBJS3 = pass.o
X
Xall : $(NAME1) $(NAME2) $(NAME3)
X
X$(NAME1): $(OBJS1)
X	$(LD) $(LDFLAGS) -o $(NAME1) $(OBJS1) $(LIBS)
X
X$(NAME2): $(OBJS2)
X	$(LD) $(LDFLAGS) -o $(NAME2) $(OBJS2) $(LIBS)
X
X$(NAME3): $(OBJS3)
X	$(LD) $(LDFLAGS) -o $(NAME3) $(OBJS3) $(LIBS)
X
Xclean :
X	rm -f $(OBJS1) $(OBJS2) $(OBJS3) core *~
SHAR_EOF
$TOUCH -am 1126050290 Makefile &&
chmod 0644 Makefile ||
echo "restore of Makefile failed"
set `wc -c Makefile`;Wc_c=$1
if test "$Wc_c" != "485"; then
	echo original size 485, current size $Wc_c
fi
# ============= pass.c ==============
echo "x - extracting pass.c (Text)"
sed 's/^X//' << 'SHAR_EOF' > pass.c &&
X/* pass.c
X *
X * just passes/handoffs chars from stdin to stdout until EOF for testing
X * the speed of pipes on the system.
X *
X * Thad Floryan, 26-Nov-1990
X */
X
X#include <stdio.h>
X
Xmain()
X{
X	int c;
X
X	while ( (c = getchar()) != EOF ) putchar(c);
X
X}
SHAR_EOF
$TOUCH -am 1126045490 pass.c &&
chmod 0644 pass.c ||
echo "restore of pass.c failed"
set `wc -c pass.c`;Wc_c=$1
if test "$Wc_c" != "247"; then
	echo original size 247, current size $Wc_c
fi
# ============= recv.c ==============
echo "x - extracting recv.c (Text)"
sed 's/^X//' << 'SHAR_EOF' > recv.c &&
X/* recv.c
X *
X * just receives chars from stdin until EOF for testing the speed
X * of pipes on the system.
X *
X * Thad Floryan, 26-Nov-1990
X */
X
X#include <stdio.h>
X#include <sys/param.h> /* for def of HZ */
X#include <sys/types.h>
X#include <sys/times.h>
X
Xmain()
X{
X	extern long times();
X
X	long startime, endtime, elapsed;
X	struct tms timebuf;
X	long numchrs = 0;
X
X	startime = times(&timebuf); /* get start time in HZ units */
X
X	while ( getchar() != EOF ) ++numchrs;
X
X	endtime = times(&timebuf); /* get completion time in HZ units */
X
X	if ( (elapsed = endtime - startime) != 0L )
X	{
X		printf("%d characters received in %d.%03d seconds for %d CPS\n",
X			numchrs,
X			elapsed / HZ,
X			((elapsed % HZ) * 1000L) / HZ,
X			((numchrs * HZ) / elapsed ));
X	}
X	else
X	{
X		printf("Insufficient timer resolution for supplied input\n");
X	}
X}
SHAR_EOF
$TOUCH -am 1126045390 recv.c &&
chmod 0644 recv.c ||
echo "restore of recv.c failed"
set `wc -c recv.c`;Wc_c=$1
if test "$Wc_c" != "824"; then
	echo original size 824, current size $Wc_c
fi
# ============= send.c ==============
echo "x - extracting send.c (Text)"
sed 's/^X//' << 'SHAR_EOF' > send.c &&
X/* send.c
X *
X * just sends argv[1] number of characters out for testing the speed
X * of pipes on the system.
X *
X * Thad Floryan, 26-Nov-1990
X */
X
X#include <stdio.h>
X
Xmain(argc, argv)
X	int argc;
X	char *argv[];
X{
X	long numchrs;
X
X	numchrs = atol(argv[1]); /* dismiss error checks for now */
X
X	while ( --numchrs >= 0L ) putchar('X');
X}
SHAR_EOF
$TOUCH -am 1126044190 send.c &&
chmod 0644 send.c ||
echo "restore of send.c failed"
set `wc -c send.c`;Wc_c=$1
if test "$Wc_c" != "332"; then
	echo original size 332, current size $Wc_c
fi
# ============= test.sh ==============
echo "x - extracting test.sh (Text)"
sed 's/^X//' << 'SHAR_EOF' > test.sh &&
Xecho "\nsend <n> | recv\n"
X./send 100000 | ./recv
X./send 200000 | ./recv
X./send 300000 | ./recv
X./send 400000 | ./recv
X./send 500000 | ./recv
X./send 1000000 | ./recv
Xecho "\nsend <n> | pass | recv\n"
X./send 100000 | ./pass | ./recv
X./send 200000 | ./pass | ./recv
X./send 300000 | ./pass | ./recv
X./send 400000 | ./pass | ./recv
X./send 500000 | ./pass | ./recv
X./send 1000000 | ./pass | ./recv
SHAR_EOF
$TOUCH -am 1126175890 test.sh &&
chmod 0755 test.sh ||
echo "restore of test.sh failed"
set `wc -c test.sh`;Wc_c=$1
if test "$Wc_c" != "411"; then
	echo original size 411, current size $Wc_c
fi
exit 0
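The timing arithmetic in recv.c can be checked by hand against the posted figures. The sketch below repeats the three expressions from recv.c's printf() as small functions. The function names are hypothetical, and the tick rate of 60 used in the usage note is an assumption: the post never states the 3B1's HZ, but 60 is consistent with every sample line shown (for instance 2.783 seconds is 167 ticks at 60 per second).

```c
#include <assert.h>
#include <stdio.h>

/* Elapsed ticks -> whole seconds, as in recv.c's "elapsed / HZ". */
long whole_seconds(long ticks, long hz)
{
	return ticks / hz;
}

/* Leftover ticks -> milliseconds, as in "((elapsed % HZ) * 1000L) / HZ". */
long millis(long ticks, long hz)
{
	return ((ticks % hz) * 1000L) / hz;
}

/* Characters per second, as in "(numchrs * HZ) / elapsed".
 * Integer division truncates, which is why the posted CPS values
 * are never rounded up. */
long cps(long numchrs, long ticks, long hz)
{
	assert(ticks > 0L && hz > 0L);	/* recv.c guards the zero-tick case too */
	return (numchrs * hz) / ticks;
}

/* Print one line in the same layout recv.c uses. */
void report(long numchrs, long ticks, long hz)
{
	printf("%ld characters received in %ld.%03ld seconds for %ld CPS\n",
	    numchrs, whole_seconds(ticks, hz), millis(ticks, hz),
	    cps(numchrs, ticks, hz));
}
```

With the assumed rate of 60, report(100000L, 167L, 60L) yields the 2.783 seconds and 35928 CPS of the first posted line, and report(1000000L, 1692L, 60L) yields the 28.200 seconds and 35460 CPS of the last line in the first group. The coarse tick also explains why several runs report identical CPS values.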
thad@cup.portal.com (Thad P Floryan) (11/28/90)
Hmmm, this topic is generating a lot of interest; I have received numerous 'phone calls today about it confirming my suspicions, and a few "Hey, Thad, doncha know about read(0,..) and write(1,..)."

As I stated, I did research this quite carefully. I have 3 versions of EACH of those 3 programs (send, pass, recv) and the ones I posted ARE the fastest. Yep. Look at the macros for getchar() and putchar() in /usr/include/stdio.h on YOUR system ... not bad code at all. The 2nd version of my test suite used { read(0, &buf, 1); and write(1, &buf, 1); }, and the 3rd version of the suite used buffered read(0,,) & write(1,,) along the lines of the getchar() example in K&R. The version I posted produced the best times.

I've also read the relevant chapters in both Bach's "The Design of the UNIX Operating System" and Leffler et al.'s "The Design and Implementation of the 4.3BSD Operating System" with no new insight on the matter.

What's interesting to me is that the 156KBytes/S on the 25MHz 68030 is approx. 5 times the 35Kbytes/S on the 10MHz 68010, leading me to speculate pipe performance is perhaps limited by the CPU (considering the FIFO ring-buffer implementation via its inode by the kernel). Dunno; without source I'm guessing.

The 3B1's 35Kbytes/S pipe throughput and tape-change/-retension time also coincide perfectly with the observed time required to back up the system to tape, leading me to again conclude that pipes (and NOT disk or tape drive speeds) are the bottleneck. And I'm not using the word "pipe" loosely. If you examine the tape scripts, you'll note that filenames are piped to tapecpio which then pipes to dbuf which then "standard outputs" to the actual device.
And, to add more fuel to the fire, I just 45 minutes ago received yet another call from a person who wishes to remain anonymous (to the net) stating that his company's booth at a trade show was visited by a passerby who "played" for awhile with their system in the booth and uttered: "Oh, your system has the AT&T pipe bug, too!"

NOW: AT&T has never (to my knowledge) acknowledged any pipe bug(s), but that person's company DID make some kernel changes after the show which (allegedly) boosted pipe performance 10 times. Further inquiry on my part revealed the SCCS modules have long since been archived (to tape, natch! :-), so finding the specific fix(es) is simply not feasible since that company is presently doing an SVR4 port and has no interest in looking at 5-year-old code for their "ancient" SVR2 and early SVR3 releases (for which the alleged fix was implemented by them).

Sigh; without source, I have no hope of resolving this matter concerning pipes. Everyone else is confirming my posted stats. And various ktune changes aren't changing the stats one way or another. And I'm weary of rebooting to test. And the fact there's no actual "bug" (i.e. the code works, albeit slowly) obviates any hope of a third fix-disk from AT&T.

So, on to checking out tar's performance, and checking out the stuff I snarfed from simtel20. A quick glance indicates Fred Fish's BRU program may be the best hope for speedy (and "maybe" streaming) tape backups.

Any other ideas and suggestions are still welcome.

Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]
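For reference alongside the three variants described in this post (the stdio macros, one-byte read()/write(), and buffered read()/write()), here is a minimal sketch of a block-at-a-time "pass" filter built directly on the system calls. It is not any of Thad's versions, since the source for his 2nd and 3rd suites was not posted. The function name copy_fd and the 4096-byte CHUNK are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096	/* assumed block size; not a figure from the posts */

/* Copy everything from infd to outfd, one block per read() call.
 * Returns the number of bytes copied, or -1 on an I/O error. */
long copy_fd(int infd, int outfd)
{
	static char buf[CHUNK];
	long total = 0;
	ssize_t n;

	assert(infd >= 0 && outfd >= 0);

	while ((n = read(infd, buf, sizeof buf)) != 0) {
		ssize_t off = 0;

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry the read */
			return -1;
		}
		/* write() may accept fewer bytes than requested (a full
		 * pipe, for instance), so loop until the block is out. */
		while (off < n) {
			ssize_t w = write(outfd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
	return total;
}
```

A stand-in for pass.c would simply call copy_fd(0, 1) from main(). Each pass through the outer loop costs one read() and usually one write() per block, rather than one call per character as in the read(0, &buf, 1) variant.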
mark@cbnewsb.cb.att.com (Mark Horton) (11/28/90)
If slow pipes are really the problem, you could always use the same workaround that systems without pipes use: disk files.

    foo | bar

can be emulated with

    foo > /tmp/foo.$$
    bar < /tmp/foo.$$
    rm /tmp/foo.$$

Of course, this assumes adequate disk space for the temp file.

Mark
res@cbnews.att.com (Robert E. Stampfli) (12/01/90)
Here are some hacked up versions of Thad's send/recv pipe tester programs which yield results an order of magnitude faster than his. As Shakespeare would put it, "The fault, Dear Thaddeus, lies not in pipes, but in stdio..."

Rob Stampfli
att!cbnews!res
==========================================================================
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	Makefile
#	myrecv.c
#	mysend.c
#	mytest.sh
#	pass.c
#	recv.c
#	send.c
#	test.sh
# This archive created: Fri Nov 30 12:18:52 1990
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'Makefile'" '(169 characters)'
if test -f 'Makefile'
then
	echo shar: "will not over-write existing file 'Makefile'"
else
sed 's/^ X//' << \SHAR_EOF > 'Makefile'
 X
 X# 3B1 makefile for pipe speed testing
 X#
 XCC = gcc -shlib -s
 XCFLAGS = -O
 X
 Xall : mysend myrecv send recv pass
 X
 Xclean :
 X	rm -f *.o mysend myrecv send recv pass core a.out
SHAR_EOF
if test 169 -ne "`wc -c < 'Makefile'`"
then
	echo shar: "error transmitting 'Makefile'" '(should have been 169 characters)'
fi
fi
echo shar: "extracting 'myrecv.c'" '(1025 characters)'
if test -f 'myrecv.c'
then
	echo shar: "will not over-write existing file 'myrecv.c'"
else
sed 's/^ X//' << \SHAR_EOF > 'myrecv.c'
 X/* recv.c
 X *
 X * just receives chars from stdin until EOF for testing the speed
 X * of pipes on the system.
 X *
 X * Thad Floryan, 26-Nov-1990
 X * rewritten without stdio, Rob Stampfli 30-Nov-1990
 X */
 X
 X#include <stdio.h>
 X#include <sys/param.h> /* for def of HZ */
 X#include <sys/types.h>
 X#include <sys/times.h>
 X
 Xmain(argc)
 X{
 X	extern long times();
 X
 X	static char buffer[BUFSIZ*5];
 X	register int i;
 X	long startime, endtime, elapsed;
 X	struct tms timebuf;
 X	long numchrs = 0;
 X
 X	startime = times(&timebuf); /* get start time in HZ units */
 X
 X	while ( (i = read(0, buffer, BUFSIZ*5)) > 0 ) {
 X		if(argc > 1)
 X			printf("%d ", i);
 X		numchrs += i;
 X	}
 X
 X	if(argc > 1)
 X		printf("\n");
 X
 X	endtime = times(&timebuf); /* get completion time in HZ units */
 X
 X	if ( (elapsed = endtime - startime) != 0L )
 X	{
 X		printf("%d characters received in %d.%03d seconds for %d CPS\n",
 X			numchrs,
 X			elapsed / HZ,
 X			((elapsed % HZ) * 1000L) / HZ,
 X			((numchrs * HZ) / elapsed ));
 X	}
 X	else
 X	{
 X		printf("Insufficient timer resolution for supplied input\n");
 X	}
 X}
SHAR_EOF
if test 1025 -ne "`wc -c < 'myrecv.c'`"
then
	echo shar: "error transmitting 'myrecv.c'" '(should have been 1025 characters)'
fi
fi
echo shar: "extracting 'mysend.c'" '(595 characters)'
if test -f 'mysend.c'
then
	echo shar: "will not over-write existing file 'mysend.c'"
else
sed 's/^ X//' << \SHAR_EOF > 'mysend.c'
 X/* send.c
 X *
 X * just sends argv[1] number of characters out for testing the speed
 X * of pipes on the system.
 X *
 X * Thad Floryan, 26-Nov-1990
 X * rewritten without stdio, Rob Stampfli 30-Nov-1990
 X */
 X
 X#include <stdio.h>
 X
 Xmain(argc, argv)
 X	int argc;
 X	char *argv[];
 X{
 X	static char buffer[BUFSIZ];
 X	register int i;
 X	long numchrs;
 X
 X	numchrs = atol(argv[1]); /* dismiss error checks for now */
 X
 X	/* while ( --numchrs >= 0L ) putchar('X'); */
 X
 X	for(i = 0; i < BUFSIZ; i++)
 X		buffer[i] = 'X';
 X
 X	while ( numchrs > 0L) {
 X		write(1, buffer, (numchrs > BUFSIZ ? BUFSIZ : numchrs));
 X		numchrs -= BUFSIZ;
 X	}
 X}
SHAR_EOF
if test 595 -ne "`wc -c < 'mysend.c'`"
then
	echo shar: "error transmitting 'mysend.c'" '(should have been 595 characters)'
fi
fi
echo shar: "extracting 'mytest.sh'" '(482 characters)'
if test -f 'mytest.sh'
then
	echo shar: "will not over-write existing file 'mytest.sh'"
else
sed 's/^ X//' << \SHAR_EOF > 'mytest.sh'
 Xecho "\nmysend <n> | myrecv\n"
 X./mysend 100000 | ./myrecv
 X./mysend 200000 | ./myrecv
 X./mysend 300000 | ./myrecv
 X./mysend 400000 | ./myrecv
 X./mysend 500000 | ./myrecv
 X./mysend 1000000 | ./myrecv
 Xecho "\nmysend <n> | dd bs=5k | myrecv\n"
 X./mysend 100000 | dd bs=5k | ./myrecv
 X./mysend 200000 | dd bs=5k | ./myrecv
 X./mysend 300000 | dd bs=5k | ./myrecv
 X./mysend 400000 | dd bs=5k | ./myrecv
 X./mysend 500000 | dd bs=5k | ./myrecv
 X./mysend 1000000 | dd bs=5k | ./myrecv
SHAR_EOF
if test 482 -ne "`wc -c < 'mytest.sh'`"
then
	echo shar: "error transmitting 'mytest.sh'" '(should have been 482 characters)'
fi
chmod +x 'mytest.sh'
fi
echo shar: "extracting 'pass.c'" '(247 characters)'
if test -f 'pass.c'
then
	echo shar: "will not over-write existing file 'pass.c'"
else
sed 's/^ X//' << \SHAR_EOF > 'pass.c'
 X/* pass.c
 X *
 X * just passes/handoffs chars from stdin to stdout until EOF for testing
 X * the speed of pipes on the system.
 X *
 X * Thad Floryan, 26-Nov-1990
 X */
 X
 X#include <stdio.h>
 X
 Xmain()
 X{
 X	int c;
 X
 X	while ( (c = getchar()) != EOF ) putchar(c);
 X
 X}
SHAR_EOF
if test 247 -ne "`wc -c < 'pass.c'`"
then
	echo shar: "error transmitting 'pass.c'" '(should have been 247 characters)'
fi
fi
echo shar: "extracting 'recv.c'" '(824 characters)'
if test -f 'recv.c'
then
	echo shar: "will not over-write existing file 'recv.c'"
else
sed 's/^ X//' << \SHAR_EOF > 'recv.c'
 X/* recv.c
 X *
 X * just receives chars from stdin until EOF for testing the speed
 X * of pipes on the system.
 X *
 X * Thad Floryan, 26-Nov-1990
 X */
 X
 X#include <stdio.h>
 X#include <sys/param.h> /* for def of HZ */
 X#include <sys/types.h>
 X#include <sys/times.h>
 X
 Xmain()
 X{
 X	extern long times();
 X
 X	long startime, endtime, elapsed;
 X	struct tms timebuf;
 X	long numchrs = 0;
 X
 X	startime = times(&timebuf); /* get start time in HZ units */
 X
 X	while ( getchar() != EOF ) ++numchrs;
 X
 X	endtime = times(&timebuf); /* get completion time in HZ units */
 X
 X	if ( (elapsed = endtime - startime) != 0L )
 X	{
 X		printf("%d characters received in %d.%03d seconds for %d CPS\n",
 X			numchrs,
 X			elapsed / HZ,
 X			((elapsed % HZ) * 1000L) / HZ,
 X			((numchrs * HZ) / elapsed ));
 X	}
 X	else
 X	{
 X		printf("Insufficient timer resolution for supplied input\n");
 X	}
 X}
SHAR_EOF
if test 824 -ne "`wc -c < 'recv.c'`"
then
	echo shar: "error transmitting 'recv.c'" '(should have been 824 characters)'
fi
fi
echo shar: "extracting 'send.c'" '(332 characters)'
if test -f 'send.c'
then
	echo shar: "will not over-write existing file 'send.c'"
else
sed 's/^ X//' << \SHAR_EOF > 'send.c'
 X/* send.c
 X *
 X * just sends argv[1] number of characters out for testing the speed
 X * of pipes on the system.
 X *
 X * Thad Floryan, 26-Nov-1990
 X */
 X
 X#include <stdio.h>
 X
 Xmain(argc, argv)
 X	int argc;
 X	char *argv[];
 X{
 X	long numchrs;
 X
 X	numchrs = atol(argv[1]); /* dismiss error checks for now */
 X
 X	while ( --numchrs >= 0L ) putchar('X');
 X}
SHAR_EOF
if test 332 -ne "`wc -c < 'send.c'`"
then
	echo shar: "error transmitting 'send.c'" '(should have been 332 characters)'
fi
fi
echo shar: "extracting 'test.sh'" '(411 characters)'
if test -f 'test.sh'
then
	echo shar: "will not over-write existing file 'test.sh'"
else
sed 's/^ X//' << \SHAR_EOF > 'test.sh'
 Xecho "\nsend <n> | recv\n"
 X./send 100000 | ./recv
 X./send 200000 | ./recv
 X./send 300000 | ./recv
 X./send 400000 | ./recv
 X./send 500000 | ./recv
 X./send 1000000 | ./recv
 Xecho "\nsend <n> | pass | recv\n"
 X./send 100000 | ./pass | ./recv
 X./send 200000 | ./pass | ./recv
 X./send 300000 | ./pass | ./recv
 X./send 400000 | ./pass | ./recv
 X./send 500000 | ./pass | ./recv
 X./send 1000000 | ./pass | ./recv
SHAR_EOF
if test 411 -ne "`wc -c < 'test.sh'`"
then
	echo shar: "error transmitting 'test.sh'" '(should have been 411 characters)'
fi
chmod +x 'test.sh'
fi
exit 0
#	End of shell archive
--
Rob Stampfli / att.com!stampfli (uucp@work) / kd8wk@w8cqk (packet radio)
614-864-9377 / osu-cis.cis.ohio-state.edu!kd8wk!res (uucp@home)
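The difference between the two suites comes down to how many bytes move per system call, and that can be explored in a single program. The following is a rough sketch, not code from either archive: pipe_transfer is a hypothetical name, and the use of fork(), malloc() and waitpid() assumes a POSIX-style system rather than the 3B1 specifically. It pushes a given number of bytes through a pipe using write() calls of a chosen size and counts what arrives.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send `total` bytes through a pipe from a child process, using
 * `chunk`-byte write() calls, and drain it with read() in the parent.
 * Returns the number of bytes the parent received, or -1 on error. */
long pipe_transfer(long total, size_t chunk)
{
	int fd[2];
	pid_t pid;
	long got = 0;
	ssize_t n;
	char *buf;

	assert(total >= 0 && chunk > 0);

	if ((buf = malloc(chunk)) == NULL)
		return -1;
	memset(buf, 'X', chunk);	/* same filler character as send.c */

	if (pipe(fd) < 0) {
		free(buf);
		return -1;
	}
	if ((pid = fork()) < 0) {
		close(fd[0]);
		close(fd[1]);
		free(buf);
		return -1;
	}

	if (pid == 0) {			/* child: plays the "send" role */
		long left = total;

		close(fd[0]);
		while (left > 0) {
			size_t want = left < (long)chunk ? (size_t)left : chunk;
			ssize_t w = write(fd[1], buf, want);

			if (w <= 0)
				_exit(1);
			left -= w;	/* count what was actually written */
		}
		_exit(0);
	}

	close(fd[1]);			/* parent: plays the "recv" role */
	while ((n = read(fd[0], buf, chunk)) > 0)
		got += n;
	close(fd[0]);
	free(buf);
	waitpid(pid, NULL, 0);
	return n < 0 ? -1 : got;
}
```

Wrapping a call in a pair of times() readings, as recv.c does, gives a CPS figure for each chunk size. Running it with a chunk of 1 and again with BUFSIZ or BUFSIZ*5 (the sizes mysend.c and myrecv.c use) separates per-call overhead from raw pipe bandwidth.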
rstevens@noao.edu (Rich Stevens) (12/03/90)
Take a look at Figure 17.2 (p. 684) of my book "UNIX Network Programming" (Prentice Hall, 1990) -- I could only get about 200 Kbytes/sec for the 3b1 using pipes with 512 bytes per read/write system call. Rich Stevens