stubbs@ncr-sd.UUCP (Jan Stubbs) (12/12/85)
	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK

The results so far are below.  Thanks, everybody.  Send your results to
me directly.  The benchmark is a "C" program which measures Unix kernel
performance.  Run it with:

	time iocall

Send all 3 times (user, system, real); I am reporting the system time
only.  "The opinions expressed herein are those of the author.  Your
mileage may vary."

Problems:

1) As Jeff Makey kindly pointed out, IOCALL unfortunately does cross a
buffer boundary if your buffer size is 512.  Older versions of Unix
(Version 7, System III) and their progeny used 512-byte buffers.
Berkeley 4.2, 4.3, System V and their progeny use 1024 bytes or bigger,
so there is no problem with those numbers.  But all the numbers sent to
me for the 512-byte-buffer Unixes are slower than they should be,
because they did over 1000 disk writes, which uses lots of cpu cycles
in the drivers.  I don't know about Version 8 or 2.9 BSD; can anyone
help?  Jeff offered a solution, which adds a seek to keep everything in
the first 512 bytes.  This makes the kernel do a little extra work, but
it did not change the timing on our Pyramid.  The new source is below;
if you have a 512-byte-buffer version of Unix, please rerun with this
one.

2) Jeff and others also pointed out that the 2nd argument to lseek
should be a long, not an int.  Shame on me!  See what happens when you
don't lint your programs?  The source below also fixes this.  Reruns
may be required to get correct results on machines where longs aren't
the same size as ints (PDPs...).

3) I failed to mention that these timings should be run on an otherwise
idle machine.  If you can, please run them so; it does improve the
timings.

4) Since not everyone is a good sport about benchmarks, and since I
might be a biased source, and since I don't have access to the latest
NCR Unix stuff anyhow (the M68020-based Tower/32), I won't publish any
NCR numbers unless they are offered to me by NCR E&M Columbia, which is
where the Tower line comes from.  I encourage someone else to do so,
however.
Jan Stubbs    ..sdcsvax!ncr-sd!stubbs

IOCALL RESULTS:

SYSTEM                                UNIX VERSION          SYSTEM TIME SECONDS
-----------                           ----------------      -------------------
DEC Rainbow100 w/NECV20               Venix                 18.4 *a
DEC Pro-300                           Venix 1.1             18.1 *a
MicroVax I                            Ultrix V1.1           18.0
Onyx C8002s Z8000                     SIII                  13.7 *a
Onyx C8002 Z8000                      V7                    13.3 *a
TIL NS32016 9MHz no wait states       local port            12.2
ATT 3b2/300                           SV                    10.3
VAX 11/750                            4.2 BSD               10.0
PDP 11/44 ISR                         2.9 BSD                9.5
VAX 11/750                            SV.2                   9.4
VAX 11/750                            4.3 BSD                9.0
Sun-2 10MHz 68010                     4.2 BSD Rel 2.0        9.0
Sun-2 10MHz 68010                     4.2 BSD Rel 3.0        8.7
PE 3220                               V7 Workbench           8.5 *a
VAX 11/750                            research version 8     8.1
VAX 11/750                            4.1 BSD                7.2
Radio Shack 16A                       Xenix (V7)             7.2 *a
PC/AT                                 Venix 5.2              6.8
ATT7300 Unix PC 10MHz 68010           SV.2                   6.4
Bullet286(PC/XT)                      Venix 2.0              6.0 *a
Pyramid 90x w/cache                   OSx2.3                 5.8
VAX 11/780                            4.2 BSD                5.7
Plessey Mantra 12.5MHz 68000          Uniplus SV Rel 0       5.5
MicroVax II                           Ultrix 1.1             5.2
HP9000-550 3 cpus                     HP-UX 5.01             5.1 *c
PC/AT 7.5MHz                          Venix286 SV.2          5.1
Convex C-1                            4.2 BSD                4.6
VAX 11/785                            SV.2                   4.4
VAX 11/785                            4.3 BSD                3.6
Sun-3/75 16.67MHz 68020               4.2 BSD                3.6
Sun-3/160M-4 16.67MHz 68020           4.2 BSD Rel 3.0 Alpha  3.6
GEC 63/40                             S 5.1                  2.7
Gould PN9080                          UTX 1.2                2.5
Sperry 7000/40 (aka CCI 6/32)         4.2 BSD                1.9 *b
VAX 8600                              4.3 BSD                1.3
VAX 8600                              Ultrix 1.2-1           1.1
IBM 3083                              UTS SV                 1.0 *b
Amdahl 470/V8                         UTS/V (SV Rel 2,3) V1.1+  .98 *b

Notes:

*a This result was obtained with the original version of IOCALL, which
crosses the 512-byte buffer boundary, and this version of Unix has
buffers of 512 bytes.  This is believed to be the case with all Version
7 and SIII derived OS's.  It results in 1001 writes being done, which
uses significantly more cpu time and makes these results comparable
only to others with the same problem.  See discussion above.  2.9
BSD????

*b This result was obtained on a system which probably had other
programs running at the time the result was obtained.  Submitter is
requested to rerun if possible when the system is idle.  This will
improve the result somewhat.

*c Multi-cpu system.
IOCALL was run single thread, which probably did not utilize all cpus.
This system probably has considerably more power than is reflected by
the result.

-------cut----cut------cut-------------------------------

/* This benchmark tests the speed of the Unix system call interface
   and the speed of the cpu doing common Unix io system calls. */

char buf[512];
int fd, count, i, j;

main()
{
	fd = creat("/tmp/testfile", 0777);
	close(fd);
	fd = open("/tmp/testfile", 2);
	unlink("/tmp/testfile");
	for (i = 0; i <= 1000; i++) {
		lseek(fd, 0L, 0);	/* add this line! */
		count = write(fd, buf, 500);
		lseek(fd, 0L, 0);	/* second argument must be long */

		for (j = 0; j <= 3; j++)
			count = read(fd, buf, 100);
	}
}
dan@rna.UUCP (Dan Ts'o) (12/13/85)
In article <354@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (Jan Stubbs) writes:
>
>	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
>The results so far are below. Thanks everybody.
>Send your results to me directly. The benchmark is a "C" program
>which measures Unix kernel performance.
>
>/*This benchmark tests speed of Unix system call interface
>  and speed of cpu doing common Unix io system calls. */
>
>char buf[512];
>int fd,count,i,j;
>
>main()
>{
>	fd = creat("/tmp/testfile",0777);
>	close(fd);
>	fd = open("/tmp/testfile",2);
>	unlink("/tmp/testfile");
>	for (i=0;i<=1000;i++) {
>		lseek(fd,0L,0);		/* add this line! */
>		count = write(fd,buf,500);
>		lseek(fd,0L,0);		/* second argument must be long */
>
>		for (j=0;j<=3;j++)
>			count = read(fd,buf,100);
>	}
>}

Well, I don't want to flame too much.  Just a few comments.

Basically, I find it difficult to take this benchmark and the presented
results too seriously.

- I have trouble understanding the point of the benchmark program.  It
just seems bizarre.  For 1000 iterations, it writes 500 bytes at the
beginning of the file and reads 400 of them back, 100 at a time.
Because of the buffer cache, this whole routine just does user/kernel
buffer copies, back and forth.  If the performance of the system call
interface and of user/kernel memory copies is what is being measured,
then the results may be okay, although strangely obtained.  I don't
believe it measures much else in the way of kernel performance, or
system performance.  It's not even something a normal user can relate
to, such as "copying files on an X is twice as fast as on a Y".

- It is obviously a single-point measurement.  It can tell you very
little about how particular applications, or the system in general,
will run.

- The numbers are way too small to interpret with any substantial
significance (i.e. you should run the benchmark with, say, 10000 rather
than 1000 in the loop).  The various VAX 11/750 times range, for
example, from 7.2 to 9.4.
I could be convinced there is significance there, but...

- That a Radio Shack 16A performs 25% better than a VAX 11/750 is cute
but of little practical interest (read: ridiculous).  A benchmark that
tells me that is probably not going to be very useful.  Are we really
to think that an Amdahl 470/V8 is only 12% faster than a VAX 8600, or
that a Pyramid is slower than a VAX 11/780?
hammond@petrus.UUCP (Rich A. Hammond) (12/16/85)
> In article <354@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (Jan Stubbs) writes:
> >
> >	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
> >... The benchmark is a "C" program which measures Unix kernel performance.
>
> Dan Ts'o writes:
> Well I don't want to flame too much. Just a few comments.
>
> Basically, I find it difficult to take this benchmark and the presented
> results too seriously.
>
> - I have trouble understanding the point of the benchmark program.
> ... It's not even something a normal user can relate to,
> such as "copying files on a X is twice as fast as Y".
>
> - It is obviously a single point measurement. It can tell you very
> little about how particular applications or the system in general will run.
>
> - The numbers are way too small to interpret with any substantial
> significance (i.e. you should run the benchmark with say 10000, rather
> than 1000 in the loop). The difference between the various VAX 11/750
> times is, for example, 7.2 to 9.4.  I could be convinced there is
> significance there, but...
>
> - That a Radio Shack 16A performs 25% better than a VAX 11/750 is cute
> but of little practical interest (read ridiculous; are we really to
> think that an Amdahl 470/V8 is only 12% faster than a VAX 8600, or
> that a Pyramid is slower than a VAX 11/780?).

a) I agree it doesn't measure everything, but it does check three
important aspects that affect overall system performance: context
switch costs, copying costs, and the cost of finding a buffer in the
buffer cache.

b) You want to avoid using the disks, since, after all, an IBM PC with
a fast hard disk would probably outperform an 8600 with an RK05.  Thus
the statement "system A copies files twice as fast as system B" is only
useful if you know the I/O configuration (was it massbus or unibus
disks on a Vax?  What type of disks?  ...).

c) I agree: run the benchmark with more times through the loop on fast
machines.
1000 is probably enough on small machines.

d) The point about the benchmark results is not that they are
ridiculous, but that they might show up areas which need work.  For
example, if you simply port UNIX to a large machine and increase the
number of buffers without thinking about the way the buffer cache
works, you are likely to find that you have, say, 1024 buffers chained
into 60 queues, whereas on a pdp11 you had 60 buffers in 60 queues.
Which one will take less time to find a buffer in?  Raw machine speed
alone won't tell you the answer.  Further, let's suppose you built a
machine with lots of registers and a load/store architecture (i.e.
RISC, Pyramid).  It turns out the cost of doing a context switch is
higher (save all the registers), and a load/store architecture is at
its worst doing memory-to-memory copies.  Thus a Pyramid might very
well do worse than a Vax 11/780.  I timed a long-to-long copy on a
Pyramid in user mode; it was only 1.15 times the 11/780.  Given that
the Pyramid has a slow context switch....

e) The variation among machines of the same model is real: we have two
780's, and one is consistently about 5% faster on benchmarks.  We have
two Pyramids and again, one is consistently faster on the same
benchmarks.  One should always take +/- 10% on benchmarks to compare
machines.

Rich Hammond, Bell Communications Research
larry@geowhiz.UUCP (Larry McVoy) (12/18/85)
In article <761@petrus.UUCP> hammond@petrus.UUCP (Rich A. Hammond) writes:
>> In article <354@ncr-sd.UUCP> stubbs@ncr-sd.UUCP (Jan Stubbs) writes:
>> >
>> >	IOCALL, A UNIX SYSTEM PERFORMANCE BENCHMARK
>> >... The benchmark is a "C" program which measures Unix kernel performance.
>>
>> Dan Ts'o writes:
>> Well I don't want to flame too much. Just a few comments.
>>
>> Basically, I find it difficult to take this benchmark and the presented
>> results too seriously.

I tend to agree with Dan.  I think what people would like to see is a
benchmark which measures how well Unix, running multiple users,
performs on each machine.  The benchmark would have to measure
something that did not vary widely (such as I/O devices do), as those
results would only reflect how much one had spent on the bus and disk.
So, how about this:

The dhrystone benchmarks are considered good tests of the CPU (at
least by me they are), but don't really test Unix at all (in fact some
people run them in standalone mode).  How about a version (called
forkstone?) which runs the dhrystone as 1, 2, 8, and 64 concurrent
processes?  This would show 1) the speed of the CPU, 2) the first part
of the curve, 8) a nice single-user level, and 64) what happens when
you have multiple users.

It would not test I/O, which is a hard thing to test fairly.  It would
get rid of those Z80 dhrystones (flame, flame), as they're not
multi-tasking...

I guess if there is any response and nobody wants to do it, I'll hack
the dhrystones.  I think it would be better if the original author did
it, as {s}he probably can understand that bastardized {C}Ada source.

Please post your views to the net.  I don't want to discuss this via
mail.
--
Larry McVoy
Arpa:  mcvoy@rsch.wisc.edu
Uucp:  {seismo, ihnp4}!uwvax!geowhiz!geophiz!larry

"If you are undertaking anything substantial, C is the only reasonable
choice of programming language" - Brian W. Kernighan
gemini@homxb.UUCP (Rick Richardson) (12/19/85)
Larry McVoy writes:
>I tend to agree with Dan.  I think what people would like to see is a
>benchmark which measures how well Unix, running multiple users, performs
>on each machine.  The benchmark would have to measure something that did
>not vary widely (such as I/O devices), as those results would only reflect
>how much one had spent on the bus & disk.  So, how about this:
>
>The dhrystone benchmarks are considered good tests of the CPU (at least by
>me they are), but don't really test Unix at all (in fact some people run
>them in standalone mode).  How about a version (called forkstone?) which
>runs the dhrystone as 1, 2, 8, and 64 concurrent processes?  This would
>show 1) the speed of the CPU, 2) first part of the curve, 8) a nice single
>user level, and 64) what happens when you have multiple users.
>
>It would not test I/O, which is a hard thing to test fairly.  It would get
>rid of those Z80 dhrystones (flame, flame) as they're not multi tasking...
>
>I guess if there is any response and nobody wants to do it, I'll hack the
>dhrystones.  I think it would be better if the original author did it, as
>{s}he probably can understand that bastardized {C}Ada source.

I don't think that running multiple dhrystones would measure anything
more than the cost of doing a context switch once every <scheduling
granularity>.  Except on a multiple-processor machine, the time will be
N * 1 dhrystone + M context switches.  There are easier ways to measure
the time to do a context switch.  If you want to measure multi-user
response, you've GOT to open the I/O can of worms, since real users
WILL be doing I/O.

Rick Richardson, PC Research, Inc.  (201) 922-1134
..!ihnp4!houxm!castor!{rer,pcrat!rer}  <-- Replies to here, not to homxb!!!

P.S.  Reinhold Weicker is the author of Dhrystone.  I apologize for
creating the bastardized {C}Ada source from his original Ada!
larry@geowhiz.UUCP (Larry McVoy) (12/21/85)
>I wrote:
>>I tend to agree with Dan.  I think what people would like to see is a
>>benchmark which measures how well Unix, running multiple users, performs
>>on each machine.  The benchmark would have to measure something that did
>>not vary widely (such as I/O devices), as those results would only reflect
>
>Rick Richardson writes:
>I don't think that running multiple dhrystones would measure anything more
>than the cost of doing a context switch once every <scheduling granularity>.
>Except on a multiple processor machine, the time will be N*1 dhrystone +
>M context switches.  There are easier ways to measure the time to do a
>context switch.  If you want to measure multi-user response, you've GOT
>to open the IO can-of-worms, since they WILL be doing IO.
>
>P.S.  Reinhold Weicker is the author of Dhrystone.  I apologize for
>creating the bastardized {C}Ada source from his original Ada!

Well, OK, so you don't think multiple dhrystones would be interesting.
Hmm...  I do; it would be interesting to know how well they do when
there are lots of them.  You say it's no more than testing context
switches, implying that all context switches are equal.  Uh-uh.  For
example: I heard (from Guy Harris, who I'm sure will correct any
inaccuracies) that Sun-3 memory management is done such that 8 memory
mapping context blocks are in memory at all times.  This leads to
fast-fast-fast response for active jobs <= 8, but what happens when you
go to 16?  32?

I think we both agree that testing I/O is a mess: really hard to get an
objective and accurate reflection of a machine's performance.  I think
we also both agree that what people would like to see is some sort of
measurement of a machine's multi-{user,tasking} capability.  So, I made
a pass; what have you to offer instead?

-larry

BTW - sorry about the {C}Ada crack, just my peevishness at not being
able to decipher it...
--
Larry McVoy
Arpa:  mcvoy@rsch.wisc.edu
Uucp:  {seismo, ihnp4}!uwvax!geowhiz!geophiz!larry

"If you are undertaking anything substantial, C is the only reasonable
choice of programming language" - Brian W. Kernighan
jph@whuxlm.UUCP (Holtman Jim) (12/22/85)
> Larry McVoy writes:
> >I tend to agree with Dan.  I think what people would like to see is a
> >benchmark which measures how well Unix, running multiple users, performs
> >on each machine.  The benchmark would have to measure something that did
> >not vary widely (such as I/O devices), as those results would only reflect
> >how much one had spent on the bus & disk.  So, how about this:
> >
> >The dhrystone benchmarks are considered good tests of the CPU (at least by
> >me they are), but don't really test Unix at all (in fact some people run
> >them in standalone mode).  How about a version (called forkstone?) which
> >runs the dhrystone as 1, 2, 8, and 64 concurrent processes?  This would
> >show 1) the speed of the CPU, 2) first part of the curve, 8) a nice single
> >user level, and 64) what happens when you have multiple users.
> >
> >It would not test I/O, which is a hard thing to test fairly.  It would get
> >rid of those Z80 dhrystones (flame, flame) as they're not multi tasking...
> >
> >I guess if there is any response and nobody wants to do it, I'll hack the
> >dhrystones.  I think it would be better if the original author did it, as
> >{s}he probably can understand that bastardized {C}Ada source.
>
> I don't think that running multiple dhrystones would measure anything more
> than the cost of doing a context switch once every <scheduling granularity>.
> Except on a multiple processor machine, the time will be N*1 dhrystone +
> M context switches.  There are easier ways to measure the time to do a
> context switch.  If you want to measure multi-user response, you've GOT to
> open the IO can-of-worms, since they WILL be doing IO.
>
> Rick Richardson, PC Research, Inc.  (201) 922-1134
> ..!ihnp4!houxm!castor!{rer,pcrat!rer}  <-- Replies to here, not to homxb!!!
>
> P.S.  Reinhold Weicker is the author of Dhrystone.  I apologize for
> creating the bastardized {C}Ada source from his original Ada!
Results for a VAX 8600 running SVR2:

	1.2 real
	1.1 system
	0.0 user
stubbs@ncr-sd.UUCP (Jan Stubbs) (01/02/86)
In article <1035@homxb.UUCP> gemini@homxb.UUCP (Rick Richardson) writes:
>There are easier ways to measure the time to do a context switch.  If
>you want to measure multi-user response, you've GOT to open the IO
>can-of-worms, since they WILL be doing IO.

How about the following as a multiuser benchmark?

	iocall&
	dhrystone&
	iocall&
	dhrystone&
	etc.....

Putting the above in a shell file and getting stopwatch times on a
dedicated system gives a reasonable approximation of a real system
workload.  If you want physical I/O in there as well, add a few "cc
hello.c&"s.  If you want to simulate user think time, add a sleep
between programs.  Vary the mix of these programs to simulate your
prospective use of the machine.  If you really want to get fancy, have
one shell file for each simulated user and measure response-time
degradation as you add simulated users.  IOCALL and the cc invocations
would have to be modified to use unique file names, or they will write
on top of each other.

We have done this with some success; the problem is getting any two
performance people to agree on what an appropriate mix is.  The AIM
benchmarks from AIM Technology (Santa Clara, CA) attempt to do this
sort of thing, but more comprehensively, for a price, and they provide
results for many machines as well.

The above opinions are those of the author only.

Jan Stubbs