sbw@naucse.UUCP (Steve Wampler) (10/23/88)
I'm wondering if I can impose on some of you.  I would like to do some (very) minor performance checks on as many machines as possible.  (I hesitate to say 'benchmarks' because the checks really aren't rigorous.)  I have four programs whose performance I'd like to compare on various machines.  (Well, actually two programs, plus timings for 'fgrep' and 'grep' as baseline information.)  I did not write any of the programs; one is a very specialized version of fgrep (used by a company to check out some of the features of fgrep).  The other program is the implementation of the Boyer-Moore fast find algorithm taken from 'Software Tools' by Webb Miller (PH).

Anyway, if any of you would be willing to run them and report the timings back to me, I'd be most grateful.  Let me know and I'll mail you (assuming I can reach you) a 13000-byte shell archive that contains the performance package.  (You will need about 300K of disk on a UNIX(tm)-based system to actually run them.)  There is a makefile in the archive that will get machine information from you, compile the tests, execute them, and build a file of the results.  If the programs work, all you need to do is mail that file back.  I'd like to get both RISC- and CISC-style machines in my survey.

The only known (to me) system requirements are:

	UNIX (any version?)
	availability of 'grep' and 'fgrep'
	make
	the 'clock()' library routine, used by one of the programs
	the 'time' shell command

Anyway, thanks!  Let me know if you're willing to help.
-- 
Steve Wampler
{....!arizona!naucse!sbw}
sbw@naucse.UUCP (Steve Wampler) (11/03/88)
At the request of others, here is a summary of the performance measures that I've received (so far) from other people on the net.

Let me start with a comment: ANYONE who uses these as realistic benchmarks should be laughed off the net.  These programs test various algorithms/implementations for a very specialized test case.  They might provide some insight, but there are far better performance measures out there.  Also, in retrospect, the file being searched is simply too small to provide accurate measures on the more interesting machines.  It would be fairly easy to modify the file-creation program to produce a file 10 times larger, but I cannot see asking people to donate 2.3MB of disk for this task.

There were a few people who offered to help that I am unable to reach, for various reasons (one person will apparently get the source file sometime in the next 23 days, as near as I can tell from the messages his host's mailer daemon sends me).  I would like to thank you, and apologize for not being able to contact you more personally.

The results are given here in tbl/troff source form.  If you want to look at them and don't have tbl and/or troff, you might try to deduce the results by examining this file.

My thanks to all the people who responded.  I know some of you took a fair amount of time to get times for your machines.  If I can return the favor (not likely - my machine is the 3B1!) I'll see what I can do.

--- snip "Results.t" ---
.TL
Performance Measurements
.SH
Introduction
.LP
The following table gives the raw timings for several related programs on a variety of computers.  Times reported are CPU times spent in user code.
.LP
The four test programs are (in order of appearance in the table):
.IP "fgr" 1i
\f(TTfgr\fR is a special-case version of \fIfgrep\fR supplied by an unnamed computer manufacturer.  It prints out the time spent in the search portion of its code, as returned by the function \f(TTclock()\fR.
.IP "fgrep" 1i
\f(TTfgrep\fR is the \fIfgrep\fR program as found on the measured machine.  It is invoked with the \f(TT-c\fR option, searching for \f(TTkataveni\fR in a data file equivalent to the one built internally by \f(TTfgr\fR.
.IP "grep" 1i
\f(TTgrep\fR is the \fIgrep\fR program as found on the measured machine.  It is invoked with the same arguments as \f(TTfgrep\fR.
.IP "ff" 1i
\f(TTff\fR is the implementation of the Boyer-Moore algorithm from the book \fI"Software Tools"\fR by Webb Miller.  The only modification was to add support for a \f(TT-c\fR option.  It is invoked with the same arguments as \f(TTfgrep\fR.
.LP
In most cases, values are averaged over three or more runs; the only exception is the \fIAM29000\fR, where the times are derived from counting the clock ticks in the simulator.  Times are given only for the configuration of hardware/operating system/compiler that proved fastest for a given machine; for example, \f(TTgcc\fR produced slightly worse code on the \fISun\fR systems than the vendor-supplied compiler.  The first number for \f(TTfgr\fR is the time returned by \f(TTclock()\fR, reported in seconds.  The second number is the time for the entire run, as reported by \fItime\fR.
.TS
center tab(:) ;
c c s c s c s c s
l | n l | n l | n l | n l | .
\fBMachine\fR:\fBfgr\fR:\fBfgrep\fR:\fBgrep\fR:\fBff\fR
:=:=:=:=:=:=:=:=
\fIAM29000\fR:(0.023):0.11::-::-::(0.02)
:_:_:_:_:_:_:_:_
\fIATT 3B1\fR:(0.877):1.96::(18.84)::(2.09)::(0.78)
:_:_:_:_:_:_:_:_
\fIATT 3B2/400\fR:(1.480):2.38::(7.09)::(4.12)::(0.36)
:_:_:_:_:_:_:_:_
\fICray II\fR:(1.233):1.40::(1.68)::(0.31)::(0.05)
:_:_:_:_:_:_:_:_
\fICray X-MP\fR:(0.162):0.27::(0.75)::(0.37)::(0.03)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-II\fR:(1.127):2.03::(4.80)::(3.57)::(0.40)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-III\fR:(0.460):0.77::(1.77)::(1.47)::(0.10)
:_:_:_:_:_:_:_:_
\fIEncore Multimax\fR:(0.806):1.40::(3.90)::(1.90)::(0.20)
:_:_:_:_:_:_:_:_
\fIGould PN9050\fR:(0.377):0.57::(1.33)::(1.10)::(0.07)
:_:_:_:_:_:_:_:_
\fIMIPS M/1000\fR:(0.150):0.29::(0.66)::(0.30)::(0.04)
:_:_:_:_:_:_:_:_
\fIMIPS M/2000\fR:(0.080):0.16::(0.40)::(0.16)::(0.04)
:_:_:_:_:_:_:_:_
\fISGI 3030\fR:(0.483):0.77::(2.77)::(1.87)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 2/50\fR:(1.077):2.17::(7.07)::(6.63)::(0.67)
:_:_:_:_:_:_:_:_
\fISun 3/60\fR:(0.377):0.67::(2.27)::(1.60)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 3/140\fR:(0.516):0.87::(3.13)::(2.30)::(0.23)
:_:_:_:_:_:_:_:_
\fISun 3/280\fR:(0.288):0.47::(1.47)::(1.07)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 4/110\fR:(0.256):0.40::(1.33)::(1.00)::(0.10)
:_:_:_:_:_:_:_:_
\fISun 4/260\fR:(0.178):0.27::(0.80)::(0.80)::(0.00)
:_:_:_:_:_:_:_:_
.TE
.LP
A few comments:
.IP (1) 0.5i
I suspect that, on the faster machines, some of the programs execute too quickly to be accurately measured.  For example, I doubt that the \fIMIPS M/1000\fR really executes \f(TTff\fR as fast as the \fIMIPS M/2000\fR does.  Nor do I believe that the \fISun 4/260\fR is really instantaneous on \f(TTff\fR.  The \fICRAY\fRs have more accuracy in their output from 'time'.
.IP (2) 0.5i
The \fIEncore Multimax\fR is a parallel machine with 8 68020s (each running at about 20MHz).  However, the compiler doesn't try to parallelize code unless it is told to do so, so most of the times are closer to that of a single 68020.
.IP (3) 0.5i
No one should take these times as definitive.  There are nuances among the machines that are not reported here.  Some (\fInot all\fR) examples are: the \fICRAY-II\fR used is not the fastest \fICRAY-II\fR; the \fICRAY X-MP\fR was able to vectorise some code not vectorised by the \fICRAY-II\fR (different versions of the compiler); etc.
-- 
Steve Wampler
{....!arizona!naucse!sbw}
rik@june.cs.washington.edu (Rik Littlefield) (11/04/88)
In article <1004@naucse.UUCP>, sbw@naucse.UUCP (Steve Wampler) writes:
> The \fIEncore Multimax\fR is a parallel machine with
> 8 68020s (each running at about 20MHz).
> However, the compiler doesn't try to parallelize code unless
> it is told to do so, so most of the times are closer to that
> of a single 68020.

Encore actually uses National 32x32 processors.  Their original machine used the 32032; later models use the 32332.  The last I checked, they had a 32532 board under development.  The 32332 is roughly comparable to a 68020, so that's probably what's reported here.

The comment about parallelization is correct.  Encore's compilers are conventional, and their version of Unix just assigns separate processes their own processors, if possible.  Hooks are provided for sharing memory between processes and for synchronization, allowing users to write their own explicitly parallel programs.  Some third-party tools, e.g. Force, are available that do semi-automatic parallelization in the easy cases, such as Fortran DO-loops with all iterations independent.

--Rik