[comp.arch] A request...

sbw@naucse.UUCP (Steve Wampler) (10/23/88)

I'm wondering if I can impose on some of you.  I would like to do
some (very) minor performance checks, on as many machines as possible.
(I hesitate to say 'benchmarks' because the checks really aren't
rigorous.)  I have four programs that I'd like to compare the
performance of on various machines.  (Well, actually two programs,
plus timings for 'fgrep' and 'grep' as baseline information.)
I did not write any of the programs, and one is a very specialized
version of fgrep (used by a company to check out some of the
features 'used' by fgrep).  The other program is the implementation
of the Boyer-Moore fast find algorithm taken from 'Software Tools'
by Webb Miller (PH).
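For those who haven't seen it, the core idea is easy to sketch.  The
following is a minimal Horspool-style simplification of Boyer-Moore,
not Miller's actual code (his version differs in detail):

```c
#include <string.h>

/* Simplified Boyer-Moore (Horspool variant): scan the pattern
 * backwards against the text; on a mismatch, shift by the distance
 * from the window's last text character to its rightmost occurrence
 * in the pattern (or by the whole pattern length if absent).
 * Returns the offset of the first match, or -1. */
long bm_search(const char *text, long n, const char *pat)
{
    long m = (long)strlen(pat);
    long skip[256];
    long i, j;

    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    for (i = 0; i < 256; i++)
        skip[i] = m;                      /* default: full shift */
    for (i = 0; i < m - 1; i++)
        skip[(unsigned char)pat[i]] = m - 1 - i;

    i = 0;
    while (i <= n - m) {
        for (j = m - 1; j >= 0 && text[i + j] == pat[j]; j--)
            ;
        if (j < 0)
            return i;                     /* full match at offset i */
        i += skip[(unsigned char)text[i + m - 1]];
    }
    return -1;
}
```

The point of the skip table is that long patterns let the search jump
over most of the text without examining it, which is why 'ff' beats
the naive scan in fgrep on these tests.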

Anyway, if any of you would be willing to run them and report back
the timings to me, I'd be most grateful.  Let me know and I'll mail
you (assuming I can reach you) a 13000-byte shell archive that
contains the performance package.  (You will need about 300K of
disk on a UNIX(tm)-based system to actually run them.)  There is
a makefile in the archive that will get machine information from
you, compile the tests, execute them, and build a file of the
results.  If the programs work, all you need to do is to mail
that file back.
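To give a feel for what the harness does, here is a rough sketch of one
timing step (the file and data names here are invented for illustration;
the real makefile differs):

```shell
# Invented sketch of one step the makefile might run; 'testdata' and
# 'results' are made-up names for illustration only.
printf 'abc\nkataveni line\nxyz\n' > testdata    # stand-in data file

# fgrep -c prints only the count of matching lines
fgrep -c kataveni testdata > results

# 'time' writes its real/user/sys figures to stderr, so capture that
{ time fgrep -c kataveni testdata > /dev/null ; } 2>> results

cat results
```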

I'd like to get both RISC and CISC-style machines in my survey.
The only known (to me) system requirements are:

	UNIX (any version?), with:
	   'grep' and 'fgrep'
	   'make'
	   the 'clock()' library routine (used by one of the programs)
	   the 'time' shell command
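For what it's worth, the clock() dependence amounts to something like
the sketch below.  This is a modern rendering, not the vendor's code;
a 1988 program would likely divide by a hard-wired HZ value rather
than CLOCKS_PER_SEC:

```c
#include <time.h>

/* Sketch of clock()-based measurement: clock() returns CPU time
 * consumed by the process, so the difference of two readings divided
 * by CLOCKS_PER_SEC gives elapsed CPU seconds.  The loop is only a
 * stand-in for the real search code. */
double timed_work(long n)
{
    clock_t start = clock();
    volatile long sum = 0;            /* volatile: keep the loop */
    long i;

    for (i = 0; i < n; i++)           /* stand-in for the search */
        sum += i;

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A caller would simply print the returned value, which is presumably
how the program reports its search-portion time.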

Anyway, thanks!  Let me know if you're willing to help.
-- 
	Steve Wampler
	{....!arizona!naucse!sbw}

sbw@naucse.UUCP (Steve Wampler) (11/03/88)

At the request of others, here is a summary of the performance
measures that I've received (so far) from other people on the
net.  Let me start with a comment:

	ANYONE who uses these as realistic benchmarks should be
	laughed off the net.

These programs test various algorithms/implementations on a
very specialized test case.  They might provide some insight,
but there are far better performance measures out there.

Also, in retrospect, the file being searched is simply too
small to provide accurate measures on the more interesting
machines.  It would be fairly easy to modify the file
creation program to produce a file 10 times larger, but I
cannot see asking people to donate 2.3MB of disk for this task.

There were a few people who offered to help whom I was unable
to reach, for various reasons (one person will apparently get
the source file sometime in the next 23 days, as near as I
can tell from the messages his host's mailer daemon sends me).
I would like to thank you, and apologize for not being able
to contact you more personally.

The results are given here in tbl-troff source form.  If you
want to look at them, and don't have tbl and/or troff, you
might try to deduce the results by examining this file.

My thanks to all the people who responded.  I know some of
you took a fair amount of time to get times for your machines.
If I can return the favor (not likely - my time is the 3B1!)
I'll see what I can do.

--- snip "Results.t" ---
.TL
Performance Measurements
.SH
Introduction
.LP
The following table gives the raw timings for several related
programs on a variety of computers.
Times reported are CPU times spent in user code.
.LP
The four test programs are (in order of appearance in the table):
.IP "fgr" 1i
\f(TTfgr\fR is a special case version of \fIfgrep\fR supplied by an
unnamed computer manufacturer.
It prints out the time spent in the search portion of its
code, as returned by the function \f(TTclock()\fR.
.IP "fgrep" 1i
\f(TTfgrep\fR is the \fIfgrep\fR program as found on the measured
machine.
It is invoked with the \f(TT-c\fR option, searching for
\f(TTkataveni\fR in a data file equivalent to the one built
internally by \f(TTfgr\fR.
.IP "grep" 1i
\f(TTgrep\fR is the \fIgrep\fR program as found on the measured
machine.
It is invoked with the same arguments as \f(TTfgrep\fR.
.IP "ff" 1i
\f(TTff\fR is the implementation of the Boyer-Moore algorithm
from the book \fI"Software Tools"\fR by Webb Miller.
The only modification was to add support for a \f(TT-c\fR option.
It is invoked with the same arguments as \f(TTfgrep\fR.
.LP
In most cases, values are averaged over three or more runs;
the only exception is the \fIAM29000\fR, where the times are derived
from counting clock ticks in the simulator.
Times are given only for the hardware/operating system/compiler
configuration that proved fastest on a given machine;
for example, \f(TTgcc\fR produced slightly worse code on the \fISun\fR
systems than the vendor-supplied compiler did.
The first number for \f(TTfgr\fR is the time returned by \f(TTclock()\fR,
reported in seconds.
The second number is the time for the entire run, as reported by \f(TTtime\fR.
.TS
center tab(:) ;
c c s c s c s c s
l | n l | n l | n l | n l | .
\fBMachine\fR:\fBfgr\fR:\fBfgrep\fR:\fBgrep\fR:\fBff\fR
:=:=:=:=:=:=:=:=
\fIAM29000\fR:(0.023):0.11::-::-::(0.02)
:_:_:_:_:_:_:_:_
\fIATT 3B1\fR:(0.877):1.96::(18.84)::(2.09)::(0.78)
:_:_:_:_:_:_:_:_
\fIATT 3B2/400\fR:(1.480):2.38::(7.09)::(4.12)::(0.36)
:_:_:_:_:_:_:_:_
\fICray II\fR:(1.233):1.40::(1.68)::(0.31)::(0.05)
:_:_:_:_:_:_:_:_
\fICray X-MP\fR:(0.162):0.27::(0.75)::(0.37)::(0.03)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-II\fR:(1.127):2.03::(4.80)::(3.57)::(0.40)
:_:_:_:_:_:_:_:_
\fIDEC uVAX-III\fR:(0.460):0.77::(1.77)::(1.47)::(0.10)
:_:_:_:_:_:_:_:_
\fIEncore Multimax\fR:(0.806):1.40::(3.90)::(1.90)::(0.20)
:_:_:_:_:_:_:_:_
\fIGould PN9050\fR:(0.377):0.57::(1.33)::(1.10)::(0.07)
:_:_:_:_:_:_:_:_
\fIMIPS M/1000\fR:(0.150):0.29::(0.66)::(0.30)::(0.04)
:_:_:_:_:_:_:_:_
\fIMIPS M/2000\fR:(0.080):0.16::(0.40)::(0.16)::(0.04)
:_:_:_:_:_:_:_:_
\fISGI 3030\fR:(0.483):0.77::(2.77)::(1.87)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 2/50\fR:(1.077):2.17::(7.07)::(6.63)::(0.67)
:_:_:_:_:_:_:_:_
\fISun 3/60\fR:(0.377):0.67::(2.27)::(1.60)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 3/140\fR:(0.516):0.87::(3.13)::(2.30)::(0.23)
:_:_:_:_:_:_:_:_
\fISun 3/280\fR:(0.288):0.47::(1.47)::(1.07)::(0.17)
:_:_:_:_:_:_:_:_
\fISun 4/110\fR:(0.256):0.40::(1.33)::(1.00)::(0.10)
:_:_:_:_:_:_:_:_
\fISun 4/260\fR:(0.178):0.27::(0.80)::(0.80)::(0.00)
:_:_:_:_:_:_:_:_
.TE
.LP
A few comments:
.IP (1) 0.5i
I suspect that, on the faster machines, some of the
programs execute too quickly to be accurately measured.
For example, I doubt that the \fIMIPS M/1000\fR really
executes \f(TTff\fR as fast as the \fIMIPS M/2000\fR does.
Nor do I believe that the \fISun 4/260\fR is really instantaneous on
\f(TTff\fR.
The \fICRAY\fRs report more precision in their output from \f(TTtime\fR.
.IP (2) 0.5i
The \fIEncore Multimax\fR is a parallel machine with
8 68020s (each running at about 20MHz).
However, the compiler doesn't try to parallelize code unless
it is told to do so, so most of the times are closer to that
of a single 68020.
.IP (3) 0.5i
No one should take these times as definitive.
There are nuances among the machines that are not reported here.
Some (\fInot all\fR) examples are: the \fICRAY-II\fR used is not the fastest
\fICRAY-II\fR; the \fICRAY X-MP\fR was able to vectorise some code not
vectorised by the \fICRAY-II\fR (different versions of the compiler); etc.
-- 
	Steve Wampler
	{....!arizona!naucse!sbw}

rik@june.cs.washington.edu (Rik Littlefield) (11/04/88)

In article <1004@naucse.UUCP>, sbw@naucse.UUCP (Steve Wampler) writes:
> The \fIEncore Multimax\fR is a parallel machine with
> 8 68020s (each running at about 20MHz).
> However, the compiler doesn't try to parallelize code unless
> it is told to do so, so most of the times are closer to that
> of a single 68020.

Encore actually uses National 32x32 processors.  Their original machine used
32032, later models 32332.  The last I checked, they had a 32532 board under
development.  The 32332 is roughly comparable to a 68020, so that's probably
what's reported here.

The comment about parallelization is correct.  Encore's compilers are
conventional, and their version of Unix just assigns separate processes
their own processors, if possible.  Hooks are provided for sharing
memory between processes and for synchronization, allowing users to
write their own explicitly parallel programs.  Some third-party tools,
e.g. Force, are available that do semi-automatic parallelization in the
easy cases, such as Fortran DO-loops with all iterations independent.

--Rik