[comp.benchmarks] more bc babble

xxdon@monet.lerc.nasa.gov (Don Sosoka) (12/12/90)

Regarding the comment that the bc test is *bad* for multi-user
machines: it was recognized that load played a part in the timings.  That is
why the original posting gave both REAL and USER timings.  In general the
REAL times were considerably greater than the USER times on multi-user
machines, while for single-user machines the two numbers were usually
close.  Here are the numbers again:

Vendor       Model          User      Real
SGI          4D25           8.1        8.4
SGI          320 (2 cpu)    5.2        5.2
SGI          340 (4 cpu)    5.2        5.2
SGI          3030          37.8       39.4
CRAY         XMP4/8         7.8       20.7
CRAY         YMP4/64        5.9       13.9
CONVEX       C220           9.4       10.4
IBM          RS6000/530     3.5        3.5 
AMDAHL       5870 (UTS)     4.3       17.6


As for the CONVEX numbers, our CONVEX is relatively new and not currently
heavily used.  I just repeated the test with 7 users in total and got the
following:

          9.9 real             9.4 user
          9.7 real             9.4 user
          9.7 real             9.4 user

Again, no comment is offered on what all this means (if anything); the
results are simply reported.

oles@kelvin.uio.no (Ole Swang) (12/14/90)

Another easy-to-memorize benchmark is the computation of the sum
of the first 10 million terms in the harmonic series.
This is a FORTRAN version, it should not be too hard to translate
even without f2c :-)

      PROGRAM RR
      DOUBLE PRECISION R
      R=0.0D0
      DO 10 I=1,10000000
         R=R+1/DBLE(I)
10    CONTINUE
      WRITE(*,*)R,I
      END

This one is obviously testing floating-point performance only. The
emphasis on divisions might give biased results. It vectorizes
fully on the vectorizing compilers I've tested it on (Cray and Convex).
It has the advantage over the bc benchmark that it's the same code
every time.

Some results (in seconds):

Cray X/MP 216               0.29  *
Convex C 120                8.7
DECstation 5000/200        10.5
DECsystem 5400             13.1
VAX 6330/VMS5.3 FPA        41.9
VAX 8650/VMS5.3 FPA        55.3
VAX 8600/VMS5.2 FPA        77.1
Sun 3/60 (m68881)         105.6


* The code was modified to single precision for the Cray, as Cray
single precision is already 64 bits, yielding the desired accuracy.

Comments and suggestions are encouraged.

-------------------------------------------------------------------------
Ole Swang,  assistant professor, Dept. of Chemistry, U. of Oslo, Norway
-------------------------------------------------------------------------

rosenkra@convex.com (William Rosencranz) (12/16/90)

In article <OLES.90Dec13213301@kelvin.uio.no> oles@kelvin.uio.no (Ole Swang) writes:
>
>Another easy-to-memorize benchmark 
> [...]
>
>      PROGRAM RR
>      DOUBLE PRECISION R
       ^^^^^^^^^^^^^^^^^^

why not REAL*8 R? then it probably need not be modified for any system.
this probably assumes ANSI Fortran 77, though.

for ANSI C, you would probably have to specify double R (pcc would promote
float to double on many if not most systems, but ANSI C would not).

-bill
rosenkra@convex.com

--
Bill Rosenkranz            |UUCP: {uunet,texsun}!convex!c1yankee!rosenkra
Convex Computer Corp.      |ARPA: rosenkra%c1yankee@convex.com

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (12/16/90)

>>>>> On 13 Dec 90 20:33:01 GMT, oles@kelvin.uio.no (Ole Swang) said:

Ole> Another easy-to-memorize benchmark is the computation of the sum
Ole> of the first 10 million terms in the harmonic series.
	[... code deleted ...]
Ole> This one is obviously testing floating-point perfomance only. The
Ole> emphasis on divisions might give biased results. It vectorizes
Ole> fully on the vectorizing compilers I've tested it on (Cray and Convex).
Ole> It has the advantage over the bc benchmark that it's the same code
Ole> every time.

Unfortunately, unless one's applications really spend all of their
time doing divides, this benchmark is going to have fairly limited
predictive capability.  The timing for the divide instruction is
rather variable between machines in ways that are not obviously
related to the timings for the add/subtract and multiply instructions.


Off the top of my head, here are some examples.  These are asymptotic
peak rates for vector operations in cycles per result for the
operation:
		a(i) = b(i)/c(i)

Machine		Divide cycles	Multiply cycles		ratio
----------------------------------------------------------------
Cray X/MP	   3N			N		  3
Cray 2		   4N			N		  4
ETA-10/Cyber 205   6N			N		  6
IBM 3090/VF	  13N		       3N ?		 (4)
IBM RS/6000	  20N		       3N		  7 *
----------------------------------------------------------------

I don't recall any other numbers right now, and I certainly won't
guarantee that the above numbers are precisely correct, but it does
give you some idea of the trouble.

? It is too early in the morning for me to remember the details of
what is overlappable on the 3090/VF.  Here I assume that the multiply
can be overlapped with one of the loads.  Since there is only one
load-store unit, that leaves two more cycles for the other load and the
store.

* Note that the RS/6000 would only require 2N cycles for the equivalent
multiplies except for the need to store a(i), which cannot be
overlapped with either of the loads or the multiply.

It would be especially interesting to add Intel i860 numbers to that
table, since the i860 does not have full FP divide hardware and must
iterate to get an IEEE-compliant result.
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (12/18/90)

There are systems which still don't know about the "ibmism" real*8
means DOUBLE PRECISION.
--
----------------------------------------------------------------
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

eugene@eos.arc.nasa.gov (Eugene Miya) (12/19/90)

In article <KHB.90Dec17205404@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM
(Keith Bierman fpgroup) writes:
>There are systems which still don't know about the "ibmism" real*8
>means DOUBLE PRECISION.

Still benchmarking and I don't have a lot of time. BUT..
      REAL*8
works on the Univac/Unisys EXEC*1100 ASCII Fortran compiler, even though
'8' has absolutely no meaning on that 36-bit, word-oriented machine.

Then, you can always tell the REAL perspective of a user when you ask them
what single-precision means.  Real(tm) numeric types insist on 64 bits 8^).

Added note on the series benchmark.  I see you use DINT().  Turns out
the CRI CFT[77] line of compilers fails the Fortran compiler validation
suite test T801 (thereabouts), subtest (some number), which is the DINT()
conversion.  Interesting selection.  Back to the remote login (this one
via phone; have you ever thought about benchmarking phone systems........).

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.