xxdon@monet.lerc.nasa.gov (Don Sosoka) (12/12/90)
Regarding the comment that the bc test is *bad* for multi-user machines: we realized that load played a part in the timings. That is why the original posting gave both REAL and USER timings. In general, the REAL times were considerably greater than the USER times on multi-user machines, while on single-user machines the two numbers were usually close. Here are the numbers again:

Vendor   Model              User   Real
SGI      4D25                8.1    8.4
SGI      320 (2 cpu)         5.2    5.2
SGI      340 (4 cpu)         5.2    5.2
SGI      3030               37.8   39.4
CRAY     XMP4/8              7.8   20.7
CRAY     YMP4/64             5.9   13.9
CONVEX   C220                9.4   10.4
IBM      RS6000/530          3.5    3.5
AMDAHL   5870 (UTS)          4.3   17.6

As for the CONVEX numbers: our CONVEX is relatively new and not currently heavily used. I just repeated the test with 7 users on in total and got the following:

 9.9 real   9.4 user
 9.7 real   9.4 user
 9.7 real   9.4 user

Again, no comments were made on what all this means (if anything); the results were simply reported.
oles@kelvin.uio.no (Ole Swang) (12/14/90)
Another easy-to-memorize benchmark is the computation of the sum of the first 10 million terms of the harmonic series. Here is a FORTRAN version; it should not be too hard to translate even without f2c :-)

      PROGRAM RR
      DOUBLE PRECISION R
      R=0.0
      DO 10 I=1,10000000
         R=R+1/DBLE(I)
 10   CONTINUE
      WRITE(*,*)R,I
      END

This one obviously tests floating-point performance only. The emphasis on divisions might give biased results. It vectorizes fully on the vectorizing compilers I've tested it on (Cray and Convex). It has the advantage over the bc benchmark that it's the same code every time.

Some results (in seconds):

Cray X/MP 216            0.29 *
Convex C120              8.7
DECstation 5000/200     10.5
DECsystem 5400          13.1
VAX 6330/VMS5.3 FPA     41.9
VAX 8650/VMS5.3 FPA     55.3
VAX 8600/VMS5.2 FPA     77.1
Sun 3/60 (m68881)      105.6

* The code was modified to single precision for the Cray, as this yields the wanted 64-bit accuracy.

Comments and suggestions are encouraged.
--
Ole Swang, assistant professor, Dept. of Chemistry, U. of Oslo, Norway
oles@kelvin.uio.no
rosenkra@convex.com (William Rosenkranz) (12/16/90)
In article <OLES.90Dec13213301@kelvin.uio.no> oles@kelvin.uio.no (Ole Swang) writes:
>
>Another easy-to-memorize benchmark
> [...]
>
>      PROGRAM RR
>      DOUBLE PRECISION R
       ^^^^^^^^^^^^^^^^^^

Why not REAL*8 R? Then it probably need not be modified for any system. This probably assumes ANSI Fortran 77, though. For ANSI C, you would probably have to specify double R (pcc would promote float to double on many if not most systems, but ANSI C would not).

	-bill
	rosenkra@convex.com
--
Bill Rosenkranz            |UUCP: {uunet,texsun}!convex!c1yankee!rosenkra
Convex Computer Corp.      |ARPA: rosenkra%c1yankee@convex.com
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (12/16/90)
>>>>> On 13 Dec 90 20:33:01 GMT, oles@kelvin.uio.no (Ole Swang) said:
Ole> Another easy-to-memorize benchmark is the computation of the sum
Ole> of the first 10 million terms in the harmonic series.
[... code deleted ...]
Ole> This one is obviously testing floating-point performance only. The
Ole> emphasis on divisions might give biased results. It vectorizes
Ole> fully on the vectorizing compilers I've tested it on (Cray and Convex).
Ole> It has the advantage over the bc benchmark that it's the same code
Ole> every time.
Unfortunately, unless one's applications really spend all of their
time doing divides, this benchmark is going to have fairly limited
predictive capability. The timing for the divide instruction is
rather variable between machines in ways that are not obviously
related to the timings for the add/subtract and multiply instructions.
Off the top of my head, here are some examples. These are asymptotic
peak rates for vector operations, given as cycles for N results
(N = vector length), for the divide

      a(i) = b(i)/c(i)

and for the corresponding multiply a(i) = b(i)*c(i):
Machine             Divide cycles   Multiply cycles   Ratio
-------------------------------------------------------------
Cray X/MP                3N               N             3
Cray 2                   4N               N             4
ETA-10/Cyber 205         6N               N             6
IBM 3090/VF             13N              3N           ? (4)
IBM RS/6000             20N              3N             7 *
-------------------------------------------------------------
I don't recall any other numbers right now, and I certainly won't
guarantee that the above numbers are precisely correct, but it does
give you some idea of the trouble.
? It is too early in the morning for me to remember the details of
what is overlappable on the 3090/VF. Here I assume that the multiply
can be overlapped with one of the loads. Since there is only one
load-store unit, that leaves two more cycles for the other load and the
store.
* Note that the RS/6000 would only require 2N cycles for the equivalent
multiplies except for the need to store a(i), which cannot be
overlapped with either of the loads or the multiply.
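The arithmetic behind the 3N multiply entries and the ratio column, as I read the two footnotes: with a single load-store unit, each element needs two loads plus one store, and the multiply itself is hidden under that memory traffic.

```latex
\begin{aligned}
T_{\mathrm{mul}} &\approx \bigl(\underbrace{2}_{\text{loads}} + \underbrace{1}_{\text{store}}\bigr)\,N = 3N,\\
\left.\frac{T_{\mathrm{div}}}{T_{\mathrm{mul}}}\right|_{\text{3090/VF}} &= \frac{13N}{3N} \approx 4.3,\\
\left.\frac{T_{\mathrm{div}}}{T_{\mathrm{mul}}}\right|_{\text{RS/6000}} &= \frac{20N}{3N} \approx 6.7.
\end{aligned}
```

These round to the 4 and 7 shown in the table.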
It would be especially interesting to add Intel i860 numbers to that
table, since the i860 does not have full FP divide hardware and must
iterate to get an IEEE-compliant result.
--
John D. McCalpin mccalpin@perelandra.cms.udel.edu
Assistant Professor mccalpin@brahms.udel.edu
College of Marine Studies, U. Del. J.MCCALPIN/OMNET
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (12/18/90)
There are systems which still don't know about the "ibmism" that real*8 means DOUBLE PRECISION.
--
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33                     | (415 336 2648)
Mountain View, CA 94043
eugene@eos.arc.nasa.gov (Eugene Miya) (12/19/90)
In article <KHB.90Dec17205404@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) writes:
>There are systems which still don't know about the "ibmism" real*8
>means DOUBLE PRECISION.

Still benchmarking and I don't have a lot of time. BUT... REAL*8 works on the Univac/Unisys EXEC*1100 ASCII Fortran compiler, even though '8' has absolutely no meaning on this 36-bit word-oriented machine. Then again, you can always tell the REAL perspective of users when you ask them what single precision means. Real(tm) numeric types insist on 64-bit 8^).

Added note on the series benchmark: I see you use DINT(). It turns out the CRI CFT[77] line of compilers fails Fortran compiler validation suite test T801 (thereabouts), subtest (some number), which is the DINT() conversion. Interesting selection.

Back to the remote login (this one via phone; have you ever thought about benchmarking phone systems........).
--
e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
AMERICA: CHANGE IT OR LOSE IT.