berger@datacube.UUCP (11/03/86)
How about a Sun 3/260? It's rated at 4 MIPS (68020 running at 25 MHz).

Bob Berger
Datacube Inc. 4 Dearborn Rd. Peabody, Ma 01960 617-535-6644
ihnp4!datacube!berger
{seismo,cbosgd,cuae2,mit-eddie}!mirror!datacube!berger
dave@onfcanim.UUCP (Dave Martindale) (11/11/86)
>How about a sun 3/260? Its rated at 4 MIPS (68020 running 25 MHz)
4 MIPS running what kind of instructions?
If you take a VAX 11/780 as being a "1 MIPS" machine, I haven't seen
any benchmark that rates anyone's 16MHz 68020 as 2.5 times a VAX 780.
Running the "Dhrystone" non-floating-point benchmark, a Sun 3/160 is
just about 2.2 times an 11/780. This would indicate that a 25MHz
68020 should be at most 3.3 MIPS doing integer stuff.
My experience with the Weitek 1164/1165 floating point chips, which I
believe is what is used by Sun's fast floating point board, suggests
that they are slightly slower than the 11/780's FPA in both single and
double precision. And the 780 is less than 1 MIPS when it comes to
floating point. Also, the Weitek chips force some things to be done
in software that the VAX FPA does in hardware: integer/float conversions,
short/long floating conversions. The Weitek's greater error (results
are not always exact even when the true result is representable exactly;
e.g. 500.0/10.0 != 5.0) also means extra work for software. So I'd
be surprised to find any such machine beating a 780 in real applications,
unless it was using single precision on the 68020 and double precision
on the VAX.
Can anyone post *real*, meaningful numbers for the 3/260?
yours for meaningful MIPS,
Dave Martindale
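
The "at most 3.3 MIPS" estimate in the post above appears to be a linear scaling of the Dhrystone ratio by clock frequency. A minimal sketch of that arithmetic follows. The function name and the 16.67 MHz figure are assumptions added for illustration; the post itself only says "16MHz".

```python
def scale_by_clock(base_rating, base_mhz, target_mhz):
    """Scale a relative performance rating linearly with clock frequency.

    This assumes performance is proportional to clock rate, which ignores
    memory speed and caches entirely.
    """
    return base_rating * target_mhz / base_mhz


# Dhrystone ratio quoted in the post: a Sun 3/160 is about 2.2x an 11/780.
SUN_3_160_RATING = 2.2

# Using the nominal 16 MHz clock mentioned in the post.
nominal = scale_by_clock(SUN_3_160_RATING, 16.0, 25.0)

# ASSUMPTION (not in the post): the 3/160 is clocked at 16.67 MHz.
assumed = scale_by_clock(SUN_3_160_RATING, 16.67, 25.0)

print(f"scaled from 16.00 MHz: {nominal:.2f} x 11/780")
print(f"scaled from 16.67 MHz: {assumed:.2f} x 11/780")
```

Under the 16.67 MHz assumption the result rounds to the 3.3 figure in the post; with a flat 16 MHz it comes out slightly higher. Either way the estimate only holds if nothing but the clock changes between the two machines.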
hansen@mips.UUCP (Craig Hansen) (11/13/86)
> >How about a sun 3/260? Its rated at 4 MIPS (68020 running 25 MHz)
>
> 4 MIPS running what kind of instructions?
>
> If you take a VAX 11/780 as being a "1 MIPS" machine, I haven't seen
> any benchmark that rates anyone's 16MHz 68020 as 2.5 times a VAX 780.
> Running the "Dhrystone" non-floating-point benchmark, a Sun 3/160 is
> just about 2.2 times an 11/780. This would indicate that a 25MHz
> 68020 should be at most 3.3 MIPS doing integer stuff.
>

Dhrystone? Is this an appropriate benchmark for a super computer?

Seriously, though, you should recognize that there's more in a computer
than the processor chip that affects performance. The 3/260 uses a
cache memory, where cache hits can be processed at the speed of the
68020, while the 3/160 goes to DRAM memory for each reference. Thus,
presuming a high enough cache hit rate, it is possible to increase the
performance faster than the ratio of the processor clock rates. Of
course, cache hit rates vary with program and data size and locality.

> My experience with the Weitek 1164/1165 floating point chips, which I
> believe is what is used by Sun's fast floating point board, suggest
> that they are slightly slower than the 11/780's FPA in both single and
> double precision. And the 780 is less than 1 MIPS when it comes to
> floating point. Also, the Weitek chips force some things to be done
> in software that the VAX FPA does in hardware: integer/float conversions,
> short/long floating conversions. The Weitek's greater error (results
> are not always exact even when the true result is representable exactly;
> e.g. 500.0/10.0 != 5.0) also means extra work for software. So I'd
> be surprised to find any such machine beating a 780 in real applications,
> unless it was using single precision on the 68020 and double precision
> on the VAX.

This information is totally wrong. The writer of the above
may be confused with the earlier (and inferior) Sky FPA board.
First of all, the Weitek 1164/1165 perform add, subtract, and multiply
faster than the 11/780's FPA when run at 16 MHz; the operations
are faster than 1 usec by a healthy margin. The chip
set directly performs integer/float conversions and short/long
(or single/double) conversions. The operations are implemented
in accordance with the IEEE standard, including support for
IEEE directed rounding modes; so 500.0/10.0 == 5.0 exactly.

> Can anyone post *real*, meaningful numbers for the 3/260?
>
> yours for meaningful MIPS,
> Dave Martindale

Can anyone clarify whether the 3/260 can use a Weitek-based FPA,
or does one have to go back to a 68881?

--
Craig Hansen            | "Evahthun' tastes
MIPS Computer Systems   | bettah when it
...decwrl!mips!hansen   | sits on a RISC"
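
The cache argument in the post above can be illustrated with the usual average-access-time model. This is only a sketch: the function names are made up for illustration, and the 80 ns cache and 240 ns DRAM latencies are hypothetical values, not measurements of any Sun machine.

```python
def average_access_ns(hit_rate, cache_ns, dram_ns):
    """Average memory access time for a given cache hit rate (0.0 to 1.0)."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns


def speedup_over_uncached(hit_rate, cache_ns, dram_ns):
    """How much faster the average reference is than going to DRAM every time."""
    return dram_ns / average_access_ns(hit_rate, cache_ns, dram_ns)


# HYPOTHETICAL latencies chosen only to show the shape of the curve.
CACHE_NS = 80.0
DRAM_NS = 240.0

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    avg = average_access_ns(hit_rate, CACHE_NS, DRAM_NS)
    gain = speedup_over_uncached(hit_rate, CACHE_NS, DRAM_NS)
    print(f"hit rate {hit_rate:4.2f}: {avg:6.1f} ns average, {gain:4.2f}x vs. DRAM only")
```

The model shows why a cached machine can gain more than its clock ratio on memory-bound code, and also why the gain shrinks quickly when the hit rate drops, as the post cautions.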
dave@onfcanim.UUCP (Dave Martindale) (11/14/86)
In article <765@mips.UUCP> hansen@mips.UUCP (Craig Hansen) writes:
>
>> My experience with the Weitek 1164/1165 floating point chips, which I
>> believe is what is used by Sun's fast floating point board, suggest
>> that they are slightly slower than the 11/780's FPA in both single and
>> double precision. [ more stuff deleted ]
>
> This information is totally wrong. The writer of the above
> may be confused with the earlier (and inferior) Sky FPA board.
> First of all, the Weitek 1164/1165 perform add, subtract, and multiply
> faster than the 11/780's FPA when run at 16 MHz; the operations
> are faster than 1 usec by a healthy margin. The chip
> set directly performs integer/float conversions and short/long
> (or single/double) conversions. The operations are implemented
> in accordance with the IEEE standard, including support for
> IEEE directed rounding modes; so 500.0/10.0 == 5.0 exactly.

I'm not confused by a Sky board. I have had some experience with the
named Weitek chips as used in a Silicon Graphics IRIS 2400T (16MHz
68020), not a Sun. So I'm talking about the same hardware, but not the
same support software.

I did some very simple benchmarking when the new IRIS boards arrived,
and found that running real code, the IRIS was just slightly slower
than the 780. I did get the impression that SGI was using software for
some functions (float/int conversion, for example) which could have had
much to do with slowing down performance. The IRIS FPA was out over a
year ago, before Sun I believe, so the software may have been put
together in somewhat of a hurry, and may have improved since then - I
haven't had time to check.

The 500/10 problem was real enough - it caused printf to print out 500
as "4:0". SGI fixed it by using software instead of hardware for that
division, so I just assumed that the hardware wasn't capable of doing
it right.

Anyway, I was just reporting what I'd experienced; it seems that it may
not be true anymore, or may not apply to the Sun.

Still, my original question stands: How is "4 Mips" measured? And what
is floating point performance really like?

Dave Martindale
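
The exactness question argued over above can be checked on any machine with correctly rounded IEEE 754 division. The sketch below uses Python floats (IEEE doubles on common platforms) and exact rational arithmetic; the helper name is hypothetical. Note that the exact quotient of the thread's example, 500.0/10.0, is 50.0. This says nothing about how the Weitek chips or SGI's library behaved in 1986.

```python
from fractions import Fraction


def is_exact_quotient(a, b):
    """True if the floating-point result of a / b equals the exact rational a / b."""
    return Fraction(a / b) == Fraction(a) / Fraction(b)


# The division from the thread: the true quotient (50) is representable,
# so a correctly rounded divide must return it exactly.
print(500.0 / 10.0 == 50.0)
print(is_exact_quotient(500.0, 10.0))

# Contrast: 1/3 is not representable in binary, so the result is rounded.
print(is_exact_quotient(1.0, 3.0))
```

A divider that returns a value slightly below the true quotient in the first case would break digit-by-digit output routines of the kind printf uses, which is consistent with the symptom described in the post.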
rick@seismo.CSS.GOV (Rick Adams) (11/17/86)
Following are some real timings from a real program. This is not a
true "benchmark". The program is a real seismic analysis job that we
use a lot. It is heavily cpu intensive and heavily floating point. I
consider it a good measure of the floating point capacity of a
processor.

It does a very good job of measuring the kind of things that WE do,
which is what a good benchmark should. Of course, for a different
person, this may not be a good indicator of anything. For example, it
doesn't show how badly things degrade when more than one process is
running.

Times are measured with /bin/time.

If you want to consider a Vax 11/780 with an FPA as 1 MIP, (not
unreasonable really) the first column can be thought of as MIPS.

The Sun 3/260 with FPA comes in at 3.4 MIPS, which is not a lot less
than the claimed 4.

The low numbers are provided mainly for amusement. The range of
machines represents what I had access to, nothing else. (I'd be happy
to run this on a Cray or Amdahl to get a nice top end figure if anyone
has some spare cycles...)

The Convex time can probably be sped up by changing the code to
vectorize better, but the point was to run this unmodified on all the
systems.

---rick

 x 780      Total       User      Sys
10.297      479.1      469.3      6.7  Culler 7/10 *
 6.193      796.5      534.8    210.6  Culler 7/10
 4.586    1,075.6    1,056.3     19.3  Gould 9080 w/mult. accel. @ utah
 4.432    1,113.1    1,064.2     48.9  Convex C-1
 4.137    1,192.3    1,164.1     28.2  Vax 8600 w/fpa 4.3bsd @ utah
 3.847    1,282.2    1,257.4     24.8  CCI Power 6/32 w/fpa 4.3bsd
 3.616    1,364.2    1,339.7     24.5  CCI Power 6/32 w/fpa 4.2bsd
 3.435    1,436.0    1,410.8     25.2  Sun 3/260 w/fpa -ffpa sun3.1 +
 3.347    1,474.1    1,450.6     23.5  Celerity 1260D 3.2.50/betacc
 3.315    1,488.0    1,450.8     37.2  Gould 9080
 2.937    1,679.2    1,635.5     43.7  Celerity 1230 3.1?
 2.725    1,810.4    1,751.5     58.9  Celerity 1260D 3.2.47
 2.404    2,052.3    2,017.3     35.0  Sun 3/160 w/fpa sun3.2
 2.057    2,398.1    2,342.5     55.6  Celerity 1200 3.1?
 1.398    3,529.5    3,486.0     43.5  Gould 6000
 1.290    3,823.2    3,785.5     37.7  Sun 3/260 w/fpa -f68881 sun3.1 +
 1.159    4,257.3    4,253.2      4.1  Vax 11/780 w/fpa 4.3bsd *
 1.118    4,413.3    4,462.6     50.7  Vax 11/780 w/fpa 4.3bsd
 1.079    4,572.0    4,382.3    189.7  Sun 3/160 w/68881 sun3.0
 1.000    4,933.1    4,867.3     65.8  Vax 11/780 w/fpa 4.2bsd
 0.980    5,036.0    4,973.1     62.9  Sun 3/50 w/68881 sun3.0
 0.859    5,745.8    5,711.8     34.0  Sun 3/160 sun3.0PILOT
 0.765    6,448.1    6,327.2    120.9  Vax 11/750 w/fpa 4.3bsd @ utah
 0.289   17,091.2   17,005.0     86.2  Pyramid 90x @ rutgers
 0.230   21,413.1   21,332.9     80.2  Sun 3/260 w/fpa -fsoft sun3.1 +
 0.197   25,031.0   24,470.9    560.1  Vax 11/750 @ verdix
 0.146   33,679.9   33,424.3    225.6  Sun 2/120 -fsky sun1.4
 0.141   34,867.8   34,822.2     45.6  Sun 3/160 -fsoft sun3.0
 0.114   43,134.0   42,971.2    162.8  Sun 3/50 -fsoft sun3.0
 0.046  107,399.3  105,875.5   1523.8  Sun 2/120 sun1.4

* Changing program i/o to use 8K byte buffers
+ Sun 3.1 user programs 3.2 kernel

The Suns can be compiled 3 different ways.

-ffpa    With Floating Point Accelerator
-f68881  With M68881 hardware floating point
-fsoft   Software floating point
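
The "x 780" column in the timings above is consistent with dividing the baseline machine's total time (the Vax 11/780 w/fpa 4.2bsd row, 4,933.1 seconds) by each machine's total time. A minimal sketch of that calculation, using a few rows copied from the post; the function and variable names are hypothetical.

```python
# Total time in seconds for the row rated 1.000 (Vax 11/780 w/fpa 4.2bsd).
VAX_780_BASELINE_TOTAL = 4933.1


def relative_speed(total_seconds, baseline_seconds=VAX_780_BASELINE_TOTAL):
    """Speed relative to the baseline: larger means faster."""
    return baseline_seconds / total_seconds


# A handful of total times taken from the posted results.
totals = {
    "Culler 7/10 *": 479.1,
    "Sun 3/260 w/fpa -ffpa": 1436.0,
    "Sun 3/160 w/fpa": 2052.3,
    "Vax 11/780 w/fpa 4.2bsd": 4933.1,
}

for machine, seconds in totals.items():
    print(f"{relative_speed(seconds):7.3f}  {machine}")
```

Because the ratio uses total time, system time counts against a machine, which is why the I/O buffering change marked with * moves the Culler so far up the list.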
bob@uhmanoa.UUCP (Bob Cunningham) (11/18/86)
A portion of the results of Jack Dongarra's work at Argonne National
Laboratory comparing the performance of different computer systems.
These particular results have to do with solving dense systems of
linear equations of order 100 using the LINPACK FORTRAN software
(notably the SGEFA and SGESL column-oriented algorithms, based upon the
BLAS series of subprograms).

This is an application-specific set of benchmarks, resulting in many
cases in MFLOP ratings well below the theoretical performance of each
machine (i.e., the machines might well perform differently with
somewhat different problems, even similar floating-point-intensive
problems). ``Rolled BLAS'' means BLAS routines not unrolled for optimal
performance.

[of the machines listed below, I'd be tempted to call the FPS-264 and
Alliant FX/1 "personal supercomputers" in terms of their size, power
requirements, etc.]

Computer                 OS/Compiler                       MFLOPs
--------                 -----------                       ------
CRAY X-MP/4                                                not tested
CRAY X-MP/2              CFT 1.13 (rolled BLAS)            24
CRAY-2 (1 processor)     CFT 2.70 (rolled BLAS)            15
CRAY-1S                  CFT (rolled BLAS)                 12
Alliant FX/8 (8 CEs)     FX FORTRAN v2.0.19 (rolled BLAS)  7.6
SCS-40                   CFT 1.13 (rolled BLAS)            7.3
FPS-264                  F02 APFTN64 OPT=4 (rolled BLAS)   5.6
Convex C-1               FORTRAN 1.6 (rolled BLAS)         2.9
IBM 3081K (1 processor)  H enhanced opt=3                  2.1
Alliant FX/1 (1 CE)      FX FORTRAN v2.0.19 (rolled BLAS)  1.6
DEC VAX8800              VMS V4.3                          .99
DEC VAX8650              VMS V4.1                          .70
DEC VAX8500              VMS V4                            .65
DEC VAX8600              VMS v4.1, FORTRAN 4.2             .49
Harris HCX-7 w/fpp       f77 1.0                           .48
Sun-3/160M+FPA           f77 -O -ffpa 3.1                  .40
Harris H800              SAUF77                            .23
IBM 370/158              H opt=3                           .23
Celerity C1200           4.2Bsd Unix f77                   .21
DEC VAX11/785 FPA        VMS V4.1                          .20
DEC VAX8200              VMS V4.3                          .15
DEC VAX11/780 FPA        VMS V4.1                          .14
DEC microVAX II          VMS V4.1                          .13
DEC VAX11/750 FPA        VMS V4.1                          .12
Sun-3/75 w/68881         f77 -O -f68881 3.0                .079
Apollo DN460/660         AEGIS 8.0 FTN                     .069
HP 9000 Series 320       HP-UX f77 5.15                    .063
Apollo DN3000            AEGIS 8.0 FTN                     .062
Masscomp MC500 FPP       3.1 FORTRAN                       .061
IBM RT PC Model 20       f77                               .036
Apollo DN320             AEGIS 8.0 FTN                     .028
Apollo DN550 FPA         AEGIS 8.0 FTN                     .025
IBM AT 80287             PROFORT 1.0                       .012
IBM PC 8087              PROFORT 1.0                       .012
IBM AT 80287             Microsoft FORTRAN 3.2             .0091
Apple Macintosh          ABSOFT 2.0b                       .0038

--
Bob Cunningham
bob@hig.hawaii.edu
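
For readers wondering how an MFLOP rating relates to run time, here is a minimal sketch. It assumes the conventional LINPACK operation count of 2n^3/3 + 2n^2 floating-point operations for factoring and solving an order-n system; that count is not stated in the post, and the function names are hypothetical.

```python
def linpack_flops(n):
    """Conventional operation count for factor + solve of an order-n dense system.

    ASSUMPTION: the standard 2n^3/3 + 2n^2 count; the post does not state it.
    """
    return (2.0 * n**3) / 3.0 + 2.0 * n**2


def mflops(n, seconds):
    """MFLOP rating implied by solving an order-n system in the given time."""
    return linpack_flops(n) / seconds / 1e6


def implied_seconds(n, rating_mflops):
    """Run time implied by an MFLOP rating for an order-n system."""
    return linpack_flops(n) / (rating_mflops * 1e6)


ORDER = 100  # the problem size used for the results quoted above

print(f"operations for n={ORDER}: {linpack_flops(ORDER):,.0f}")
for machine, rating in (("DEC VAX11/780 FPA", 0.14), ("Sun-3/160M+FPA", 0.40)):
    print(f"{machine}: about {implied_seconds(ORDER, rating):.2f} s at {rating} MFLOPs")
```

At this problem size the whole solve is well under a million operations, so timer resolution and compiler details can move the slower machines' ratings noticeably.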