[comp.unix.questions] desktop super computers

berger@datacube.UUCP (11/03/86)

How about a Sun 3/260? It's rated at 4 MIPS (68020 running at 25 MHz)
			Bob Berger 

Datacube Inc. 4 Dearborn Rd. Peabody, Ma 01960 	617-535-6644
	
ihnp4!datacube!berger
{seismo,cbosgd,cuae2,mit-eddie}!mirror!datacube!berger

dave@onfcanim.UUCP (Dave Martindale) (11/11/86)

>How about a Sun 3/260? It's rated at 4 MIPS (68020 running at 25 MHz)

4 MIPS running what kind of instructions?

If you take a VAX 11/780 as being a "1 MIPS" machine, I haven't seen
any benchmark that rates anyone's 16MHz 68020 as 2.5 times a VAX 780.
Running the "Dhrystone" non-floating-point benchmark, a Sun 3/160 is
just about 2.2 times an 11/780.  This would indicate that a 25MHz
68020 should be at most 3.3 MIPS doing integer stuff.
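The estimate above is a simple clock-ratio scaling. Here is a minimal sketch of that arithmetic, assuming performance grows no faster than linearly with clock rate (that assumption is what makes the result an upper bound). The function name is made up for illustration. The 3.3 figure comes out if the 3/160's 68020 is taken to run at 16.67 MHz, which is an assumption here; a flat 16 MHz gives about 3.4.

```python
def scaled_mips(measured_x780, measured_mhz, target_mhz):
    """Upper-bound estimate: assume speed grows no faster than the clock rate.

    measured_x780: benchmark result relative to a VAX 11/780 (taken as "1 MIPS").
    """
    return measured_x780 * target_mhz / measured_mhz


# Sun 3/160 Dhrystone result of about 2.2x an 11/780, scaled to a 25 MHz 68020.
print(round(scaled_mips(2.2, 16.67, 25.0), 1))  # 3.3 (assuming a 16.67 MHz 3/160)
print(round(scaled_mips(2.2, 16.0, 25.0), 1))   # 3.4 (taking "16 MHz" literally)
```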

My experience with the Weitek 1164/1165 floating point chips, which I
believe are what Sun's fast floating point board uses, suggests
that they are slightly slower than the 11/780's FPA in both single and
double precision.  And the 780 is less than 1 MIPS when it comes to
floating point.  Also, the Weitek chips force some things to be done
in software that the VAX FPA does in hardware: integer/float conversions,
short/long floating conversions.  The Weitek's greater error (results
are not always exact even when the true result is representable exactly;
e.g. 500.0/10.0 != 50.0) also means extra work for software.  So I'd
be surprised to find any such machine beating a 780 in real applications,
unless it was using single precision on the 68020 and double precision
on the VAX.

Can anyone post *real*, meaningful numbers for the 3/260?

	yours for meaningful MIPS,
	Dave Martindale

hansen@mips.UUCP (Craig Hansen) (11/13/86)

> >How about a Sun 3/260? It's rated at 4 MIPS (68020 running at 25 MHz)
> 
> 4 MIPS running what kind of instructions?
> 
> If you take a VAX 11/780 as being a "1 MIPS" machine, I haven't seen
> any benchmark that rates anyone's 16MHz 68020 as 2.5 times a VAX 780.
> Running the "Dhrystone" non-floating-point benchmark, a Sun 3/160 is
> just about 2.2 times an 11/780.  This would indicate that a 25MHz
> 68020 should be at most 3.3 MIPS doing integer stuff.
> 
	Dhrystone? Is this an appropriate benchmark for a supercomputer?
	Seriously, though, you should recognize that there's more
	in a computer than the processor chip that affects performance.
	The 3/260 uses a cache memory, where
	cache hits can be processed at the speed of the 68020,
	while the 3/160 goes to DRAM memory for each reference.
	Thus, given a high enough cache hit rate, performance
	can increase by more than the ratio of the processor
	clock rates. Of course, cache hit rates vary with
	program and data size and locality.
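A toy model of the cache argument, with every number hypothetical: 60 ns and 40 ns are roughly the cycle times of 16.67 MHz and 25 MHz parts, while the cycles per instruction, references per instruction, and DRAM penalties are invented for illustration and are not Sun figures. Time per instruction is processor cycles plus stall time for references that miss; a machine that goes to DRAM on every reference is modelled as a hit rate of zero.

```python
def ns_per_instruction(cycle_ns, cpu_cycles, refs, hit_rate, miss_penalty_ns):
    """Toy model: processor cycles plus stall time for references that miss.

    A machine that goes to DRAM on every reference is modelled as hit_rate=0.0.
    """
    stall_ns = refs * (1.0 - hit_rate) * miss_penalty_ns
    return cpu_cycles * cycle_ns + stall_ns


# Hypothetical parameters, chosen only to show the shape of the argument.
slow_clock_dram = ns_per_instruction(60.0, 4, 1.0, 0.0, 120.0)    # no cache
fast_clock_cache = ns_per_instruction(40.0, 4, 1.0, 0.95, 200.0)  # 95% hits
poor_locality = ns_per_instruction(40.0, 4, 1.0, 0.50, 200.0)     # 50% hits
clock_ratio = 60.0 / 40.0

print(slow_clock_dram / fast_clock_cache)  # above the clock ratio
print(slow_clock_dram / poor_locality)     # below the clock ratio
print(clock_ratio)
```

With a high hit rate the cached machine wins by more than the clock ratio; with poor locality it does not, which is the caveat about hit rates varying with the program.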

> My experience with the Weitek 1164/1165 floating point chips, which I
> believe are what Sun's fast floating point board uses, suggests
> that they are slightly slower than the 11/780's FPA in both single and
> double precision.  And the 780 is less than 1 MIPS when it comes to
> floating point.  Also, the Weitek chips force some things to be done
> in software that the VAX FPA does in hardware: integer/float conversions,
> short/long floating conversions.  The Weitek's greater error (results
> are not always exact even when the true result is representable exactly;
> e.g. 500.0/10.0 != 50.0) also means extra work for software.  So I'd
> be surprised to find any such machine beating a 780 in real applications,
> unless it was using single precision on the 68020 and double precision
> on the VAX.

	This information is totally wrong.  The writer of the above
	may be confused with the earlier (and inferior) Sky FPA board.
	First of all, the Weitek 1164/1165 perform add, subtract, and multiply
	faster than the 11/780's FPA when run at 16 MHz; the operations
	are faster than 1 usec by a healthy margin. The chip
	set directly performs integer/float conversions and short/long
	(or single/double) conversions. The operations are implemented
	in accordance with the IEEE standard, including support for
	IEEE directed rounding modes; so 500.0/10.0 == 50.0 exactly.

> Can anyone post *real*, meaningful numbers for the 3/260?
> 
> 	yours for meaningful MIPS,
> 	Dave Martindale

	Can anyone clarify whether the 3/260 can use a Weitek-based FPA,
	or does one have to go back to a 68881?

-- 

Craig Hansen			|	 "Evahthun' tastes
MIPS Computer Systems		|	 bettah when it
...decwrl!mips!hansen		|	 sits on a RISC"

dave@onfcanim.UUCP (Dave Martindale) (11/14/86)

In article <765@mips.UUCP> hansen@mips.UUCP (Craig Hansen) writes:
>
>> My experience with the Weitek 1164/1165 floating point chips, which I
>> believe are what Sun's fast floating point board uses, suggests
>> that they are slightly slower than the 11/780's FPA in both single and
>> double precision.  [ more stuff deleted ]
>
>	This information is totally wrong.  The writer of the above
>	may be confused with the earlier (and inferior) Sky FPA board.
>	First of all, the Weitek 1164/1165 perform add, subtract, and multiply
>	faster than the 11/780's FPA when run at 16 MHz; the operations
>	are faster than 1 usec by a healthy margin. The chip
>	set directly performs integer/float conversions and short/long
>	(or single/double) conversions. The operations are implemented
>	in accordance with the IEEE standard, including support for
>	IEEE directed rounding modes; so 500.0/10.0 == 50.0 exactly.

I'm not confused by a Sky board.  I have had some experience with the
named Weitek chips as used in a Silicon Graphics IRIS 2400T (16MHz 68020),
not a Sun.  So I'm talking about the same hardware, but not the same
support software.

I did some very simple benchmarking when the new IRIS boards arrived,
and found that running real code, the IRIS was just slightly slower
than the 780.  I did get the impression that SGI was using software
for some functions (float/int conversion, for example) which could
have had much to do with slowing down performance.  The IRIS FPA was
out over a year ago, before Sun I believe, so the software may have
been put together in somewhat of a hurry, and may have improved since
then - I haven't had time to check.

The 500/10 problem was real enough - it caused printf to print out 500
as "4:0".  SGI fixed it by using software instead of hardware for that
division, so I just assumed that the hardware wasn't capable of doing
it right.
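The exactness described earlier in the thread is what correctly rounded IEEE 754 division gives: 500/10 = 50 is exactly representable, so the quotient must come back as exactly 50.0. Python floats are IEEE 754 doubles, so a few lines can show that. They also show, with a simulated result one ulp low, how truncating such a quotient yields a wrong digit. The simulated result is hypothetical and is not a model of the actual IRIS hardware or library. The last line shows why a digit value of 10 would print as ':'. Exactly how SGI's printf produced "4:0" isn't stated, so this is only a plausible mechanism.

```python
import math

# Python floats are IEEE 754 binary64, and "/" is correctly rounded, so a
# quotient that is exactly representable comes back exactly.
q = 500.0 / 10.0
print(q == 50.0)  # True

# Hypothetical faulty divider: the result lands one ulp below the true value.
low = math.nextafter(q, 0.0)  # math.nextafter requires Python 3.9+
print(low < 50.0, int(low))   # True 49  (int() truncates toward zero)

# In ASCII the character after '9' is ':', so a digit value of 10 prints as ':'.
print(chr(ord("0") + 10))     # :
```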

Anyway, I was just reporting what I'd experienced; it seems that it may
not be true anymore, or may not apply to the Sun.

Still, my original question stands:  How is "4 MIPS" measured?  And what
is floating point performance really like?

	Dave Martindale

rick@seismo.CSS.GOV (Rick Adams) (11/17/86)

Following are some real timings from a real program. This is not a
true "benchmark". The program is a real seismic analysis job that we use a lot.
It is heavily CPU-bound and makes heavy use of floating point.

I consider it a good measure of the floating point capacity of a processor.
It does a very good job of measuring the kind of things that WE do, which is
what a good benchmark should do. Of course, for a different person, this may
not be a good indicator of anything. For example, it doesn't show
how badly things degrade when more than one process is running.

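Times are measured with /bin/time and are in seconds. The first column is
the total time of the Vax 11/780 w/fpa under 4.2bsd (4,933.1 seconds)
divided by each machine's total time. If you want to consider a Vax 11/780
with an FPA as 1 MIPS (not unreasonable, really), the first column can be
thought of as MIPS. The Sun 3/260 with FPA comes in at 3.4 MIPS, which is
not a lot less than the claimed 4.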

The low numbers are provided mainly for amusement. The range of machines
represents what I had access to, nothing else.

(I'd be happy to run this on a Cray or Amdahl to get a nice top end figure if
anyone has some spare cycles...)

The Convex time can probably be sped up by changing the code to vectorize
better, but the point was to run this unmodified on all the systems.

---rick

 x 780      Total       User     Sys  Machine / configuration
10.297      479.1      469.3     6.7  Culler 7/10 *
 6.193      796.5      534.8   210.6  Culler 7/10
 4.586    1,075.6    1,056.3    19.3  Gould 9080 w/mult. accel. @ utah
 4.432    1,113.1    1,064.2    48.9  Convex C-1
 4.137    1,192.3    1,164.1    28.2  Vax 8600 w/fpa 4.3bsd @ utah
 3.847    1,282.2    1,257.4    24.8  CCI Power 6/32 w/fpa 4.3bsd
 3.616    1,364.2    1,339.7    24.5  CCI Power 6/32 w/fpa 4.2bsd
 3.435    1,436.0    1,410.8    25.2  Sun 3/260 w/fpa -ffpa sun3.1 +
 3.347    1,474.1    1,450.6    23.5  Celerity 1260D 3.2.50/betacc
 3.315    1,488.0    1,450.8    37.2  Gould 9080
 2.937    1,679.2    1,635.5    43.7  Celerity 1230 3.1?
 2.725    1,810.4    1,751.5    58.9  Celerity 1260D 3.2.47
 2.404    2,052.3    2,017.3    35.0  Sun 3/160 w/fpa sun3.2
 2.057    2,398.1    2,342.5    55.6  Celerity 1200 3.1?
 1.398    3,529.5    3,486.0    43.5  Gould 6000
 1.290    3,823.2    3,785.5    37.7  Sun 3/260 w/fpa -f68881 sun3.1 +
 1.159    4,257.3    4,253.2     4.1  Vax 11/780 w/fpa 4.3bsd *
 1.118    4,413.3    4,462.6    50.7  Vax 11/780 w/fpa 4.3bsd
 1.079    4,572.0    4,382.3   189.7  Sun 3/160 w/68881 sun3.0
 1.000    4,933.1    4,867.3    65.8  Vax 11/780 w/fpa 4.2bsd
 0.980    5,036.0    4,973.1    62.9  Sun 3/50 w/68881 sun3.0
 0.859    5,745.8    5,711.8    34.0  Sun 3/160 sun3.0PILOT
 0.765    6,448.1    6,327.2   120.9  Vax 11/750 w/fpa 4.3bsd @ utah
 0.289   17,091.2   17,005.0    86.2  Pyramid 90x @ rutgers
 0.230   21,413.1   21,332.9    80.2  Sun 3/260 w/fpa -fsoft sun3.1 +
 0.197   25,031.0   24,470.9   560.1  Vax 11/750 @ verdix
 0.146   33,679.9   33,424.3   225.6  Sun 2/120 -fsky sun1.4
 0.141   34,867.8   34,822.2    45.6  Sun 3/160 -fsoft sun3.0
 0.114   43,134.0   42,971.2   162.8  Sun 3/50 -fsoft sun3.0
 0.046  107,399.3  105,875.5  1523.8  Sun 2/120 sun1.4

* Program I/O changed to use 8K-byte buffers
+ Sun 3.1 user programs running on a 3.2 kernel

Programs on the Suns can be compiled three different ways:
	-ffpa	With Floating Point Accelerator
	-f68881	With M68881 hardware floating point
	-fsoft	Software floating point
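The "x 780" column can be recomputed from the Total column: it is the 4.2bsd 11/780's total time divided by each machine's total time. Here is a small sketch over a handful of rows copied from the table. The helper name is made up.

```python
REFERENCE_TOTAL = 4933.1  # Vax 11/780 w/fpa 4.2bsd, defined as 1.000

# (total seconds, configuration) for a few rows copied from the table
rows = [
    (479.1, "Culler 7/10 *"),
    (1436.0, "Sun 3/260 w/fpa -ffpa sun3.1 +"),
    (2052.3, "Sun 3/160 w/fpa sun3.2"),
    (3823.2, "Sun 3/260 w/fpa -f68881 sun3.1 +"),
    (21413.1, "Sun 3/260 w/fpa -fsoft sun3.1 +"),
]


def x780(total_seconds, reference=REFERENCE_TOTAL):
    """Relative speed: the reference machine's total time over this one's."""
    return reference / total_seconds


for total, label in rows:
    print(f"{x780(total):6.3f}  {label}")
```

The same division compares builds directly. For example, on the 3/260 the -ffpa build ran about 2.7 times as fast as the -f68881 build (3,823.2 / 1,436.0).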

bob@uhmanoa.UUCP (Bob Cunningham) (11/18/86)

Here is a portion of the results of Jack Dongarra's work at Argonne National
Laboratory comparing the performance of different computer systems.  These
particular results cover solving dense systems of linear equations of order
100 using the LINPACK FORTRAN software (notably the SGEFA and SGESL
column-oriented routines, which are built on the BLAS subprograms).  This is
an application-specific benchmark, and in many cases it yields MFLOPS ratings
well below the theoretical peak of each machine (i.e., the machines might well
perform differently on somewhat different problems, even similar
floating-point-intensive ones).  "Rolled BLAS" means BLAS routines that have
not been unrolled for optimal performance.
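Here is a rough Python sketch in the spirit of SGEFA/SGESL that makes "column-oriented" and "rolled BLAS" concrete. It does Gaussian elimination with partial pivoting on a matrix stored by columns, and every inner update is a saxpy (y = a*x + y) over part of a column. The sketch was written for this note and is not transcribed from LINPACK. gefa, gesl, saxpy and saxpy_unrolled are illustrative names. The two saxpy variants do identical arithmetic. The unrolled one handles four elements per loop pass after a clean-up loop, which is the kind of hand unrolling the "rolled BLAS" entries leave out. In Python the unrolling buys nothing; the point is only the loop structure.

```python
def saxpy(a, x, y, lo, hi):
    """Rolled loop: y[i] += a * x[i] for lo <= i < hi, one element per pass."""
    for i in range(lo, hi):
        y[i] += a * x[i]


def saxpy_unrolled(a, x, y, lo, hi):
    """Same update unrolled by four; a clean-up loop handles the remainder."""
    m = (hi - lo) % 4
    for i in range(lo, lo + m):
        y[i] += a * x[i]
    for i in range(lo + m, hi, 4):
        y[i] += a * x[i]
        y[i + 1] += a * x[i + 1]
        y[i + 2] += a * x[i + 2]
        y[i + 3] += a * x[i + 3]


def gefa(cols, axpy=saxpy):
    """Factor A in place (cols[j] is column j of A); return pivot rows."""
    n = len(cols)
    ipvt = [0] * n
    for k in range(n - 1):
        col = cols[k]
        # partial pivoting: pick the row with the largest |entry| in column k
        p = max(range(k, n), key=lambda i: abs(col[i]))
        ipvt[k] = p
        if col[p] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        col[k], col[p] = col[p], col[k]
        t = -1.0 / col[k]
        for i in range(k + 1, n):
            col[i] *= t  # store negated multipliers below the diagonal
        for j in range(k + 1, n):
            cj = cols[j]
            t = cj[p]
            cj[p], cj[k] = cj[k], t  # row interchange (no-op when p == k)
            axpy(t, col, cj, k + 1, n)  # column-oriented elimination step
    ipvt[n - 1] = n - 1
    if cols[n - 1][n - 1] == 0.0:
        raise ZeroDivisionError("matrix is singular")
    return ipvt


def gesl(cols, ipvt, b, axpy=saxpy):
    """Solve A x = b with the factors from gefa; b is overwritten by x."""
    n = len(cols)
    for k in range(n - 1):  # forward: apply interchanges and L^-1
        p = ipvt[k]
        t = b[p]
        b[p], b[k] = b[k], t
        axpy(t, cols[k], b, k + 1, n)
    for k in range(n - 1, -1, -1):  # back substitution with U
        b[k] /= cols[k][k]
        axpy(-b[k], cols[k], b, 0, k)
    return b


# A = [[2, 1], [4, 3]] stored by columns; A x = [3, 7] has solution [1, 1].
A = [[2.0, 4.0], [1.0, 3.0]]
print(gesl(A, gefa(A), [3.0, 7.0]))
```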

[of the machines listed below, I'd be tempted to call the FPS-264 and Alliant
FX-1 "personal supercomputers" in terms of their size, power requirements,
etc.]
                                                                          
Computer                OS/Compiler                             MFLOPS
--------                -----------                             ------
CRAY X-MP/4             not tested
CRAY X-MP/2             CFT 1.13 (rolled BLAS)                  24
CRAY-2 (1 processor)    CFT 2.70 (rolled BLAS)                  15
CRAY-1S                 CFT (rolled BLAS)                       12
Alliant FX/8 (8 CEs)    FX FORTRAN v2.0.19 (rolled BLAS)         7.6
SCS-40                  CFT 1.13 (rolled BLAS)                   7.3
FPS-264                 F02 APFTN64 OPT=4 (rolled BLAS)          5.6
Convex C-1              FORTRAN 1.6 (rolled BLAS)                2.9
IBM 3081K (1 processor) H enhanced opt=3                         2.1
Alliant FX/1 (1 CE)     FX FORTRAN v2.0.19 (rolled BLAS)         1.6
DEC VAX8800             VMS V4.3                                  .99
DEC VAX8650             VMS V4.1                                  .70
DEC VAX8500             VMS V4                                    .65
DEC VAX8600             VMS v4.1, FORTRAN 4.2                     .49
Harris HCX-7 w/fpp      f77 1.0                                   .48
Sun-3/160M+FPA          f77 -O -ffpa 3.1                          .40
Harris H800             SAUF77                                    .23
IBM 370/158             H opt=3                                   .23
Celerity C1200          4.2Bsd Unix f77                           .21
DEC VAX11/785 FPA       VMS V4.1                                  .20
DEC VAX8200             VMS V4.3                                  .15
DEC VAX11/780 FPA       VMS V4.1                                  .14
DEC microVAX II         VMS V4.1                                  .13
DEC VAX11/750 FPA       VMS V4.1                                  .12
Sun-3/75 w/68881        f77 -O -f68881 3.0                        .079
Apollo DN460/660        AEGIS 8.0 FTN                             .069
HP 9000 Series 320      HP-UX f77 5.15                            .063
Apollo DN3000           AEGIS 8.0 FTN                             .062
Masscomp MC500 FPP      3.1 FORTRAN                               .061
IBM RT PC Model 20      f77                                       .036
Apollo DN320            AEGIS 8.0 FTN                             .028
Apollo DN550 FPA        AEGIS 8.0 FTN                             .025
IBM AT 80287            PROFORT 1.0                               .012
IBM PC 8087             PROFORT 1.0                               .012
IBM AT 80287            Microsoft FORTRAN 3.2                     .0091
Apple Macintosh         ABSOFT 2.0b                               .0038
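LINPACK MFLOPS ratings are conventionally computed from the operation count 2n^3/3 + 2n^2 for factoring and solving an order-n system. Inverting that gives the time each rating implies for one order-100 solve. Here is a small sketch; the times are derived from the table's ratings, not measured, and the helper names are made up.

```python
def linpack_flops(n):
    """Operation count conventionally credited to factor + solve of order n."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def seconds_at(n, mflops):
    """Run time implied by an MFLOPS rating for one order-n solve."""
    return linpack_flops(n) / (mflops * 1e6)


# A few ratings copied from the table above.
for name, rate in [("CRAY X-MP/2", 24.0), ("Sun-3/160M+FPA", 0.40),
                   ("DEC VAX11/780 FPA", 0.14), ("Apple Macintosh", 0.0038)]:
    print(f"{name:18s} {seconds_at(100, rate):10.4f} s")
```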

-- 
Bob Cunningham
bob@hig.hawaii.edu