[comp.lang.fortran] Sun 3 vs uVAXII floating point speed....

ao@cevax.berkeley.edu (Akin Ozselcuk) (07/14/88)

Hello,

I am posting this article on behalf of a friend of mine who
is planning to buy either a Sun3 or a VAX Station 2000
(a watered down uVAXII)

He is planning to do a lot of number crunching by using f77.

Here is the problem: 

1. uVAXIIs have 0.9 VAX MIPS

2. Sun 3s have 3 MIPS

My question :

Sun 3 seems very impressive in this respect BUT CAN WE SAY THAT
Sun 3 IS 3 TIMES FASTER IN FLOATING POINT CALCULATIONS THAN uVAX?


HOW ABOUT FLOATING POINT SPEEDS OF Sun386i vs uVAXII?


Any comments about these 3 systems for heavy number crunching
applications will be appreciated.

Thanks.
      ..
AkIn  Ozselcuk
           '   

ao@cevax.berkeley.edu
Dept of Civil Engineering,                    Experientia Docet
UC Berkeley

roy@phri.UUCP (Roy Smith) (07/14/88)

ao@cevax.berkeley.edu (Akin Ozselcuk) writes:
> I am posting this article on behalf of a friend of mine who is planning
> to buy either a Sun3 or a VAX Station 2000 (a watered down uVAXII).  He
> is planning to do a lot of number crunching by using f77.

	Asking if a uVAX or a Sun-3 is faster for floating point is a
misleading question, or at least an incomplete one.  Are you talking about
a 3/50 without even the 68881 option or a 3/260 with FPA?  The difference
in floating point speed between the two is at least an order of magnitude.

	By way of comparison, we have an 11/750 with FPA, 3/50s both with
and without 68881s and 3/160s with FPAs.  To give you some feel for the
rough relative speeds (notice the use of lots of ambiguating terms; your
mileage will vary depending on zillions of factors), we find that a 3/50
with 68881 and the 750 with FPA are roughly the same speed.  A 3/160 with
FPA is about 10 times faster than that.  From what I understand, the 3/260
(which we don't have) uses exactly the same FPA board as the 160 so for
floating-point intensive applications, the 260 is not a whole lot faster
than the 160.  My guess is that the uVAX-II is about the same speed as a
750.

	Another factor to consider is that Sun's new snazzy Fortran
compiler is supposed to produce *much* faster code than the generic Unix
f77 compiler.
-- 
Roy Smith, System Administrator
Public Health Research Institute
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
"The connector is the network"

reiter@endor.harvard.edu (Ehud Reiter) (07/14/88)

In article <25065@ucbvax.BERKELEY.EDU> ao@cevax.berkeley.edu (Akin Ozselcuk) writes:
>Sun 3 seems very impressive in this respect BUT CAN WE SAY THAT
>Sun 3 IS 3 TIMES FASTER IN FLOATING POINT CALCULATIONS THAN uVAX?

The following data is from J. Dongarra, "Performance of Various Computers
Using Standard Linear Equation Software in a Fortran Environment", COMPUTER
ARCHITECTURE NEWS, vol. 16, no. 1 (March 1988):

(from Table 1 - full (i.e. double) precision, no assembly subroutines)
Machine			Mflops
Sun 4/260		1.1
Sun 3/260 with FPA	 .46
uVAX 3200 (VMS)		 .41
Sun 3/160 with FPA	 .40
uVAX II (VMS)		 .13
Sun 3/260 with 68881	 .11
Sun 3/160 with 68881	 .10
Sun 3/50 with 68881	 .087
uVAX II (Ultrix)	 .082

(and, just for fun)
CRAY X-MP-4	      480	(with vector unrolling, assembly subroutines)
Alliant FX/8	       27	(with vector unrolling, assembly subroutines)
uVAX II (VMS)		 .16	(with assembly subroutines)
IBM PC/AT with 80287	 .012	(using PROFORT 1.0 compiler)

Readers can draw their own interpretations.  Note that while I think
Dongarra's LINPACK is one of the most honest benchmarks around (and far
better than, say, Dhrystone), it, like all benchmarks, still needs to
be taken with a very large grain of salt.
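
One way to read the table against the original question is to divide each
figure by the uVAX II (VMS) entry.  A minimal Python sketch follows; the
dictionary and the function name `speed_relative_to` are illustrative only,
and the Mflops figures are the ones quoted from Table 1 above:

```python
# Mflops figures copied from Table 1 as quoted above
# (full precision, no assembly subroutines).
linpack_mflops = {
    "Sun 3/260 with FPA": 0.46,
    "Sun 3/160 with FPA": 0.40,
    "uVAX II (VMS)": 0.13,
    "Sun 3/260 with 68881": 0.11,
    "Sun 3/160 with 68881": 0.10,
    "Sun 3/50 with 68881": 0.087,
    "uVAX II (Ultrix)": 0.082,
}


def speed_relative_to(baseline, table=linpack_mflops):
    """Return each machine's Mflops divided by the baseline machine's."""
    base = table[baseline]
    return {name: mflops / base for name, mflops in table.items()}


# Print every machine as a multiple of the uVAX II running VMS.
for name, ratio in speed_relative_to("uVAX II (VMS)").items():
    print(f"{name:<22} {ratio:4.2f}")
```

On these figures the "3 times faster" claim roughly holds for a Sun-3 with
the FPA, while a Sun-3 with only the 68881 comes out somewhat slower than
the uVAX II under VMS.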

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)
					reiter@harvard.harvard.EDU  (new ARPA)

guy@gorodish.Sun.COM (Guy Harris) (07/15/88)

> 	Asking if a uVAX or a Sun-3 is faster for floating point is a
> misleading question, or at least an incomplete one.  Are you talking about
> a 3/50 without even the 68881 option or a 3/260 with FPA?  The difference
> in floating point speed between the two is at least an order of magnitude.

His reference to 3 MIPS made it sound as if he were talking about a 3/60; the
3/60 comes standard with a 20MHz 68881 (faster than the 16.67MHz one for 3/50s
and 3/100 series machines), but I don't think you can attach an FPA to it.

As for the Sun386i, some tests I ran a while ago indicate that it may be faster
on floating point than a 3/260 without an FPA, so it may well provide
performance that's as good as, if not better than, a 3/60.  (The tests were
just the Stanford benchmarks, I'm guessing at what the 3/260 had, and the 386i
wasn't running FCS software, so don't take my word for it.)

> My guess is that the uVAX-II is about the same speed as a 750.

My impression was that it was closer to a 780, but I've never used one so I
don't know.

> 	Another factor to consider is that Sun's new snazzy Fortran
> compiler is supposed to produce *much* faster code than the generic Unix
> f77 compiler.

It does; it has a "real" optimizer (I'd say "global" except that I don't know
how "global" it is; what is the "right" term for the generic sort of
non-peephole optimizer?).  It's not that "new" any more; in fact, in 4.0 on the
Sun-2, Sun-3, and Sun-4, and in the Sun-4 Sys4-3.2 release, the same optimizer
is available for the C compiler.  I don't know whether it's available for
FORTRAN or for C on the Sun386i.

Now I think DEC may offer the VMS FORTRAN compiler on Ultrix as well, and that
also has a "real" optimizer.

acphssrw@csuna.UUCP (Stephen R. Walton) (07/16/88)

In article <4953@husc6.harvard.edu> reiter@harvard.UUCP (Ehud Reiter) writes:
>The following data is from J. Dongarra, "Performance of Various Computers
>Using Standard Linear Equation Software in a Fortran Environment", COMPUTER
>ARCHITECTURE NEWS, vol. 16, no. 1 (March 1988):
>
>[table omitted].
>IBM PC/AT with 80287	 .012	(using PROFORT 1.0 compiler)

For what it's worth, I get 0.020 with Microsoft Fortran V5.1 on an 8 MHz AT.

>Readers can draw their own interpretations.  Note that while I think
>Dongarra's LINPACK is one of the most honest benchmarks around (and far
>better than, say, Dhrystone), it, like all benchmarks, still needs to
>be taken with a very large grain of salt.

Which brings up something I've been meaning to throw out to the net.
The deleted lines from Ehud's posting show a Sun 3/160 to be about
half the speed of the VAX 11/780.  This is true but incomplete.  On the
Savage benchmark, the Sun comes up 5 times FASTER than Vax.  What's
happening?  Well, the Linpack benchmark does matrix manipulation and
therefore its real work is all * and /.  The Savage benchmark consists
entirely of transcendental functions, which are microcoded on the
68881 chip on the Sun but done in software on the Vax.  To put it
another way, SQRT on the Vax takes about the same time as 10
multiplications;  this number is 3 on the Sun.
    I think what this REALLY means is that the old rules of thumb about
the tradeoff between transcendentals and multiplications don't apply to
the 68881, 80n87, and similar FPUs.  On these advanced chips, if you can
get rid of 5 or 6 multiplications in favor of one transcendental, it is
worth doing.  I think a lot of old code could run faster if this were
taken into account.
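
The tradeoff can be put as a crude cost comparison.  The Python sketch
below is only an illustration: it assumes the SQRT figures above (about 10
multiplications on the Vax, about 3 on the Sun) stand in for the cost of a
library function in general, and it ignores accuracy and everything else a
real rewrite would have to weigh.  The names are made up for the example.

```python
# Cost of one SQRT expressed in multiplications, from the figures above.
SQRT_COST_IN_MULTIPLIES = {"Vax": 10, "Sun with 68881": 3}


def worth_replacing(multiplies_removed, function_cost):
    """True if one function call is cheaper than the multiplies it replaces."""
    return function_cost < multiplies_removed


# Would trading 5 multiplications for one SQRT pay off on each machine?
for machine, cost in SQRT_COST_IN_MULTIPLIES.items():
    verdict = worth_replacing(5, cost)
    print(f"{machine}: replace 5 multiplies with one SQRT? {verdict}")
```

Under this model the trade pays off on the Sun and loses on the Vax, which
is the reason code tuned for the old rule may be leaving speed on the table.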
    PS.  Ehud, did you mean "Whetstone" instead of "Dhrystone" above?
The latter does only integer and address arithmetic and is in C, not
Fortran.   The former is a weighted mix of various operations which is
supposedly "typical" of scientific code.

Stephen Walton, representing myself		swalton@solar.stanford.edu
Cal State, Northridge				rckg01m@calstate.BITNET

shenkin@cubsun.BIO.COLUMBIA.EDU (Peter Shenkin) (07/16/88)

For what it's worth, here are some benchmarks I did for one of my
programs.  I list the total time, the time in the "tweak" (number-
crunching) subroutine, and the time in the "io" (heavy on io) subroutine,
for two separate runs, one of which is more io-intensive than the other.

The VAX was an 11/780 with fpa, running ULTRIX.  The code was written in
Fortran, and compiled & run with f77 on the VAX and Sun, with fc on the
Convex C1.  The different -O levels for the Convex refer to different
levels of optimization (see below).

Separate benchmarks of a different kind indicated that the uVAX-II is about
0.8 of an 11/780fpa on ordinary floating point arithmetic.  Lots depends on
the compiler, though.  A previous posting pointed out that DEC now makes
its own Fortran compiler, previously available only under VMS, available
under ULTRIX, and that Sun now has a DEC-compatible Fortran compiler, which
people say also produces better code than their version of f77 used to.

I advise you to skip the data for now and come back to it after reading
the conclusions at the bottom.

Comparison of times on the VAX, Sun3 and Convex for two typical random tweak
runs:
    l2-1000-0.0:    relatively high io/compute ratio
    l2-150-2.0all:  relatively low io/compute ratio

NUMBERS:

l2-1000-0.0             Sun-3     Sun-3     Convex    Convex    Convex
===========   VAX       -68881    -fpa      -O0       -O1       -O2
TIMES (cpu-s)
total:        2766      2107      1199       325       302       300
tweak:        1679      1824       950       208       184       174
io:           1029       226       224       108       110       119

TOTAL
SPEED:           1      1.31      2.31      8.51      9.16      9.22
(VAX = 1)

*************************************************************************
*************************************************************************

l2-150-2.0all           Sun-3     Sun-3     Convex    Convex    Convex
===========   VAX       -68881    -fpa      -O0       -O1       -O2
TIMES (cpu-s)
total:        2339      2734      1287       273       246       229
tweak:        1656      2266      1062       205       180       162
io:            161        34        34        16        17        17

TOTAL
SPEED:           1      0.86      1.82      8.57      9.51     10.21
(VAX = 1)

*************************************************************************
*************************************************************************

CONCLUSIONS (for THIS PROGRAM!!!):

    (1) Sun-3 vs. VAX:  With -68881, Sun is 4-5 times faster on IO, about
        0.8 times as fast on single-precision arithmetic.  (I know through
        other tests that it's several times faster on double-precision.)
        With -fpa (Weitek floating point board), same IO comparison holds,
        but Sun is about 1.7 times the speed of the VAX in single-precision
        arithmetic.
    (2) Convex vs. VAX: with full optimization, about 9 times faster than
        the VAX on IO, about 10 times faster on single-precision arithmetic.
        Vectorization (-O2) gives a 20% speed-up over only local scalar
        optimization (-O0);  full scalar optimization gives a 10% speed-up
        over only local.

NOTES:

    (1) The program is (a) poorly written, and (b) not well-suited in its
        present form to automatic vectorization.  As such it is probably
        typical.  (On the other hand, it works....)
    (2) Estimates of IO and floating-point speeds were made from the
        io and tweak times, which are dominated by these kinds of operations,
        respectively.
    (3) VAX is the 11/780-fpa at Columbia Biology (cubsvax);  Sun3 -68881
        refers to the 68881 floating point processor.  This was also at
        Columbia Biology (ramon).  Sun3 -fpa was a machine at Sun in Fort
        Lee, NJ.  Convex was cuhhca at Howard Hughes Institute, Columbia
        Medical School.  See above for illumination of the -O options.
    (4) This particular program probably does not easily lend itself to great
        speed-up through vectorization, since the operations tend to be on
        fairly short vectors -- about 40 long in these examples, perhaps
        about 120 long in the "best" case, these being the numbers of atoms
        in the loop being repeatedly randomly generated.  With difficulty,
        it might be possible to rewrite the program so as to generate many
        loops together, and thereby deal with longer vectors.  Less drastic
        rewrites might conceivably speed things up by a factor of 1.5 to 2
        overall (just a guess, based on the speed-up of those portions of
        the code where everything vectorized).
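
The TOTAL SPEED rows are just the VAX total time divided by each machine's
total time.  A short Python sketch of that arithmetic, using the total
cpu-seconds from the two tables (the dictionary layout and the name
`relative_speed` are only for illustration):

```python
# Total CPU seconds copied from the two tables above.
total_cpu_seconds = {
    "l2-1000-0.0": {
        "VAX": 2766, "Sun-3 -68881": 2107, "Sun-3 -fpa": 1199,
        "Convex -O0": 325, "Convex -O1": 302, "Convex -O2": 300,
    },
    "l2-150-2.0all": {
        "VAX": 2339, "Sun-3 -68881": 2734, "Sun-3 -fpa": 1287,
        "Convex -O0": 273, "Convex -O1": 246, "Convex -O2": 229,
    },
}


def relative_speed(times, baseline="VAX"):
    """Speed of each machine relative to the baseline (baseline = 1)."""
    return {m: round(times[baseline] / t, 2) for m, t in times.items()}


# Recompute the TOTAL SPEED row for each run.
for run, times in total_cpu_seconds.items():
    print(run, relative_speed(times))
```

The same division applied to the tweak and io rows gives the
floating-point and IO estimates used in the conclusions.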
-- 
*******************************************************************************
Peter S. Shenkin,    Department of Biological Sciences,    Columbia University,
New York, NY   10027         Tel: (212) 280-5517 (work);  (212) 829-5363 (home)
shenkin@cubsun.bio.columbia.edu    shenkin%cubsun.bio.columbia.edu@cuvmb.BITNET

wes@obie.UUCP (Barnacle Wes) (07/17/88)

In article <59936@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> Now I think DEC may offer the VMS FORTRAN compiler on Ultrix as well, and that
> also has a "real" optimizer.

Yes, they do.  The VAX optimizer may help your code more than you expect
unless you're very good at writing f_a_s_t_ Fortran (as opposed to nice,
readable Fortran).
-- 
                     {hpda, uwmcsd1}!sp7040!obie!wes
           "Happiness lies in being priviledged to work hard for
           long hours in doing whatever you think is worth doing."
                         -- Robert A. Heinlein --

reiter@endor.harvard.edu (Ehud Reiter) (07/18/88)

In article <1284@csuna.UUCP> bcphssrw@csunb.csun.edu (Stephen R. Walton) writes:
>The deleted lines from Ehud's posting show a Sun 3/160 to be about
>half the speed of the VAX 11/780.  This is true but incomplete.  On the
>Savage benchmark, the Sun comes up 5 times FASTER than Vax.  What's
>happening?  Well, the Linpack benchmark does matrix manipulation and
>therefore its real work is all * and /.  The Savage benchmark consists
>entirely of transcendental functions, which are microcoded on the
>68881 chip on the Sun but done in software on the Vax.

Let me emphasize the point, which I should have made in my earlier posting
of LINPACK benchmark figures, that no benchmark can predict the performance
of real application programs with any accuracy (because application programs
differ so widely - as Steve points out, whether a Sun or a VAX is faster depends
on what kind of computation you're doing).  Anyone who wants to buy a
computer and is seriously interested in performance should test-run his own
software on the computers in question, and not rely on benchmarks.  Benchmarks
are fun to argue about, but please don't take them too seriously when you're
spending real money buying real machines.

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)
					reiter@harvard.harvard.EDU  (new ARPA)

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/19/88)

In article <25065@ucbvax.BERKELEY.EDU> ao@cevax.berkeley.edu (Akin Ozselcuk) writes:

| HOW ABOUT FLOATING POINT SPEEDS OF Sun386i vs uVAXII?

  An 11/780 is faster than a MV-II. A Sun3-260 is faster than a 780. I
include some figures I measured, showing actual instructions executed by
a high level language. I include figures from a Dell310 (386/387) simply
as a note of how far power has come in six years.

		Ultrix 2.0	SunOS 3.2	Xenix/386 2.2.2
test		11/780		3/260		387
		w/ FPA		68881-20	80387-20

short		 302.1		1118.7		1922.1
long		 455.7		1804.2		1837.4
float		 136.7		 395.6		 442.3
double		 180.5		 457.2		 369.6

  All numbers are in a mix, similar to a Gibson mix, described as
typical in an old IEEE journal. The mix percentages were rounded to the
nearest 5% before weighting. Like all benchmarks, this reveals trends
and small differences are not meaningful.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

ldm@texhrc.UUCP (Lyle Meier) (07/31/88)

calling conventions, but rather sticks to the vms standard methods. Should
you wish to call a C program from the fortran, you need to write a bridge 
routine in something called a jacket building language. This is because the
VAX fortran compiler insists on passing character variables by descriptor,
by default, and on uppercasing all entry point names. The VAX compiler's
behavior is different from the f77 compiler, which passed chars in a form
C could understand. Further, the f77 compiler created entry point names
by lower-casing and appending an underscore, which is what standard bsd
systems do (at least sun and convex do). I have asked dec what they can
do about this and have only gotten the response "noted". This makes me
leery of ultrix, since we do a lot of work in fortran.
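
To make the naming difference concrete, here is a small Python sketch of
the two conventions as described above.  It models only the case folding
and the trailing underscore; it says nothing about any underscore the C
compiler adds on its own or about how arguments are passed.  The function
names and the routine names `Solve` and `RdData` are hypothetical.

```python
def f77_entry_point(fortran_name):
    """Name under the f77 convention described above: lower case plus '_'."""
    return fortran_name.lower() + "_"


def vax_fortran_entry_point(fortran_name):
    """Name under the VAX Fortran convention described above: upper case."""
    return fortran_name.upper()


# Show both spellings for a couple of made-up Fortran routine names.
for name in ("Solve", "RdData"):
    print(name, "->", f77_entry_point(name), "vs", vax_fortran_entry_point(name))
```

A C routine written to be called from f77 as `solve_` would therefore not
be found under its VAX Fortran name, hence the need for some bridge.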

chris@mimsy.UUCP (Chris Torek) (07/31/88)

In article <247@texhrc.UUCP> ldm@texhrc.UUCP (Lyle Meier) writes:
[something apparently missing here]
>calling conventions, but rather sticks to the vms standard methods. ...
>[The] VAX [/VMS] fortran compiler insists on passing character variables
>by descriptor ... [this] behavior is different from the f77 compiler
>which passed chars in a form C could understand.

Descriptors are not incomprehensible.  You must simply create a few
structure definitions, and do the interpreting yourself.

>and on uppercasing all entry point names.

This also should not be a serious problem.  If the compiler does not
prepend an `_', however, you will have to resort to assembler linkages,
or to hackery (running the assembly code from /lib/ccom through sed).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

karish@denali.stanford.edu (Chuck Karish) (07/31/88)

In article <247@texhrc.UUCP> ldm@texhrc.UUCP (Lyle Meier) writes:
>Should
>you wish to call a C program from the fortran, you need to write a bridge 
>routine in something called a jacket building language. This is because the
>VAX fortran compiler insists on passing character variables by descriptor ...

The jacket building language is simple and easy to use, though the manual
is not as helpful as it might be, and had (has?) some serious errors in
its examples.  Most jacket routines are one-line programs that simply
declare the name of the routine and the types of the parameters for
both C and Fortran.

VAX Fortran for Ultrix is a useful tool for many Fortran users, for two
reasons:

	1) It's compatible with VMS Fortran, which is the source of
	   many programs that have to be ported.

	2) It's tailored for the VAX, and is fast.  Probably still
	   faster than the 4.3 version of f77; has anyone compared?

Under Ultrix, VAX Fortran makes executables that are bigger than f77
executables, and bigger than they would be under VMS.  This is because
under VMS the Fortran runtime library stays in shared memory, so the
developers favored speed over size.  Under Ultrix, those big library
routines get linked into every executable.

Chuck Karish	ARPA:	karish@denali.stanford.edu
		BITNET:	karish%denali@forsythe.stanford.edu
		UUCP:	{decvax,hplabs!hpda}!mindcrf!karish
		USPS:	1825 California St. #5   Mountain View, CA 94041