ao@cevax.berkeley.edu (Akin Ozselcuk) (07/14/88)
Hello,

I am posting this article on behalf of a friend of mine who is planning to
buy either a Sun 3 or a VAXstation 2000 (a watered-down uVAX II).  He is
planning to do a lot of number crunching using f77.  Here is the problem:

1. uVAX IIs have 0.9 VAX MIPS.
2. Sun 3s have 3 MIPS.

My question: the Sun 3 seems very impressive in this respect, BUT CAN WE
SAY THAT THE Sun 3 IS 3 TIMES FASTER IN FLOATING POINT CALCULATIONS THAN
THE uVAX?  HOW ABOUT THE FLOATING POINT SPEED OF THE Sun386i vs. THE
uVAX II?

Any comments about these three systems for heavy number-crunching
applications will be appreciated.  Thanks.

Akin Ozselcuk				ao@cevax.berkeley.edu
Dept. of Civil Engineering		"Experientia Docet"
UC Berkeley
roy@phri.UUCP (Roy Smith) (07/14/88)
ao@cevax.berkeley.edu (Akin Ozselcuk) writes:
> I am posting this article on behalf of a friend of mine who is planning
> to buy either a Sun3 or a VAX Station 2000 (a watered down uVAXII).  He
> is planning to do a lot of number crunching by using f77.

	Asking if a uVAX or a Sun-3 is faster for floating point is a
misleading question, or at least an incomplete one.  Are you talking about
a 3/50 without even the 68881 option, or a 3/260 with FPA?  The difference
in floating point speed between the two is at least an order of magnitude.

	By way of comparison, we have an 11/750 with FPA, 3/50s both with
and without 68881s, and 3/160s with FPAs.  To give you some feel for the
rough relative speeds (notice the use of lots of hedging terms; your
mileage will vary depending on zillions of factors), we find that a 3/50
with 68881 and the 750 with FPA are roughly the same speed.  A 3/160 with
FPA is about 10 times faster than that.  From what I understand, the 3/260
(which we don't have) uses exactly the same FPA board as the 160, so for
floating-point intensive applications the 260 is not a whole lot faster
than the 160.  My guess is that the uVAX-II is about the same speed as a
750.

	Another factor to consider is that Sun's new snazzy Fortran
compiler is supposed to produce *much* faster code than the generic Unix
f77 compiler.
--
Roy Smith, System Administrator
Public Health Research Institute
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
"The connector is the network"
reiter@endor.harvard.edu (Ehud Reiter) (07/14/88)
In article <25065@ucbvax.BERKELEY.EDU> ao@cevax.berkeley.edu (Akin Ozselcuk) writes:
>Sun 3 seems very impressive in this respect BUT CAN WE SAY THAT
>Sun 3 IS 3 TIMES FASTER IN FLOATING POINT CALCULATIONS THAN uVAX

The following data is from J. Dongarra, "Performance of Various Computers
Using Standard Linear Equation Software in a Fortran Environment", COMPUTER
ARCHITECTURE NEWS, vol. 16, no. 1 (March 1988):

(from Table 1 - full (i.e. double) precision, no assembly subroutines)

Machine				Mflops
Sun 4/260			1.1
Sun 3/260 with FPA		 .46
uVAX 3200 (VMS)			 .41
Sun 3/160 with FPA		 .40
uVAX II (VMS)			 .13
Sun 3/260 with 68881		 .11
Sun 3/160 with 68881		 .10
Sun 3/50 with 68881		 .087
uVAX II (Ultrix)		 .082

(and, just for fun)

CRAY X-MP-4			480	(with vector unrolling, assembly subroutines)
Alliant FX/8			 27	(with vector unrolling, assembly subroutines)
uVAX II (VMS)			 .16	(with assembly subroutines)
IBM PC/AT with 80287		 .012	(using PROFORT 1.0 compiler)

Readers can draw their own interpretations.  Note that while I think
Dongarra's LINPACK is one of the most honest benchmarks around (and far
better than, say, Dhrystone), it, like all benchmarks, still needs to
be taken with a very large grain of salt.

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)
					reiter@harvard.harvard.EDU  (new ARPA)
guy@gorodish.Sun.COM (Guy Harris) (07/15/88)
> 	Asking if a uVAX or a Sun-3 is faster for floating point is a
> misleading question, or at least an incomplete one.  Are you talking about
> a 3/50 without even the 68881 option or a 3/260 with FPA?  The difference
> in floating point speed between the two is at least an order of magnitude.

His reference to 3 MIPS made it sound as if he were talking about a 3/60;
the 3/60 comes standard with a 20MHz 68881 (faster than the 16.67MHz one
for 3/50s and 3/100 series machines), but I don't think you can attach an
FPA to it.

As for the Sun386i, some tests I ran a while ago indicate that it may be
faster on floating point than a 3/260 without an FPA, so it may well
provide performance that's as good as, if not better than, a 3/60.  (The
tests were just the Stanford benchmarks, I'm guessing what the 3/260 had,
and the 386i wasn't running FCS software, so don't take my word for it.)

> 	My guess is that the uVAX-II is about the same speed as a 750.

My impression was that it was closer to a 780, but I've never used one so
I don't know.

> 	Another factor to consider is that Sun's new snazzy Fortran
> compiler is supposed to produce *much* faster code than the generic Unix
> f77 compiler.

It does; it has a "real" optimizer (I'd say "global" except that I don't
know how "global" it is; what is the "right" term for the generic sort of
non-peephole optimizer?).  It's not that "new" any more; in fact, in 4.0
on the Sun-2, Sun-3, and Sun-4, and in the Sun-4 Sys4-3.2 release, the
same optimizer is available for the C compiler.  I don't know whether it's
available for FORTRAN or for C on the Sun386i.

Now I think DEC may offer the VMS FORTRAN compiler on Ultrix as well, and
that also has a "real" optimizer.
acphssrw@csuna.UUCP (Stephen R. Walton) (07/16/88)
In article <4953@husc6.harvard.edu> reiter@harvard.UUCP (Ehud Reiter) writes:
>The following data is from J. Dongarra, "Performance of Various Computers
>Using Standard Linear Equation Software in a Fortran Environment", COMPUTER
>ARCHITECTURE NEWS, vol16, no 1 (March 1988):
>
>[table omitted]
>IBM PC/AT with 80287	.012	(using PROFORT 1.0 compiler)

For what it's worth, I get 0.020 with Microsoft Fortran V5.1 on an 8 MHz AT.

>Readers can draw their own interpretations.  Note that while I think
>Dongarra's LINPACK is one of the most honest benchmarks around (and far
>better than, say, Dhrystone), it, like all benchmarks, still needs to
>be taken with a very large grain of salt.

Which brings up something I've been meaning to throw out to the net.  The
deleted lines from Ehud's posting show a Sun 3/160 to be about half the
speed of the VAX 11/780.  This is true but incomplete.  On the Savage
benchmark, the Sun comes up 5 times FASTER than the VAX.  What's
happening?  Well, the LINPACK benchmark does matrix manipulation, and
therefore its real work is all * and /.  The Savage benchmark consists
entirely of transcendental functions, which are microcoded on the 68881
chip on the Sun but done in software on the VAX.  To put it another way,
SQRT on the VAX takes about the same time as 10 multiplications; this
number is 3 on the Sun.

I think what this REALLY means is that the old rules of thumb about the
tradeoff between transcendentals and multiplications don't apply to the
68881, 80x87, and similar FPUs.  On these chips, if you can get rid of 5
or 6 multiplications in favor of one transcendental, it is worth doing.  I
think a lot of old code could run faster if this were taken into account.

PS.  Ehud, did you mean "Whetstone" instead of "Dhrystone" above?  The
latter does only integer and address arithmetic and is in C, not Fortran.
The former is a weighted mix of various operations which is supposedly
"typical" of scientific code.
Stephen Walton, representing myself	swalton@solar.stanford.edu
Cal State, Northridge			rckg01m@calstate.BITNET
shenkin@cubsun.BIO.COLUMBIA.EDU (Peter Shenkin) (07/16/88)
For what it's worth, here are some benchmarks I did for one of my
programs.  I list the total time, the time in the "tweak" (number-
crunching) subroutine, and the time in the "io" (heavy on io) subroutine,
for two separate runs, one of which is more io-intensive than the other.
The VAX was an 11/780 with FPA, running ULTRIX.  The code was written in
Fortran, and compiled & run with f77 on the VAX and Sun, and with fc on
the Convex C1.  The different -O levels for the Convex refer to different
levels of optimization (see below).

Separate benchmarks of a different kind indicated that the uVAX-II is
about 0.8 of an 11/780+FPA on ordinary floating point arithmetic.  Lots
depends on the compiler, though.  A previous posting pointed out that DEC
now makes its own Fortran compiler, previously available only under VMS,
available under ULTRIX, and that Sun now has a DEC-compatible Fortran
compiler, which people say also produces better code than their version
of f77 used to.

I advise you to skip the data for now and come back to it after reading
the conclusions at the bottom.
Comparison of times on the VAX, Sun3 and Convex for two typical random
tweak runs:

	l2-1000-0.0:	relatively high io/compute ratio
	l2-150-2.0all:	relatively low io/compute ratio

NUMBERS:

l2-1000-0.0		Sun-3	Sun-3	Convex	Convex	Convex
===========	VAX	-68881	-fpa	-O0	-O1	-O2
TIMES (cpu-s)
  total:	2766	2107	1199	325	302	300
  tweak:	1679	1824	 950	208	184	174
  io:		1029	 226	 224	108	110	119
TOTAL SPEED:	1	1.31	2.31	8.51	9.16	9.22
(VAX = 1)

l2-150-2.0all		Sun-3	Sun-3	Convex	Convex	Convex
===========	VAX	-68881	-fpa	-O0	-O1	-O2
TIMES (cpu-s)
  total:	2339	2734	1287	273	246	229
  tweak:	1656	2266	1062	205	180	162
  io:		 161	  34	  34	 16	 17	 17
TOTAL SPEED:	1	0.86	1.82	8.57	9.51	10.21
(VAX = 1)

CONCLUSIONS (for THIS PROGRAM!!!):

(1) Sun-3 vs. VAX:  With -68881, the Sun is 4-5 times faster on IO, and
    about 0.8 times as fast on single-precision arithmetic.  (I know
    through other tests that it's several times faster on double
    precision.)  With -fpa (Weitek floating point board), the same IO
    comparison holds, but the Sun is about 1.7 times the speed of the VAX
    in single-precision arithmetic.

(2) Convex vs. VAX:  With full optimization, about 9 times faster than
    the VAX on IO, and about 10 times faster on single-precision
    arithmetic.  Vectorization (-O2) gives a 20% speed-up over only local
    scalar optimization (-O0); full scalar optimization (-O1) gives a 10%
    speed-up over only local.

NOTES:

(1) The program is (a) poorly written, and (b) not well-suited in its
    present form to automatic vectorization.  As such it is probably
    typical.  (On the other hand, it works....)

(2) Estimates of IO and floating-point speeds were made from the io and
    tweak times, which are dominated by these kinds of operations,
    respectively.
(3) The VAX is the 11/780+FPA at Columbia Biology (cubsvax).  Sun3 -68881
    refers to the 68881 floating point processor; this machine was also
    at Columbia Biology (ramon).  Sun3 -fpa was a machine at Sun in Fort
    Lee, NJ.  The Convex was cuhhca at the Howard Hughes Institute,
    Columbia Medical School.  See above for illumination of the -O
    options.

(4) This particular program probably does not easily lend itself to great
    speed-up through vectorization, since the operations tend to be on
    fairly short vectors -- about 40 long in these examples, perhaps
    about 120 long in the "best" case, these being the numbers of atoms
    in the loop being repeatedly randomly generated.  With difficulty, it
    might be possible to rewrite the program so as to generate many loops
    together, and thereby deal with longer vectors.  Less drastic
    rewrites might conceivably speed things up by a factor of 1.5 to 2
    overall (just a guess, based on the speed-up of those portions of the
    code where everything vectorized).
--
*******************************************************************************
Peter S. Shenkin, Department of Biological Sciences, Columbia University,
New York, NY  10027	Tel: (212) 280-5517 (work); (212) 829-5363 (home)
shenkin@cubsun.bio.columbia.edu	shenkin%cubsun.bio.columbia.edu@cuvmb.BITNET
wes@obie.UUCP (Barnacle Wes) (07/17/88)
In article <59936@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> Now I think DEC may offer the VMS FORTRAN compiler on Ultrix as well, and that
> also has a "real" optimizer.

Yes, they do.  The VAX optimizer may help your code more than you expect,
unless you're very good at writing f_a_s_t_ Fortran (as opposed to nice,
readable Fortran).
--
	{hpda, uwmcsd1}!sp7040!obie!wes

	"Happiness lies in being privileged to work hard for long hours
	in doing whatever you think is worth doing."  -- Robert A. Heinlein
reiter@endor.harvard.edu (Ehud Reiter) (07/18/88)
In article <1284@csuna.UUCP> bcphssrw@csunb.csun.edu (Stephen R. Walton) writes:
>The deleted lines from Ehud's posting show a Sun 3/160 to be about
>half the speed of the VAX 11/780.  This is true but incomplete.  On the
>Savage benchmark, the Sun comes up 5 times FASTER than Vax.  What's
>happening?  Well, the Linpack benchmark does matrix manipulation and
>therefore its real work is all * and /.  The Savage benchmark consists
>entirely of transcendental functions, which are microcoded on the
>68881 chip on the Sun but done in software on the Vax.

Let me emphasize the point, which I should have made in my earlier
posting of LINPACK benchmark figures, that no benchmark can predict the
performance of real application programs with any accuracy (because
application programs differ so widely - as Steve points out, whether a
Sun or a VAX is faster depends on what kind of computation you're doing).

Anyone who wants to buy a computer and is seriously interested in
performance should test-run his own software on the computers in
question, and not rely on benchmarks.  Benchmarks are fun to argue about,
but please don't take them too seriously when you're spending real money
buying real machines.

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)
					reiter@harvard.harvard.EDU  (new ARPA)
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (07/19/88)
In article <25065@ucbvax.BERKELEY.EDU> ao@cevax.berkeley.edu (Akin Ozselcuk) writes:
| HOW ABOUT FLOATING POINT SPEEDS OF Sun386i vs uVAXII?

An 11/780 is faster than a MicroVAX II.  A Sun 3/260 is faster than a
780.  I include some figures I measured, showing actual instructions
executed by a high-level language.  I include figures from a Dell 310
(386/387) simply as a note of how far power has come in six years.

		Ultrix 2.0	SunOS 3.2	Xenix/386 2.2.2
   test		11/780		3/260		387
		w/ FPA		68881-20	80387-20
   short	 302.1		1118.7		1922.1
   long		 455.7		1804.2		1837.4
   float	 136.7		 395.6		 442.3
   double	 180.5		 457.2		 369.6

All numbers are in a mix, similar to a Gibson mix, described as typical
in an old IEEE journal.  The mix percentages were rounded to the nearest
5% before weighting.  Like all benchmarks, this reveals trends; small
differences are not meaningful.
--
	bill davidsen		(wedu@ge-crd.arpa)
	{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
ldm@texhrc.UUCP (Lyle Meier) (07/31/88)
calling conventions, but rather sticks to the VMS standard methods.
Should you wish to call a C program from the Fortran, you need to write a
bridge routine in something called a jacket building language.  This is
because the VAX Fortran compiler insists on passing character variables
by descriptor, by default, and on uppercasing all entry point names.  The
VAX compiler's behavior is different from the f77 compiler, which passed
characters in a form C could understand.  Further, the f77 compiler
created entry point names by lower-casing and appending an underscore,
which is what standard BSD systems do (at least Sun and Convex do).  I
have asked DEC what they can do about this and have only gotten the
response "noted".  This makes me leery of Ultrix, since we do a lot of
work in Fortran.
chris@mimsy.UUCP (Chris Torek) (07/31/88)
In article <247@texhrc.UUCP> ldm@texhrc.UUCP (Lyle Meier) writes:
[something apparently missing here]
>calling conventions, but rather sticks to the vms standard methods. ...
>[The] VAX [/VMS] fortran compiler insists on passing character variables
>by descriptor ... [this] behavior is different from the f77 compiler
>which passed chars in a form C could understand.

Descriptors are not incomprehensible.  You must simply create a few
structure definitions, and do the interpreting yourself.

>and on uppercasing all entry point names.

This also should not be a serious problem.  If the compiler does not
prepend an `_', however, you will have to resort to assembler linkages,
or to hackery (running the assembly code from /lib/ccom through sed).
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
karish@denali.stanford.edu (Chuck Karish) (07/31/88)
In article <247@texhrc.UUCP> ldm@texhrc.UUCP (Lyle Meier) writes:
>Should
>you wish to call a C program from the fortran, you need to write a bridge
>routine in something called a jacket building language.  This is because the
>VAX fortran compiler insists on passing charater variables by descriptor ...

The jacket building language is simple and easy to use, though the manual
is not as helpful as it might be, and had (has?) some serious errors in
its examples.  Most jacket routines are one-line programs that simply
declare the name of the routine and the types of the parameters for both
C and Fortran.

VAX Fortran for Ultrix is a useful tool for many Fortran users, for two
reasons:

1) It's compatible with VMS Fortran, which is the source of many programs
   that have to be ported.

2) It's tailored for the VAX, and is fast.  Probably still faster than
   the 4.3 version of f77; has anyone compared?

Under Ultrix, VAX Fortran makes executables that are bigger than f77
executables, and bigger than they would be under VMS.  This is because
under VMS the Fortran runtime library stays in shared memory, so the
developers favored speed over size.  Under Ultrix, those big library
routines get linked into every executable.

	Chuck Karish		ARPA:	karish@denali.stanford.edu
				BITNET:	karish%denali@forsythe.stanford.edu
	UUCP:	{decvax,hplabs!hpda}!mindcrf!karish
	USPS:	1825 California St. #5, Mountain View, CA 94041