gleicher@duke.cs.duke.edu (Michael Gleicher) (04/11/88)
Is my Mac II a dog? Is it really 1/13th as fast as a MicroVax II (which is
a slow machine - much slower than, say, a Sun 3/60)?

Please look at what I've done and tell me what I've been doing wrong, and
how I can get some decent performance out of what I thought was a blazing
fast machine. I'd imagine something stupid and silly is going on, like I'm
not using the floating point accelerator (68881), or some processor time
is being eaten up on nonsense like updating the cursor.

The benchmark program I have is a simple ray tracing program. For the tests
of interest, all I/O is removed. It is a sheer test of double precision
floating point performance, as it uses little memory.

I've tried LSC 2.15 and MPW. On MPW I had to use "extended" numbers,
otherwise performance was unbearable. The machine is a stock Mac II with
5 Meg. In the chart, MPW was used under Finder. LSC under MultiFinder gave
very similar results (+-10%).

Here are the numbers:

                      Ray Tracing Times

    Machine          No I/O               File I/O
                  user    elapsed      user     elapsed

    Vax8600       6.63      7         13.017     15
    MV20000       7.95      4 ????    21.183     14 ??
    uVax II      33.75     35        104.3      107
    Mac II(a)   474.1

    (a) compiler = MPW, extended used instead of double as it is faster
    all trials were run on lightly loaded machines
    8600 = duke    MV/20000 = dukee    uVaxII = elmer

Something is definitely wrong. The code required NO changes to move
from the Mac to the Vax (when developed with LSC). I will give the code
to anyone who wants to look at it. It's only about 300 lines long.

Could someone tell me how to use the 68881 and put some assembly into
my Mac program, or how to tell MPW to write native 68881 code and use 68020
instructions? (I'm sure there must be a compiler option to do this, but
"Help C" only mentions 68010. Could it be our MPW is old?) An assembly
function that would take the place of the C function:

    extended dot(x1,y1,z1,x2,y2,z2)
    { return(x1*x2+y1*y2+z1*z2);}

would be enough of an example so I could figure the rest out.

Thanks for help. I'm sure I'm just doing something stupid.

Thanks,
Mike

Michael Lee Gleicher                    (-: If it looks like I'm wandering
Duke University                         (-: around like I'm lost . . .
E-Mail: gleicher@cs.duke.edu)(or uucp   (-:
Or P.O.B. 5899 D.S., Durham, NC 27706   (-: It's because I am!
dplatt@coherent.com (Dave Platt) (04/12/88)
In article <11541@duke.cs.duke.edu> gleicher@duke.cs.duke.edu (Michael Gleicher) writes:
> Is my Mac II a dog? Is it really 1/13th as fast as a MicroVax II (which is
> a slow machine - much slower than say a Sun 3/60)?
>
> Please look at what I've done and tell me what I've been doing wrong, and
> how I can get some decent performance out of what I thought was a blazing
> fast machine. I'd imagine something stupid and silly is going on like I'm
> not using the floating point accelerator (68881) or that some processor time
> is being eaten up on nonsense like updating the cursor.

Lightspeed C performs all of its floating-point operations via the SANE
(Standard Apple Numeric Environment) traps. These traps are essentially
subroutine calls to the SANE package in the ROM, which performs the
actual floating-point operation... using the 68881 on the Mac II, and
using software emulation techniques on the smaller machines. I don't know
whether MPW uses the same technique, but I suspect so.

Apple recommends the use of SANE wherever possible, for two reasons:

1) Using SANE ensures that the code will run on all machines, regardless
   of whether a 68881 is available.

2) Using SANE ensures that programs will return the same result on all
   machines. There are a few cases in which the 68881's trig functions
   return values that are slightly different than the SANE software-
   emulations of the same functions; in these cases, SANE does _not_
   use the built-in 68881 function, but performs the usual polynomial
   expansion (using the 68881 for the floating-point math).

SANE with a 68881 is 5-10 times faster than SANE without a 68881... but
it's still painfully slow compared to direct in-line 68881 calculation
(easily 10:1 slower in many cases).

> The benchmark program I have is a simple ray tracing program. For the tests
> of interest, all I/O is removed. It is a sheer test of double precision
> floating point performance as it uses little memory.
>
> I've tried LSC 2.15 and MPW. On MPW I had to use "extended" numbers otherwise
> performance was unbearable. The machine is a stock Mac II with 5 Meg.
> In the chart MPW was used under Finder. LSC under multifinder gave very
> similar results (+-10%).

LsC 3.0 will have a 68881-compilation option... but it's not shipping yet,
and (despite the mistakenly-released ad in MacTutor) probably won't ship
for some time... "perhaps midsummer" according to Think's sales rep to
whom I spoke on Friday.

> Here are the numbers:
>
>                       Ray Tracing Times
>
>     Machine          No I/O               File I/O
>                   user    elapsed      user     elapsed
>
>     Vax8600       6.63      7         13.017     15
>     MV20000       7.95      4 ????    21.183     14 ??
>     uVax II      33.75     35        104.3      107
>     Mac II(a)   474.1
>

Seems consistent with SANE overhead. I had the same sort of experience
while working with my Mandelbrot-set calculation program (MandelZot...
recently posted on comp.binaries.mac).

> Something is definitely wrong. The code required NO changes to move
> from the Mac to the Vax (when developed with LSC). I will give the code
> to anyone who wants to look at it. It's only about 300 lines long.
>
> If someone could tell me how to use the 68881 and put some assembly into
> my Mac program, or tell MPW to write native code 68881 and use 68020
> instructions (I'm sure there must be a compiler option to do this, but
> "Help C" only mentions 68010 Could it be our MPW is old?). An assembly
> function that would take the place of the C function:
>     extended dot(x1,y1,z1,x2,y2,z2)
>     { return(x1*x2+y1*y2+z1*z2);}
> would be enough of an example so I could figure the rest out.

Well, 'tis not trivial, but it can be done without _too_ much headbanging
in LightSpeed C 2.15, and presumably in MPW as well. There are really
two catches to the deal:

1) LsC 2.15 doesn't understand 68881 opcodes; it's necessary to hand-
   assemble the instructions (or use an assembler on a 68881-aware
   machine such as a Sun 3), and then insert the instructions into the
   code stream with the "DC" directive. Urrgh... ugly as sin, but it
   does work.

2) LsC defines the "double" floating-point type as the SANE "extended".
   SANE uses a slightly different representation for the "extended"
   type than the 68881 does (this is permitted by the IEEE standard);
   SANE stores it as an 80-bit number, while the 68881 stores it as
   a 96-bit number with a 16-bit must-be-zero filler between the
   exponent and mantissa.

   There are two ways around this difference: you can manually convert
   the SANE "extended" to 68881 "extended" via a simple
   move-data-and-insert-a-zero-word, or you can declare the variables
   to be "short double", which LsC represents as a SANE "double" (which
   uses exactly the same representation as the 68881 uses... the IEEE
   standard for this datatype is specified down to the bit level).

I've added code to MandelZot that uses the former approach... I convert
the few variables I'm using to 68881 "extended" format. Compared to the
SANE mode (even with a 68881 available) the difference in performance is
astounding... high-precision, high-dwell calculations of the Mandelbrot
iteration are almost as fast in 68881 mode as they are in my tightly-coded
16-bit-integer assembly-language loop.

One nice tweak available if you code up the 68881 stuff manually is that
you can leave intermediate results in the 68881's floating-point
registers... a trick that most C compilers may not perform, and which
reduces memory-access delays when running the code.

I'll mail you the code fragments under separate cover.

> Thanks for help. I'm sure I'm just doing something stupid.

Nope... you've just fallen into a "It ain't there yet!" gap in the software.

-- 
Dave Platt                                             VOICE: (415) 493-8805
  USNAIL: Coherent Thought Inc.  3350 West Bayshore #205  Palo Alto CA 94303
  UUCP: ...!{ames,sun,uunet}!coherent!dplatt     DOMAIN: dplatt@coherent.com
  INTERNET:   coherent!dplatt@ames.arpa,  ...@sun.com,  ...@uunet.uu.net
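For readers who want to see the "move-data-and-insert-a-zero-word" conversion from Dave Platt's post spelled out, here is a minimal C sketch. It is not the MandelZot code (which was mailed privately and is not in this thread). It assumes both formats are held as raw big-endian byte buffers, as they would be in 68020 memory, and the names sane_to_68881_extended, SANE_EXT_BYTES and M68881_EXT_BYTES are made up for illustration.

```c
#include <string.h>

/* Sizes of the two "extended" layouts, in bytes (names are hypothetical). */
enum { SANE_EXT_BYTES = 10, M68881_EXT_BYTES = 12 };

/*
 * Convert an 80-bit SANE extended to the 96-bit 68881 memory format.
 * Assumption: both buffers are big-endian, most significant byte first.
 *   SANE:   [sign + 15-bit exponent: 2 bytes][mantissa: 8 bytes]
 *   68881:  [sign + 15-bit exponent: 2 bytes][zero: 2 bytes][mantissa: 8 bytes]
 */
void sane_to_68881_extended(const unsigned char src[SANE_EXT_BYTES],
                            unsigned char dst[M68881_EXT_BYTES])
{
    memcpy(dst, src, 2);            /* sign bit and exponent, unchanged */
    dst[2] = 0;                     /* the 16-bit must-be-zero filler   */
    dst[3] = 0;
    memcpy(dst + 4, src + 2, 8);    /* 64-bit mantissa, unchanged       */
}
```

No bits of the value change, so the conversion is exact in both directions; going back to the SANE layout is the same copy with the two filler bytes dropped.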
paul@morganucodon.cis.ohio-state.edu (Paul Placeway) (04/13/88)
< extended dot(x1,y1,z1,x2,y2,z2)
< { return(x1*x2+y1*y2+z1*z2);}

In addition to the Mac-specific floating-point advice, you could probably
get a reasonable speedup on all of the machines (depending on the smartness
of the respective compilers) by replacing this with:

    #define DOT(x1,y1,z1,x2,y2,z2) (x1*x2+y1*y2+z1*z2)

This will eliminate the overhead of a function call with 6 arguments on
the stack (usually comparable in speed to several FLOPS), and will allow
you to declare the variables as register (as long as the compiler
understands what a "register double" is). A really good compiler should
turn this into code that is about as good as tuned (straightforward)
assembly code (including exploiting 68881 register values).

Of course, your mileage may vary...

-- Paul
-=-
Existence is beyond the power of words
To define:
Terms may be used
But are none of them absolute.
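One caveat on the macro as posted: its arguments are not parenthesized, so an expression argument such as a+b would bind to the wrong operator after expansion. Below is a small C sketch of both forms side by side. It uses plain double rather than the Mac compilers' "extended" type so that it compiles anywhere; that substitution, and the explicit parameter types, are assumptions and not part of the original posts.

```c
/* Function form, with the parameter types spelled out. */
double dot(double x1, double y1, double z1,
           double x2, double y2, double z2)
{
    return x1 * x2 + y1 * y2 + z1 * z2;
}

/*
 * Macro form: no call overhead, and each argument is wrapped in
 * parentheses so that expressions expand with the intended precedence.
 */
#define DOT(x1, y1, z1, x2, y2, z2) \
    ((x1) * (x2) + (y1) * (y2) + (z1) * (z2))
```

With the parentheses in place, DOT(a + b, 0.0, 0.0, c, 0.0, 0.0) multiplies the whole sum by c. The macro still evaluates each argument exactly once, so arguments with side effects behave as they would in the function call.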
singer@endor.harvard.edu (Rich Siegel) (04/13/88)
Your MPW is definitely old. The 2.0.x versions of MPW C have compiler switches to generate inline calls to the coprocessor, as well as optimizations for the 68020. --Rich
cam@ptisea.UUCP (cameron elliott) (04/14/88)
In article <3273@coherent.com>, dplatt@coherent.com (Dave Platt) writes:
[A whole lot of stuff...]
> > from the Mac to the Vax (when developed with LSC). I will give the code
> > to anyone who wants to look at it. It's only about 300 lines long.
[A whole lot of stuff...]

Could you please post the ray tracing program? I am also looking for one
that can handle polygons. I have a Mac II, and would like to play with it.

-- 
Disclaimer: If employees don't represent an organization what does?
Cameron Elliott    Portable Cellular Communications
Path: ...!uw-beaver!tikal!ptisea!cam