[comp.sys.mac.programmer] Is my Mac II really slow?

gleicher@duke.cs.duke.edu (Michael Gleicher) (04/11/88)

Is my Mac II a dog? Is it really 1/13th as fast as a MicroVax II (which is
a slow machine, much slower than, say, a Sun 3/60)?

Please look at what I've done and tell me what I've been doing wrong, and
how I can get some decent performance out of what I thought was a blazing
fast machine. I'd imagine something stupid and silly is going on, like I'm
not using the floating-point coprocessor (68881), or some processor time
is being eaten up by nonsense like updating the cursor.

The benchmark program I have is a simple ray tracing program. For the tests
of interest, all I/O is removed. It is a pure test of double-precision
floating-point performance, as it uses little memory.

I've tried LSC 2.15 and MPW. On MPW I had to use "extended" numbers; otherwise
performance was unbearable. The machine is a stock Mac II with 5 Meg.
In the chart, MPW was used under Finder. LSC under MultiFinder gave very
similar results (+-10%).

Here are the numbers:
	

	Ray Tracing Times

Machine		No I/O			File I/O
		user	elapsed		user		elapsed

Vax8600		6.63	7		13.017		15
MV20000		7.95	4 ????		21.183		14 ??
uVax II		33.75	35		104.3		107
Mac II(a)		474.1

(a) compiler = MPW, extended used instead of double as it is faster

All trials were run on lightly loaded machines.
8600 = duke
MV/20000 = dukee
uVaxII = elmer

Something is definitely wrong. The code required NO changes to move
from the Mac to the Vax (when developed with LSC). I will give the code
to anyone who wants to look at it. It's only about 300 lines long.

If someone could tell me how to use the 68881 and put some assembly into
my Mac program, or how to tell MPW to generate native 68881 code and use 68020
instructions, I'd be grateful. (I'm sure there must be a compiler option to do
this, but "Help C" only mentions the 68010. Could it be our MPW is old?) An
assembly function that would take the place of the C function:
	extended dot(x1,y1,z1,x2,y2,z2)
	extended x1,y1,z1,x2,y2,z2;
	{ return(x1*x2+y1*y2+z1*z2);}
would be enough of an example so I could figure the rest out.

Thanks for any help. I'm sure I'm just doing something stupid.

Thanks,
	Mike

Michael Lee Gleicher			(-: If it looks like I'm wandering
	Duke University			(-:    around like I'm lost . . .
E-Mail: gleicher@cs.duke.edu)(or uucp	(-:
Or P.O.B. 5899 D.S., Durham, NC 27706	(-:   It's because I am!

dplatt@coherent.com (Dave Platt) (04/12/88)

In article <11541@duke.cs.duke.edu> gleicher@duke.cs.duke.edu (Michael Gleicher) writes:
> Is my Mac II a dog? Is it really 1/13th as fast as a MicroVax II (which is
> a slow machine, much slower than, say, a Sun 3/60)?
> 
> Please look at what I've done and tell me what I've been doing wrong, and
> how I can get some decent performance out of what I thought was a blazing
> fast machine. I'd imagine something stupid and silly is going on, like I'm
> not using the floating-point coprocessor (68881), or some processor time
> is being eaten up by nonsense like updating the cursor.

Lightspeed C performs all of its floating-point operations via the SANE
(Standard Apple Numeric Environment) traps.  These traps are essentially
subroutine calls to the SANE package in the ROM, which performs the actual
floating-point operation... using the 68881 on the Mac II, and using
software emulation techniques on the smaller machines.

I don't know whether MPW uses the same technique, but I suspect so.  Apple
recommends the use of SANE wherever possible, for two reasons:

1) Using SANE ensures that the code will run on all machines, regardless
   of whether a 68881 is available.

2) Using SANE ensures that programs will return the same result on all
   machines.  There are a few cases in which the 68881's trig functions
   return values that are slightly different than the SANE software-
   emulations of the same functions;  in these cases, SANE does _not_ use
   the built-in 68881 function, but performs the usual polynomial expansion
   (using the 68881 for the floating-point math).

SANE with a 68881 is 5-10 times faster than SANE without a 68881... but it's
still painfully slow compared to direct in-line 68881 calculation (easily
10:1 slower in many cases).
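To see where the overhead comes from, here is a minimal portable sketch, assuming a SANE-like two-address interface in which every operation is an out-of-line call of the form *dst = *dst OP *src, with both operands passed by address. The names `sane_like`, `dot_via_package`, and `dot_inline` are hypothetical, `long double` merely stands in for the 80-bit extended type, and this is not Apple's actual trap interface. It shows that one dot product costs five package calls, where a compiler emitting 68881 code directly could use a few inline FMUL/FADD instructions.

```c
/* Hypothetical opcodes for a SANE-like two-address package.
   Every operation computes *dst = *dst OP *src.
   long double stands in for the 80-bit "extended" type. */
enum sane_like_op { OP_ADD, OP_SUB, OP_MUL, OP_DIV };

/* One out-of-line call per arithmetic operation, with both
   operands passed by address (the shape of a trap-based package). */
void sane_like(enum sane_like_op op, const long double *src, long double *dst)
{
    switch (op) {
    case OP_ADD: *dst += *src; break;
    case OP_SUB: *dst -= *src; break;
    case OP_MUL: *dst *= *src; break;
    case OP_DIV: *dst /= *src; break;
    }
}

/* x1*x2 + y1*y2 + z1*z2 through the package:
   three multiply calls plus two add calls = five calls. */
long double dot_via_package(long double x1, long double y1, long double z1,
                            long double x2, long double y2, long double z2)
{
    long double acc = x1, t1 = y1, t2 = z1;   /* copies: dst is overwritten */
    sane_like(OP_MUL, &x2, &acc);             /* acc = x1*x2 */
    sane_like(OP_MUL, &y2, &t1);              /* t1  = y1*y2 */
    sane_like(OP_MUL, &z2, &t2);              /* t2  = z1*z2 */
    sane_like(OP_ADD, &t1, &acc);             /* acc += t1   */
    sane_like(OP_ADD, &t2, &acc);             /* acc += t2   */
    return acc;
}

/* The same expression written inline; a compiler that emits 68881
   code could keep all of these intermediates in FP registers. */
long double dot_inline(long double x1, long double y1, long double z1,
                       long double x2, long double y2, long double z2)
{
    return x1 * x2 + y1 * y2 + z1 * z2;
}
```

Both functions compute the same value; the sketch only shows the calling pattern. In a real trap-based package each call also pays for the trap dispatch and operand decoding, which is presumably much of the gap described above.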
 
> The benchmark program I have is a simple ray tracing program. For the tests
> of interest, all I/O is removed. It is a pure test of double-precision
> floating-point performance, as it uses little memory.
> 
> I've tried LSC 2.15 and MPW. On MPW I had to use "extended" numbers; otherwise
> performance was unbearable. The machine is a stock Mac II with 5 Meg.
> In the chart, MPW was used under Finder. LSC under MultiFinder gave very
> similar results (+-10%).

LsC 3.0 will have a 68881-compilation option... but it's not shipping yet,
and (despite the mistakenly released ad in MacTutor) probably won't ship
for some time: "perhaps midsummer," according to the Think sales rep I
spoke to on Friday.
 
> Here are the numbers:
> 	
> 
> 	Ray Tracing Times
> 
> Machine		No I/O			File I/O
> 		user	elapsed		user		elapsed
> 
> Vax8600		6.63	7		13.017		15
> MV20000		7.95	4 ????		21.183		14 ??
> uVax II		33.75	35		104.3		107
> Mac II(a)		474.1
> 

Seems consistent with SANE overhead.  I had the same sort of experience
while working with my Mandelbrot-set calculation program (MandelZot...
recently posted on comp.binaries.mac).
 
> Something is definitely wrong. The code required NO changes to move
> from the Mac to the Vax (when developed with LSC). I will give the code
> to anyone who wants to look at it. It's only about 300 lines long.
> 
> If someone could tell me how to use the 68881 and put some assembly into
> my Mac program, or how to tell MPW to generate native 68881 code and use 68020
> instructions, I'd be grateful. (I'm sure there must be a compiler option to do
> this, but "Help C" only mentions the 68010. Could it be our MPW is old?) An
> assembly function that would take the place of the C function:
> 	extended dot(x1,y1,z1,x2,y2,z2)
> 	extended x1,y1,z1,x2,y2,z2;
> 	{ return(x1*x2+y1*y2+z1*z2);}
> would be enough of an example so I could figure the rest out.

Well, 'tis not trivial, but it can be done without _too_ much headbanging
in LightSpeed C 2.15, and presumably in MPW as well.

There are really two catches to the deal:

1) LsC 2.15 doesn't understand 68881 opcodes;  it's necessary to hand-
   assemble the instructions (or use an assembler on a 68881-aware machine
   such as a Sun 3), and then insert the instructions into the code stream
   with the "DC" directive.  Urrgh... ugly as sin, but it does work.
   
2) LsC defines the "double" floating-point type as the SANE "extended".
   SANE uses a slightly different representation for the "extended" type
   than the 68881 does (this is permitted by the IEEE standard);  SANE
   stores it as an 80-bit number, while the 68881 stores it as a 96-bit
   number with a 16-bit must-be-zero filler between the exponent and
   mantissa.
   
   There are two ways around this difference:  you can manually convert the
   SANE "extended" to 68881 "extended" via a simple move-data-and-insert-
   a-zero-word, or you can declare the variables to be "short double",
   which LsC represents as a SANE "double" (which uses exactly the same
   representation as the 68881 uses... the IEEE standard for this datatype
   is specified down to the bit level).
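The "move data and insert a zero word" conversion can be sketched in portable C, assuming both formats are stored big-endian as on the 68000 family: a SANE extended is 2 bytes of sign and exponent followed by an 8-byte significand, and the 68881 memory format puts 2 zero bytes between them. The function names `sane_to_881` and `fpu_to_sane` are hypothetical, not taken from MandelZot or any Apple library.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layouts (big-endian, as on the 68000 family):
   SANE extended,  10 bytes: [sign+exponent:2][significand:8]
   68881 extended, 12 bytes: [sign+exponent:2][zero:2][significand:8] */

/* Convert a SANE 80-bit extended to the 68881 96-bit memory format. */
void sane_to_881(const uint8_t sane[10], uint8_t fpu[12])
{
    memcpy(fpu, sane, 2);          /* sign bit and 15-bit exponent */
    fpu[2] = 0;                    /* must-be-zero filler word */
    fpu[3] = 0;
    memcpy(fpu + 4, sane + 2, 8);  /* 64-bit significand, explicit integer bit */
}

/* Convert back by dropping the filler word. */
void fpu_to_sane(const uint8_t fpu[12], uint8_t sane[10])
{
    memcpy(sane, fpu, 2);
    memcpy(sane + 2, fpu + 4, 8);
}
```

For example, 1.0 is 3FFF 8000 0000 0000 0000 in SANE extended and 3FFF 0000 8000 0000 0000 0000 in the 68881 format. The "short double" route avoids this step entirely because the 64-bit IEEE double layout is the same on both sides.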

I've added code to MandelZot that uses the former approach... I convert
the few variables I'm using to 68881 "extended" format.  Compared to
the SANE mode (even with a 68881 available) the difference in
performance is astounding... high-precision, high-dwell calculations of
the Mandelbrot iteration are almost as fast in 68881 mode as they are
in my tightly-coded 16-bit-integer assembly-language loop.

One nice tweak available if you code up the 68881 stuff manually is
that you can leave intermediate results in the 68881's floating-point
registers... a trick that many C compilers don't perform, and one that
reduces memory-access delays when running the code.
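For the hand-assembly route, a small helper can compute the DC words for a register-to-register 68881 operation such as FADD.X FP1,FP0, which keeps both operand and result in floating-point registers. This is a sketch based on my reading of the MC68881 general-instruction format (first word $F200 for coprocessor ID 1, second word holding opclass 000, source register, destination register, and a 7-bit opmode); the opmode values and the helper names `encode_fp_reg_op` and `print_dc` are assumptions to check against Motorola's manual before pasting the output into code.

```c
#include <stdint.h>
#include <stdio.h>

/* Opmode values (bits 6-0 of the command word) for a few 68881
   arithmetic instructions; assumed from the MC68881 manual. */
enum fpu_opmode {
    FPU_FMOVE = 0x00,
    FPU_FDIV  = 0x20,
    FPU_FADD  = 0x22,
    FPU_FMUL  = 0x23,
    FPU_FSUB  = 0x28
};

/* Hand-assemble "Fop.X FPsrc,FPdst" (register to register).
   word[0]: F-line, coprocessor ID 1, general type, EA field 0 -> $F200
   word[1]: opclass 000 in bits 15-13, source register in bits 12-10,
            destination register in bits 9-7, opmode in bits 6-0. */
void encode_fp_reg_op(enum fpu_opmode op, unsigned src, unsigned dst,
                      uint16_t word[2])
{
    word[0] = 0xF200;
    word[1] = (uint16_t)(((src & 7u) << 10) | ((dst & 7u) << 7)
                         | ((unsigned)op & 0x7Fu));
}

/* Print the two words in a form suitable for a DC.W directive. */
void print_dc(const uint16_t word[2])
{
    printf("DC.W $%04X,$%04X\n", (unsigned)word[0], (unsigned)word[1]);
}
```

Encoding FMUL.X FP1,FP0 and passing the result to `print_dc` prints `DC.W $F200,$0423`. The exact spelling of the data directive depends on the assembler, and memory-operand forms (which need an effective-address field and a source-format specifier) are not covered here.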

I'll mail you the code fragments under separate cover.
 
> Thanks for help. I'm sure I'm just doing something stupid.

Nope... you've just fallen into a "It ain't there yet!" gap in the software.
 
-- 
Dave Platt                                             VOICE: (415) 493-8805
  USNAIL: Coherent Thought Inc.  3350 West Bayshore #205  Palo Alto CA 94303
  UUCP: ...!{ames,sun,uunet}!coherent!dplatt     DOMAIN: dplatt@coherent.com
  INTERNET:   coherent!dplatt@ames.arpa,    ...@sun.com,    ...@uunet.uu.net

paul@morganucodon.cis.ohio-state.edu (Paul Placeway) (04/13/88)

< 	extended dot(x1,y1,z1,x2,y2,z2)
< 	extended x1,y1,z1,x2,y2,z2;
< 	{ return(x1*x2+y1*y2+z1*z2);}

In addition to the Mac-specific floating-point advice, you could
probably get a reasonable speedup on all of the machines (depending on
the smartness of the respective compilers) by replacing this with:

    #define DOT(x1,y1,z1,x2,y2,z2)	((x1)*(x2)+(y1)*(y2)+(z1)*(z2))

This will eliminate the overhead of a function call with 6 arguments on
the stack (usually comparable in cost to several floating-point operations),
and will allow you to declare the variables as register (as long as the
compiler understands what a "register double" is). A really good compiler
should turn this into code that is about as good as tuned (straightforward)
assembly code (including exploiting 68881 register values).
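A minimal sketch of the macro approach, assuming arguments may be arbitrary expressions: the fully parenthesized form avoids the classic pitfall where an unparenthesized macro turns DOT(a+b, ...) into a+b*x2 + ..., and a plain function is included for comparison. The names `DOT_BARE` and `dot_fn` are hypothetical, added only for illustration.

```c
/* Fully parenthesized DOT: each argument is wrapped, so an
   expression argument such as a+b is multiplied as a whole. */
#define DOT(x1,y1,z1,x2,y2,z2) ((x1)*(x2)+(y1)*(y2)+(z1)*(z2))

/* Unparenthesized form, kept only to show the pitfall:
   DOT_BARE(a+b, ...) expands to a+b*x2 + ... */
#define DOT_BARE(x1,y1,z1,x2,y2,z2) (x1*x2+y1*y2+z1*z2)

/* Ordinary function version for comparison: one call and six
   arguments per use, which the macro avoids. */
double dot_fn(double x1, double y1, double z1,
              double x2, double y2, double z2)
{
    return x1 * x2 + y1 * y2 + z1 * z2;
}
```

With arguments 1.0+1.0, 0.0, 0.0, 3.0, 0.0, 0.0 the parenthesized macro yields 6.0 while the bare one yields 4.0. Since each argument appears exactly once in the expansion, neither macro evaluates an argument twice.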

Of course, your mileage may vary...

		-- Paul
-=-
Existence is beyond the power of words
To define:
Terms may be used
But are none of them absolute.

singer@endor.harvard.edu (Rich Siegel) (04/13/88)

Your MPW is definitely old. The 2.0.x versions of MPW C have compiler
switches to generate inline calls to the coprocessor, as well as 
optimizations for the 68020.

		--Rich

cam@ptisea.UUCP (cameron elliott) (04/14/88)

In article <3273@coherent.com>, dplatt@coherent.com (Dave Platt) writes:
[A whole lot of stuff...]
> > from the Mac to the Vax (when developed with LSC). I will give the code
> > to anyone who wants to look at it. It's only about 300 lines long.
[A whole lot of stuff...]

Could you please post the ray tracing program? I am also looking
for one that can handle polygons. I have a Mac II and would like to play with it.
-- 
Disclaimer: If employees don't represent an organization, what does?
Cameron Elliott		Portable Cellular Communications
Path: ...!uw-beaver!tikal!ptisea!cam