[rec.games.programmer] 3D int/float optimizations stuff

jdb@reef.cis.ufl.edu (Brian K. W. Hook) (04/13/91)

Thanks to everyone who helped with the optimizations.  For those
interested, I am posting the results of each optimization followed by the
final source code.

Summary:  WOW!

First pass:     21.86 seconds
Optimizations:  none (this means all doubles/floats)

Second pass:    21.48 seconds
Optimizations:  converted functions to pass pointers to objects instead of
                copies of objects

Third pass:     13.40 seconds(!)
Optimizations:  converted most of the calculations from floating point to
                fixed-point integer arithmetic

Fourth pass:    13.35 seconds
Optimizations:  converted "tmp" variable from long to int

Fifth pass:     12.46 seconds
Optimizations:  converted global variable from double to int

Last pass:      12.20 seconds
Optimizations:  converted _yaw, _roll, and _pitch to ints from doubles

I am sure that a couple of optimizations can still be done, most obviously
those bit shifts (although I really doubt they matter much).  I am not sure
how accurate these calculations are, but I do know that they don't distort
when displaying wireframe images which is all I care about.  I chose to
shift by 10 completely arbitrarily.
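
For readers new to the trick: shifting right by 10 treats each sine/cosine
factor as an integer scaled by 1024.  A minimal sketch of the idea follows.
The names FIX_SHIFT, FIX_ONE, to_fixed, and fix_mul are made up for
illustration and are not from the original program, and the sketch divides
by 1024 instead of shifting, because right-shifting a negative value is
implementation-defined in C.

```c
#include <assert.h>

#define FIX_SHIFT 10                 /* the shift chosen in the post */
#define FIX_ONE   (1L << FIX_SHIFT)  /* 1.0 in fixed point, i.e. 1024 */

/* Scale a sine or cosine (range -1..+1) to a fixed-point integer,
   rounding to the nearest representable value. */
long to_fixed(double v)
{
    double scaled = v * (double)FIX_ONE;
    assert(v >= -1.0 && v <= 1.0);
    return (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Multiply a coordinate by a fixed-point factor and scale back down.
   In C99 and later, division truncates toward zero; ">> FIX_SHIFT"
   usually rounds toward minus infinity instead, so negative results
   can differ by 1 between the two forms. */
long fix_mul(long factor, long coord)
{
    return (factor * coord) / FIX_ONE;
}
```

For example, fix_mul(to_fixed(0.5), 100) yields 50.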

If you have any suggestions, please post or mail.

This is the actual function.  It is based on pseudocode from Lee Adams'
"Hi Performance Interactive Graphics in C", which is a pretty crappy book
otherwise.

/*

WX, WY, and WZ:   World coordinates of object (relative to object's center)
MX, MY, and MZ:	  Distance object is from viewpoint
*DX, *DY:	  Final display coordinates

*/

void Calc3D ( int WX, int WY, int WZ, int MX, int MY, int MZ,
              int *DX, int *DY )
{
int tmp;
long xa, ya, za;   // Temp variables

   WX=-WX;
   xa=(_yawCosFactor*WX-_yawSinFactor*WZ)>>10;
   za=(_yawSinFactor*WX+_yawCosFactor*WZ)>>10;   // '+' so yaw is a true rotation
   WX=(_rollCosFactor*xa+_rollSinFactor*WY)>>10;
   ya=(_rollCosFactor*WY-_rollSinFactor*xa)>>10;
   WZ=(_pitchCosFactor*za-_pitchSinFactor*ya)>>10;
   WY=(_pitchSinFactor*za+_pitchCosFactor*ya)>>10;
   WX+=MX;
   WY+=MY;
   WZ+=MZ;
   if (WZ==0)
      WZ=-1;
   tmp=AngularPerspFactor/WZ;
   *DX=tmp*WX+400;
   *DY=tmp*WY+300;
}
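
On the accuracy question: one likely source of error is the last step,
where AngularPerspFactor/WZ is truncated to an integer before it is
multiplied by WX and WY.  A hedged sketch of the alternative, multiplying
first in a long and dividing afterwards, is below.  The function names and
the sample values in the note that follows are hypothetical, and whether
the extra long multiply is worth its cost on a 16-bit compiler has not been
measured here.

```c
#include <assert.h>

/* Order used in the post: divide first, then multiply. */
int project_divide_first(long persp, int w, int wz, int center)
{
    int tmp;
    assert(wz != 0);
    tmp = (int)(persp / wz);          /* the fraction is discarded here */
    return tmp * w + center;
}

/* Alternative: multiply first, so only the final quotient is truncated.
   Assumes long is at least 32 bits and that persp * w fits in it. */
int project_multiply_first(long persp, int w, int wz, int center)
{
    assert(wz != 0);
    return (int)((persp * w) / wz) + center;
}
```

With a made-up perspective factor of 500, w = 100, wz = 300 and a screen
center of 400, the first form returns 500 and the second returns 566; the
untruncated value is about 566.7.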

Ralf.Brown@B.GP.CS.CMU.EDU (04/13/91)

In article <28002@uflorida.cis.ufl.EDU>, jdb@reef.cis.ufl.edu (Brian K. W. Hook) wrote:
}Thanks to everyone who helped with the optimizations.  For those
}interested, I am posting the results of each optimization followed by the
}final source code.
}
}Summary:  WOW!
}
}First pass:     21.86 seconds
}Last pass:      12.20 seconds
}
}I am sure that a couple of optimizations can still be done, most obviously
}those bit shifts (although I really doubt they matter much).  I am not sure
}how accurate these calculations are, but I do know that they don't distort

I don't remember which compiler you said you are using, but if it is a 16-bit
compiler (such as MSC, Zortech, or Turbo), then both multiplies and shifts on
longs make calls to the runtime library.  As I recall, you said the function
originally used 90% of the execution time; with the following assembler
version of the function, you should get your execution time down to under
six seconds.  Note that I've rearranged the order of calculations somewhat,
that it could be optimized further by using SI and DI as temporaries to avoid
memory accesses, and that you will have to supply the necessary wrapper for
calling from your C code.  You can also get better precision by scaling the
sine and cosine factors by 16384 (14 bits), since you only need a range of
-1..+1 (which would be -16384..16384 after scaling); in that case, change all
the 1024s to 16384.
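
The effect of the wider scale can be checked in portable C.  This is only a
sketch under stated assumptions: apply_factor and its parameters are
hypothetical names, the fixed-width types stand in for a 16-bit compiler's
int and long, and the factor is assumed to be already scaled by 2^bits.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply a 16-bit coordinate by a factor scaled by 2^bits.
   The product of two 16-bit values always fits in 32 bits, which is
   what a single 16-bit IMUL leaves in DX:AX. */
int16_t apply_factor(int16_t factor, int16_t coord, int bits)
{
    int32_t product;
    int32_t quotient;
    assert(bits >= 1 && bits <= 14);
    product = (int32_t)factor * (int32_t)coord;
    quotient = product / ((int32_t)1 << bits);
    /* Holds whenever |factor| <= 2^bits, i.e. the factor is in -1..+1. */
    assert(quotient >= INT16_MIN && quotient <= INT16_MAX);
    return (int16_t)quotient;
}
```

Taking sin(45 degrees) as 0.7071, the scaled factor is 724 at 10 bits and
11585 at 14 bits.  For a coordinate of 30000 the unrounded product is about
21213; apply_factor(724, 30000, 10) gives 21210, while
apply_factor(11585, 30000, 14) gives 21212.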


xa	dw   ?	  ; note that these are ints instead of longs!
ya	dw   ?
za	dw   ?

	neg   WX
	mov   ax,yawCosFactor
	imul  WZ
	mov   cx,dx
	mov   bx,ax
	mov   ax,yawSinFactor
	imul  WX
	add   ax,bx
	adc   dx,cx	      ; '+' so that yaw is a true rotation
	mov   cx,1024
	idiv  cx	      ; faster than a loop!
	mov   za,ax
	mov   ax,yawSinFactor
	imul  WZ
	mov   cx,dx
	mov   bx,ax
	mov   ax,yawCosFactor
	imul  WX
	sub   ax,bx
	sbb   dx,cx
	mov   cx,1024
	idiv  cx	      ; faster than a loop!
	mov   xa,ax
	imul  rollCosFactor
	mov   cx,dx
	mov   bx,ax
	mov   ax,rollSinFactor
	imul  WY
	add   ax,bx
	adc   dx,cx
	mov   cx,1024
	idiv  cx
	add   ax,MX
	mov   WX,ax
	mov   ax,xa
	imul  rollSinFactor
	mov   bx,ax
	mov   cx,dx
	mov   ax,WY
	imul  rollCosFactor
	sub   ax,bx
	sbb   dx,cx	       ; ya = rollCos*WY - rollSin*xa
	mov   cx,1024
	idiv  cx
	mov   ya,ax
	mov   ax,za
	imul  pitchSinFactor
	mov   cx,dx
	mov   bx,ax
	mov   ax,ya
	imul  pitchCosFactor
	add   ax,bx
	adc   dx,cx
	mov   cx,1024
	idiv  cx
	add   ax,MY
	mov   WY,ax
	mov   ax,ya
	imul  pitchSinFactor
	mov   cx,dx
	mov   bx,ax
	mov   ax,za
	imul  pitchCosFactor
	sub   ax,bx
	sbb   dx,cx
	mov   cx,1024
	idiv  cx
	add   ax,MZ
	jnz   l_1
	dec   ax
l_1:
	mov   cx,ax	       ; WZ doesn't need to be stored in memory
	mov   ax,word ptr AngularPerspFactor
	mov   dx,word ptr AngularPerspFactor+2
	idiv  cx	       ; APF / WZ
	mov   cx,ax	       ; store a copy for later
	imul  WX
	add   ax,400	       ; tmp*WX+400
	mov   _DX,ax
	mov   ax,WY
	imul  cx	       ; signed, to match the imul used for WX
	add   ax,300	       ; tmp*WY+300
	mov   _DY,ax
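
To check the assembler against known values, it can help to have a C model
of one rotation term as the code above computes it: two 16-bit IMULs, a
32-bit add or subtract, then IDIV by 1024.  The sketch below is only a
model; rot_term, SCALE, and the parameter names are hypothetical.  Two
details differ from the C version's >>10: IDIV truncates toward zero, and
on the 8086 it raises a divide error when the quotient does not fit in 16
bits, which the assert stands in for.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE 1024   /* change to 16384 for 14-bit factors */

/* Model of: imul / imul / add,adc (or sub,sbb) / idiv.
   Computes (f1*v1 + f2*v2) / SCALE, or the difference of the two
   products when subtract is nonzero.  Assumes |f1| and |f2| are at
   most SCALE, so the 32-bit sum cannot overflow. */
int16_t rot_term(int16_t f1, int16_t v1, int16_t f2, int16_t v2,
                 int subtract)
{
    int32_t p1 = (int32_t)f1 * v1;
    int32_t p2 = (int32_t)f2 * v2;
    int32_t sum = subtract ? p1 - p2 : p1 + p2;
    int32_t q = sum / SCALE;          /* truncates toward zero */
    assert(q >= INT16_MIN && q <= INT16_MAX);
    return (int16_t)q;
}
```

For instance, rot_term(724, 100, 724, 100, 0) gives 141, and
rot_term(0, 0, 1, 1, 1) gives 0 where an arithmetic shift of -1 would
give -1.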



--
{backbone}!cs.cmu.edu!ralf  ARPA: RALF@CS.CMU.EDU   FIDO: Ralf Brown 1:129/3.1
BITnet: RALF%CS.CMU.EDU@CMUCCVMA   AT&Tnet: (412)268-3053 (school)   FAX: ask
DISCLAIMER?  Did  | It isn't what we don't know that gives us trouble, it's
I claim something?| what we know that ain't so.  --Will Rogers