jdb@reef.cis.ufl.edu (Brian K. W. Hook) (04/13/91)
Thanks to everyone who helped with the optimizations. For those
interested, I am posting the results of each optimization followed by the
final source code.
Summary: WOW!
First pass: 21.86 seconds
Optimizations: none (this means all doubles/floats)
Second pass: 21.48 seconds
Optimizations: converted functions to pass pointers to objects instead of
copies of objects
Third pass: 13.40 seconds(!)
Optimizations: converted most of the calculations to fixed point integer
calculations versus floating point
Fourth pass: 13.35 seconds
Optimizations: converted "tmp" variable from long to int
Fifth pass: 12.46 seconds
Optimizations: converted global variable from double to int
Last pass: 12.20 seconds
Optimizations: converted _yaw, _roll, and _pitch to ints from doubles
I am sure that a couple of optimizations can still be done, most obviously
those bit shifts (although I really doubt they matter much). I am not sure
how accurate these calculations are, but I do know that they don't distort
when displaying wireframe images which is all I care about. I chose to
shift by 10 completely arbitrarily.
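One rough way to check the accuracy of the shift-by-10 arithmetic is to run a single rotation term in fixed point and compare it against floating point. The sketch below assumes the factors are the sine and cosine scaled by 1024 and rounded to nearest (the post does not show how _yawCosFactor and friends are set); fx_from_double, rotate_x_fixed, and rotate_x_error are hypothetical names, not part of the original program.

```c
#include <assert.h>

#define FX_SHIFT 10                 /* the shift used in the post */
#define FX_ONE   (1L << FX_SHIFT)   /* 1.0 in fixed point = 1024 */

/* Assumed conversion: scale by 1024 and round to nearest. */
long fx_from_double(double v)
{
    assert(v >= -1.0 && v <= 1.0);  /* sines and cosines only */
    return (long)(v * FX_ONE + (v >= 0.0 ? 0.5 : -0.5));
}

/* xa = cos*x - sin*z in fixed point, like the first rotation line. */
long rotate_x_fixed(long x, long z, double c, double s)
{
    long cf = fx_from_double(c);
    long sf = fx_from_double(s);
    /* Divide rather than shift: >> on a negative value is
       implementation-defined in C. */
    return (cf * x - sf * z) / FX_ONE;
}

/* Absolute error of the fixed-point result against floating point. */
double rotate_x_error(long x, long z, double c, double s)
{
    double exact = c * x - s * z;
    double diff = (double)rotate_x_fixed(x, z, c, s) - exact;
    return diff < 0.0 ? -diff : diff;
}
```

For a point at (300, 200) rotated by 30 degrees (cosine 0.8660254, sine 0.5), the error of one term stays under a pixel, which fits the observation that wireframes do not visibly distort. Errors do accumulate over the three chained rotations.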
If you have any suggestions, please post or mail.
This is the actual function. It is based on pseudo code from Lee Adams's
"Hi Performance Interactive Graphics in C", which is a pretty crappy book
otherwise.
/*
WX, WY, and WZ: World coordinates of object (relative to object's center)
MX, MY, and MZ: Distance object is from viewpoint
*DX, *DY: Final display coordinates
*/
void Calc3D ( int WX, int WY, int WZ, int MX, int MY, int MZ, int *DX, int *DY)
{
    int tmp;
    long xa, ya, za; // Temp variables

    WX=-WX;
    xa=(_yawCosFactor*WX-_yawSinFactor*WZ)>>10;
    za=(_yawSinFactor*WX-_yawCosFactor*WZ)>>10;
    WX=(_rollCosFactor*xa+_rollSinFactor*WY)>>10;
    ya=(_rollCosFactor*WY-_rollSinFactor*xa)>>10;
    WZ=(_pitchCosFactor*za-_pitchSinFactor*ya)>>10;
    WY=(_pitchSinFactor*za+_pitchCosFactor*ya)>>10;
    WX+=MX;
    WY+=MY;
    WZ+=MZ;
    if (WZ==0)
        WZ=-1;
    tmp=AngularPerspFactor/WZ;
    *DX=tmp*WX+400;
    *DY=tmp*WY+300;
}
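One thing the last three lines of the function may be costing is precision: tmp=AngularPerspFactor/WZ is an integer division, so the fraction is thrown away before the multiply. A minimal sketch of the alternative, multiplying first and dividing last, follows. The helper names are hypothetical, and the assumption that the perspective factor is positive and that the product fits in a long is mine, not something the post states.

```c
#include <assert.h>

/* Hypothetical helper: project one coordinate, multiplying before
   dividing so the fraction survives until the final division. */
int project_coord(int coord, int wz, long persp_factor, int screen_center)
{
    assert(persp_factor > 0);        /* assumed: a positive viewing distance */
    if (wz == 0)
        wz = -1;                     /* same guard as Calc3D */
    return (int)((persp_factor * coord) / wz) + screen_center;
}

/* The order used in Calc3D, for comparison: divide first, then multiply. */
int project_coord_divide_first(int coord, int wz, long persp_factor,
                               int screen_center)
{
    long tmp;
    assert(persp_factor > 0);
    if (wz == 0)
        wz = -1;
    tmp = persp_factor / wz;         /* fraction of the quotient is lost here */
    return (int)(tmp * coord) + screen_center;
}
```

With a factor of 500, a depth of 300, and a coordinate of 100 centered on 400, multiplying first gives 566 while dividing first gives 500. On a 16-bit compiler the multiply-first version needs a long multiply and divide per coordinate, so it trades some of the speed gained for accuracy.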
Ralf.Brown@B.GP.CS.CMU.EDU (04/13/91)
In article <28002@uflorida.cis.ufl.EDU>, jdb@reef.cis.ufl.edu (Brian K. W. Hook) wrote:
}Thanks to everyone who helped with the optimizations. For those
}interested, I am posting the results of each optimization followed by the
}final source code.
}
}Summary: WOW!
}
}First pass: 21.86 seconds
}Last pass: 12.20 seconds
}
}I am sure that a couple of optimizations can still be done, most obviously
}those bit shifts (although I really doubt they matter much). I am not sure
}how accurate these calculations are, but I do know that they don't distort

I don't remember which compiler you said you are using, but if it is a
16-bit compiler (such as MSC, Zortech, or Turbo), then both multiplies and
shifts on longs make calls to the runtime library. As I recall, you said
the function originally used 90% of the execution time; with the following
assembler version of the function, you should get your execution time down
to under six seconds.

Note that I've rearranged the order of calculations somewhat, that it could
be optimized further by using SI and DI as temporaries to avoid memory
accesses, and that you will have to supply the necessary wrapper for
calling from your C code. You can also get better precision by scaling the
sine and cosine factors by 16384 (14 bits), since you only need a range of
-1..+1 (which would be -16384..16384 after scaling); in that case, change
all the 1024s to 16384.

xa      dw ?            ; note that these are ints instead of longs!
ya      dw ?
za      dw ?

        neg  WX
; za = (yawSinFactor*WX - yawCosFactor*WZ) / 1024
        mov  ax,yawCosFactor
        imul WZ
        mov  cx,dx
        mov  bx,ax
        mov  ax,yawSinFactor
        imul WX
        sub  ax,bx
        sbb  dx,cx
        mov  cx,1024
        idiv cx         ; faster than a loop!
        mov  za,ax
; xa = (yawCosFactor*WX - yawSinFactor*WZ) / 1024
        mov  ax,yawSinFactor
        imul WZ
        mov  cx,dx
        mov  bx,ax
        mov  ax,yawCosFactor
        imul WX
        sub  ax,bx
        sbb  dx,cx
        mov  cx,1024
        idiv cx         ; faster than a loop!
        mov  xa,ax
; WX = (rollCosFactor*xa + rollSinFactor*WY) / 1024 + MX
        imul rollCosFactor
        mov  cx,dx
        mov  bx,ax
        mov  ax,rollSinFactor
        imul WY
        add  ax,bx
        adc  dx,cx
        mov  cx,1024
        idiv cx
        add  ax,MX
        mov  WX,ax
; ya = (rollCosFactor*WY - rollSinFactor*xa) / 1024
        mov  ax,xa
        imul rollSinFactor
        mov  bx,ax
        mov  cx,dx
        mov  ax,WY
        imul rollCosFactor
        sub  ax,bx
        sbb  dx,cx
        mov  cx,1024
        idiv cx
        mov  ya,ax
; WY = (pitchSinFactor*za + pitchCosFactor*ya) / 1024 + MY
        mov  ax,za
        imul pitchSinFactor
        mov  cx,dx
        mov  bx,ax
        mov  ax,ya
        imul pitchCosFactor
        add  ax,bx
        adc  dx,cx
        mov  cx,1024
        idiv cx
        add  ax,MY
        mov  WY,ax
; WZ = (pitchCosFactor*za - pitchSinFactor*ya) / 1024 + MZ
        mov  ax,ya
        imul pitchSinFactor
        mov  cx,dx
        mov  bx,ax
        mov  ax,za
        imul pitchCosFactor
        sub  ax,bx
        sbb  dx,cx
        mov  cx,1024
        idiv cx
        add  ax,MZ
        jnz  l_1
        dec  ax
l_1:
        mov  cx,ax      ; WZ doesn't need to be stored in memory
        mov  ax,word ptr AngularPerspFactor
        mov  dx,word ptr AngularPerspFactor+2
        idiv cx         ; APF / WZ
        mov  cx,ax      ; store a copy for later
        imul WX
        add  ax,400     ; tmp*WX+400
        mov  _DX,ax
        mov  ax,WY
        mul  cx
        add  ax,300     ; tmp*WY+300
        mov  _DY,ax

--
{backbone}!cs.cmu.edu!ralf ARPA: RALF@CS.CMU.EDU FIDO: Ralf Brown 1:129/3.1
BITnet: RALF%CS.CMU.EDU@CMUCCVMA AT&Tnet: (412)268-3053 (school) FAX: ask
DISCLAIMER? Did | It isn't what we don't know that gives us trouble, it's
I claim something?| what we know that ain't so. --Will Rogers
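For anyone who would rather stay in C, the IMUL/IDIV pattern with the suggested 14-bit scaling can be approximated as below. This is only a sketch: fx14_rot_term is a hypothetical name, the fixed-width types come from <stdint.h> rather than from any compiler mentioned in the thread, and whether a given 16-bit compiler turns the widened multiply into a single IMUL is not something I can vouch for.

```c
#include <assert.h>
#include <stdint.h>

#define FX14_ONE 16384   /* 1.0 at 14 bits, as suggested in the reply */

/* Computes (a*fa - b*fb) / 16384 with a 32-bit intermediate, which is
   the shape of each rotation term: two 16x16->32 multiplies, a 32-bit
   subtract, and one divide back down to 16 bits. */
int16_t fx14_rot_term(int16_t a, int16_t fa, int16_t b, int16_t fb)
{
    int32_t acc;
    int32_t q;

    /* Factors are scaled sines/cosines, so they stay within -1..+1. */
    assert(fa >= -FX14_ONE && fa <= FX14_ONE);
    assert(fb >= -FX14_ONE && fb <= FX14_ONE);

    /* Widen before multiplying so the products cannot overflow 16 bits. */
    acc = (int32_t)a * fa - (int32_t)b * fb;

    /* C division truncates toward zero, the same as IDIV. */
    q = acc / FX14_ONE;

    /* IDIV faults if the quotient does not fit in 16 bits; check it here. */
    assert(q >= INT16_MIN && q <= INT16_MAX);
    return (int16_t)q;
}
```

With factors limited to -16384..16384, each product is at most 2^29 in magnitude, so the difference of two products always fits in 32 bits. Note that dividing by the scale truncates toward zero, whereas the original >>10 typically rounds toward negative infinity, so results for negative values can differ by one.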