mmblom@cs.vu.nl (Marc Blom) (04/03/91)
'm wondering how fast 3D graphics can really go. I've written some rotation and perspective routines (in assembly) and profiling shows that 60 % of the time is spend in my routine RotatePoint. This routine rotates 1 point around all three axes in only about 900 cycles minimum and about 1100 cycles maximum. Is this fast ? Can someone do better ? (If so PLEASE mail me how you did it). I won't include the source here because it's about 100 lines long but instead i'll tell you what it does: It rotates a point x.word, y.word, z.word (signed) using fixed point (14 bits behind the point) integer calculation. rotating around each ax uses 4 muls, giving a total of 12 muls. (This is the variation in clock cycles I mentioned above, about 900 for best case muls, about 1100 for worst case muls) By the way, these clock cycles are only valid for a plain 68000. (I don't know about 680x0 with x=1,2,3,(4?) I believe you then have to play with cache-sizes etc. to get the best results). I want to have the fastest possible rotation routine so I can concentrate on kicking the Graphics Library AreaMove/Draw/End out (In a system friendly manner, don't worry :-) ) And write a fast blitter routine. The 60 % mentioned above was only relative to my routines part of the execution time, of the total time involved, 15 % was spend in my routines, including RotatePoint, Perspective etc and 85 % was consumed by those wonderfull (yuk, did I say wonderfull ?) Graphics Library Calls. No wonder games writers kick the OS out ! (I hope this doesn't provoke another TO OS OR NOT TO OS discussion !) If someone is interested in the source of RotatePoint, just drop me a line and I will mail it to you. Marc Blom +--------------------------------------------+------------------------+ | Marc Blom | Email: mmblom@cs.vu.nl | What IS the question ? | | Gondel 14/13 | | | | 8243 BL Lelystad | Phone: 03200-46233 (NL) | TO OS OR NOT TO OS, | | The Netherlands | | that is the question ! | +------------------+-------------------------+------------------------+
rjc@geech.gnu.ai.mit.edu (Ray Cromwell) (04/03/91)
In article <9529@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: >'m wondering how fast 3D graphics can really go. I've written some rotation >and perspective routines (in assembly) and profiling shows that 60 % of the >time is spend in my routine RotatePoint. This routine rotates 1 point around >all three axes in only about 900 cycles minimum and about 1100 cycles maximum. >Is this fast ? Can someone do better ? (If so PLEASE mail me how you did it). >I won't include the source here because it's about 100 lines long but instead >i'll tell you what it does: It rotates a point x.word, y.word, z.word (signed) >using fixed point (14 bits behind the point) integer calculation. rotating >around each ax uses 4 muls, giving a total of 12 muls. (This is the variation >in clock cycles I mentioned above, about 900 for best case muls, about 1100 >for worst case muls) By the way, these clock cycles are only valid for a plain >68000. (I don't know about 680x0 with x=1,2,3,(4?) I believe you then have to >play with cache-sizes etc. to get the best results). >I want to have the fastest possible rotation routine so I can concentrate on >kicking the Graphics Library AreaMove/Draw/End out (In a system friendly manner, >don't worry :-) ) And write a fast blitter routine. >The 60 % mentioned above was only relative to my routines part of the execution >time, of the total time involved, 15 % was spend in my routines, including >RotatePoint, Perspective etc and 85 % was consumed by those wonderfull (yuk, >did I say wonderfull ?) Graphics Library Calls. No wonder games writers kick >the OS out ! (I hope this doesn't provoke another TO OS OR NOT TO OS >discussion !) >If someone is interested in the source of RotatePoint, just drop me a line and >I will mail it to you. > > Marc Blom > >+--------------------------------------------+------------------------+ >| Marc Blom | Email: mmblom@cs.vu.nl | What IS the question ? | >| Gondel 14/13 | | | >| 8243 BL Lelystad | Phone: 03200-46233 (NL) | TO OS OR NOT TO OS, | >| The Netherlands | | that is the question ! | >+------------------+-------------------------+------------------------+ > Well, this all depends on what you are trying to do. Do you want to do a flight simulator, or are you doing an animation/demo/model rotation? If the latter, you can precompute all the rotations before run time, and just render them via lookup table. As for the OS, it really has nothing to do with your project. What needs to be optimized is the critical section of your rotation routine. I'd suggest caching the points and computing all the rotations first, then render them later all at once. I have the feeling you're doing something like this: loop { calculate point render point get next point } end loop You should (IHMO) calculate all the pointers first, and if space isn't a problem, unwind the loop and use lots of lookup tables. You can also add in special case optimization for known rotations like 0, 45 and 90 degrees. As for rendering, you don't need to kick out the OS. Just raise your priority, OwnBlitter, and render all the points. The procedure looks like this: start calculate points OwnBlitter() render points DisownBlitter() loop to start This is friendly because while you are calculating (not using the blitter) other programs can use the blitter. Of course Mike and the other 'kill the OS' people will suggest that you Disable() all interupts and do your calculations while the blitter is running in parallel. This is fine if you doing a game, but if it's something like a 3d modeler I'd advise against it. Owning the blitter for long periods of time is very unfriendly. I seem to recall someone posted a 'really fast' line drawing routine a while back. It was highly optmized for speed using the CPU (not the blitter. It also was very long because loops were unrolled and it had lots of special cases for different line slopes.) -- /~\_______________________________________________________________________/~\ |n| rjc@albert.ai.mit.edu Amiga, the computer for the creative mind. |n| |~| .-. .-. |~| |_|________________________________| |_| |________________________________|_|
peterk@cbmger.UUCP (Peter Kittel GERMANY) (04/03/91)
In article <9529@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: >'m wondering how fast 3D graphics can really go. I've written some rotation >and perspective routines (in assembly) and profiling shows that 60 % of the >time is spend in my routine RotatePoint. This routine rotates 1 point around >all three axes in only about 900 cycles minimum and about 1100 cycles maximum. >Is this fast ? Can someone do better ? (If so PLEASE mail me how you did it). All I hear permanently is, use tables, tables, tables instead of real (or integer) calculations. At least for the sin and cos calculations. Estimations about real-life situations show that you don't need tremendous precision normally (but that depends on your real project). Perhaps even 2D or 3D rotations could be simplified into one (more or less) big table. It's worth considering in any case. -- Best regards, Dr. Peter Kittel // E-Mail to \\ Only my personal opinions... Commodore Frankfurt, Germany \X/ {uunet|pyramid|rutgers}!cbmvax!cbmger!peterk
chrisg@cbmvax.commodore.com (Chris Green) (04/04/91)
In article <1047@cbmger.UUCP> peterk@cbmger.UUCP (Peter Kittel GERMANY) writes: >In article <9529@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: >>'m wondering how fast 3D graphics can really go. I've written some rotation >>and perspective routines (in assembly) and profiling shows that 60 % of the >>time is spend in my routine RotatePoint. This routine rotates 1 point around >>all three axes in only about 900 cycles minimum and about 1100 cycles maximum. >>Is this fast ? Can someone do better ? (If so PLEASE mail me how you did it). Doing a quick cycle count on my routine, it looks like 485 worst case. 60% is really high to spend on rotation. 10% is more likely for an amiga. You ARE using a 3x3 matrix and not 3 separate axis rotations, right? -- *-------------------------------------------*---------------------------* |Chris Green - Graphics Software Engineer - chrisg@commodore.COM f | Commodore-Amiga - uunet!cbmvax!chrisg n |My opinions are my own, and do not - killyouridolssonicdeath o |necessarily represent those of my employer.- itstheendoftheworld r *-------------------------------------------*---------------------------d
zap@lysator.liu.se (Zap Andersson) (04/04/91)
peterk@cbmger.UUCP (Peter Kittel GERMANY) writes: >In article <9529@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: >>'m wondering how fast 3D graphics can really go. I've written some rotation >>and perspective routines (in assembly) and profiling shows that 60 % of the >>time is spend in my routine RotatePoint. This routine rotates 1 point around >>all three axes in only about 900 cycles minimum and about 1100 cycles maximum. Instead of rotation around one axis at the time, you may create a rotation matrix, that rotates around all three axis simultaneously, with 9 multiplies. The tecnique is picked up from any elementary cumputer graphics book, but simply, it is: NewX = OldX * XX + OldY * XY + OldZ * XZ NewY = OldY * YX + OldY * YY + OldZ * YZ New>Z = OldZ * ZX + OldY * ZY + OldZ * ZZ where XX,XY,XZ,YX,YY,YZ,ZX,ZY,ZZ is your rotation matrix. You can think of XX,XY,XZ as the (new) direction of the old X axis, and YX,YY,YZ as the (new) direction of the old Y axis and so forth. You may easily build these by taking the points: XX = 1, XY = 0, XZ = 0 YX = 0, YY = 1, YZ = 00 ZX = 0, ZY = 0, ZZ = 1 And rotate these using "your" method. Then Use these to rotate the other points. >-- >Best regards, Dr. Peter Kittel // E-Mail to \\ Only my personal opinions... >Commodore Frankfurt, Germany \X/ {uunet|pyramid|rutgers}!cbmvax!cbmger!peterk -- * * * * * * * * * * * * * * * * * (This rent for space) * My signature is smaller than * Be warned! The letter 'Z' is Copyright 1991 * yours! - zap@lysator.liu.se * by Zap Inc. So are the colors Red, Green and * * * * * * * * * * * * * * * * * Greenish-yellow (Blue was taken by IBM) -- * * * * * * * * * * * * * * * * * (This rent for space) * My signature is smaller than * Be warned! The letter 'Z' is Copyright 1991 * yours! - zap@lysator.liu.se * by Zap Inc. So are the colors Red, Green and * * * * * * * * * * * * * * * * * Greenish-yellow (Blue was taken by IBM)
mmblom@cs.vu.nl (Marc Blom) (04/05/91)
rjc@geech.gnu.ai.mit.edu (Ray Cromwell) wrote: > Well, this all depends on what you are trying to do. Do you want to do >a flight simulator, or are you doing an animation/demo/model rotation? >If the latter, you can precompute all the rotations before run time, and >just render them via lookup table. Nope, a lookup table won't do, i'm hoping to write a elite like space game some day so i don't know how everything has to be rotated in advance. > As for the OS, it really has nothing to do with your project. What needs >to be optimized is the critical section of your rotation routine. I'd suggest >caching the points and computing all the rotations first, then render The OS has everything to do with it !, profiling shows that 85 % of the time is spend in the OS, since i'm only using AreaMove, AreaDraw, AreaEnd & SetAPen, optimising my 3d-routines alone won't do me much good. I have to come up with something faster to replace these OS calls. >them later all at once. I have the feeling you're doing something like this: > >loop { > calculate point > render point > get next point >} end loop What I'm doing looks something like this: loop { discard objects that are not inside sight-pyramid get sin/cos values for this object out of a large table loop { rotate normal of next surface discard surfaces with normal pointing the wrong way (ie. not to screen) loop { rotate next point (of this surface) put point in perspective } end loop } end loop }end loop Sort remaining surfaces (comparing average Z-coords, qsort, only pointers are swapped) loop { Draw next surface /* closest is drawn last */ } end loop > I seem to recall someone posted a 'really fast' line drawing >routine a while back. It was highly optmized for speed using the >CPU (not the blitter. It also was very long because loops were unrolled >and it had lots of special cases for different line slopes.) Love to hear from the one that posted this ! (or someone who has it) peterk@cbmger.UUCP (Peter Kittel GERMANY) wrote: >All I hear permanently is, use tables, tables, tables instead of real >(or integer) calculations. At least for the sin and cos calculations. >Estimations about real-life situations show that you don't need tremendous >precision normally (but that depends on your real project). Perhaps even >2D or 3D rotations could be simplified into one (more or less) big table. >It's worth considering in any case. I wish I knew how to use tables without taking too much memory: I'm using 360 degrees around each angle, so to multiply a coordinate with a sin/cos (this is one table) value i would need 360 words PER coordinate, I'm now using coordinates that range from -16K to + 16K. I don't even want to calculate this ! (22.5 meg !) Offcourse i could use less percision, for instance only rotating by an even number of degrees (0,2,4...,360) and don't use such a large a coordinate system, but i don't want to compromise between speed/presicion just yet ! +--------------------------------------------+------------------------+ | Marc Blom | Email: mmblom@cs.vu.nl | What IS the question ? | | Gondel 14/13 | | | | 8243 BL Lelystad | Phone: 03200-46233 (NL) | TO OS OR NOT TO OS, | | The Netherlands | | that is the question ! | +------------------+-------------------------+------------------------+
peterk@cbmger.UUCP (Peter Kittel GERMANY) (04/05/91)
In article <9554@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: > >peterk@cbmger.UUCP (Peter Kittel GERMANY) wrote: >>All I hear permanently is, use tables, tables, tables instead of real >>(or integer) calculations. At least for the sin and cos calculations. >>Estimations about real-life situations show that you don't need tremendous >>precision normally (but that depends on your real project). > >I wish I knew how to use tables without taking too much memory: I'm using >360 degrees around each angle, so to multiply a coordinate with a sin/cos >(this is one table) value i would need 360 words PER coordinate, No, you can cut down this by a factor of 4 by making use of the identities of sin and cos for offsets of n*90 degrees. This is common technique. > I'm now >using coordinates that range from -16K to + 16K. I don't even want to >calculate this ! (22.5 meg !) Offcourse i could use less percision, for >instance only rotating by an even number of degrees (0,2,4...,360) and >don't use such a large a coordinate system, but i don't want to compromise >between speed/presicion just yet ! Hmm, why is the size of the coordinate system important? In first attempt I would only make a table of sin and cos values in some arbitrary stepping. If you need high precision, then for only one quadrant in small steps; else for the whole circle in bigger steps. In this system you save the sin/cos evaluation, but you have still to multiply your coordinates with these factors. Another thought: If you don't have so many objects, would it be of advantage to build tables of every object coordinate already multiplied with all these sin/cos values? That should save you some more multipli- cations in the innermost loops. But the number of stored table entries has squared, now there you would have to choose very carefully. Anyone already done this? -- Best regards, Dr. Peter Kittel // E-Mail to \\ Only my personal opinions... Commodore Frankfurt, Germany \X/ {uunet|pyramid|rutgers}!cbmvax!cbmger!peterk
bombadil@diku.dk (Kristian Nielsen) (04/06/91)
mmblom@cs.vu.nl (Marc Blom) writes: > >> I seem to recall someone posted a 'really fast' line drawing >>routine a while back. It was highly optmized for speed using the >>CPU (not the blitter. It also was very long because loops were unrolled >>and it had lots of special cases for different line slopes.) > >Love to hear from the one that posted this ! (or someone who has it) > > I think this probably refers to a rutine posted by John Schultz; I'll try and see if I've still got it, or he might repost (again :-) ). However, meanwhile, you might want to take a look at this. It implements line draws to a 4-plane bitmap, and includes a small demo. The main trick is not to operate on x/y coords in the drawing loop, but instead manipulate screen addresses and masks; also, plotting points is done while avoiding looping through each bitplane. Note that the slope of a line is calculated using a divide instruction - this could be optimised by using a slightly different algorithm. Also, I agree - this rutine IS a bit long. Kristian Following long source (for Hisoft DevPac 2): * Test-call. INCDIR Include/ INCLUDE exec/exec_lib.i INCLUDE intuition/intuition.i INCLUDE intuition/intuition_lib.i TEST moveq #0,d0 lea iname(pc),a1 movea.l 4.w,a6 jsr _LVOOpenLibrary(a6) move.l d0,ibase beq noint movea.l d0,a6 lea ns(pc),a0 jsr _LVOOpenScreen(a6) move.l d0,screen beq noscreen move.l d0,scrptr lea nw(pc),a0 jsr _LVOOpenWindow(a6) move.l d0,window beq nowind movea.l d0,a4 movea.l wd_RPort(a4),a4 movea.l rp_BitMap(a4),a4 lea bm_Planes(a4),a4 moveq #5,d7 loop movea.l screen(pc),a0 move.w sc_MouseX(a0),d0 move.w sc_MouseY(a0),d1 move.w #160,d2 move.w #128,d3 movem.l (a4),a0-a3 addq #1,d7 andi.w #$000f,d7 movem.l d0-d7/a0-a6,-(a7) bsr CpuLine movem.l (a7)+,d0-d7/a0-a6 movea.l 4.w,a6 movea.l window(pc),a0 movea.l wd_UserPort(a0),a0 jsr _LVOGetMsg(a6) tst.l d0 beq.s loop movea.l d0,a1 jsr _LVOReplyMsg(a6) movea.l window(pc),a0 movea.l ibase(pc),a6 jsr _LVOCloseWindow(a6) nowind movea.l screen(pc),a0 jsr _LVOCloseScreen(a6) noscreen movea.l 4.w,a6 movea.l ibase(pc),a1 jsr _LVOCloseLibrary(a6) noint rts ns dc.w 0,0,320,256,4 dc.b -1,-1 dc.w 0 dc.w CUSTOMSCREEN dc.l 0,0,0,0 nw dc.w 0,1,320,255 dc.b -1,-1 dc.l CLOSEWINDOW dc.l WINDOWCLOSE|WINDOWDRAG dc.l 0,0 dc.l windowtitle scrptr ds.l 1 dc.l 0 dc.w 0,0,0,0,CUSTOMSCREEN window ds.l 1 screen ds.l 1 ibase ds.l 1 iname dc.b 'intuition.library',0 windowtitle dc.b '*FAST*! Cpu line rutine - demo.',0 EVEN ************************************************************************ * Fast CPU line rutine. * * Main Entry point: Draw a line from (d0,d1) to (d2,d3). * a0-a3 are plane pointers. * d7 is line color 0-15. ************************************************************************ CpuLine cmp.w d1,d3 ;For simplicy, we don't ever draw a line bge.s CpuLine2 ;upwards; instead, the points are swapped. exg d0,d2 exg d1,d3 ;Secondary entry point: Enter here if you know for sure that d1<=d3. CpuLine2 * Calculate jump-offset from color-value. add.w d7,d7 add.w d7,d7 * Calculation of the memory address of the starting point. * NOTE! This is specific for a 40-byte wide screen. * (It could be argued that this is not very general... also I am not certain * that a mulu wouldn't be quicker, at least on some members of the 680xx * family. Still room for improvement.... * Also, I use only WORD access to display memory. This makes for full speed * in old 16bit chipmem, but is slower (I guess) than longword access on an * A3000, should you be so lucky as to own one of those beasts. I guess that * there are just too many different amigas out there for the perfect optimi- * zation. move.w d1,d4 add.w d4,d4 add.w d4,d4 ;y*4 add.w d1,d4 ;y*5 lsl.w #3,d4 ;*40 move.w d0,d5 lsr.w #4,d5 add.w d5,d5 add.w d5,d4 ;d4=offset into bitplanes. add.w d4,a0 add.w d4,a1 add.w d4,a2 add.w d4,a3 move.w d0,d4 andi.w #$000f,d4 ;d4=bit no (from the left) of first point * Now determine whether the line slants to the left or to the right. sub.w d1,d3 ;The end-point isn't really usefull, but sub.w d0,d2 ;the vector from (d0,d1) is! blt lineleft ;Our line is slanting to the left. * Initialising some values. move.w #$8000,d0 ;Mask for set pixel. move.w #$7fff,d1 ;Mask for clear pixel. * (Yes I know - d0 is not needed for color 0 + same for d1. Feel free to * optimize further if you use these colors a lot...) move.w d1,d5 ;Store just less than 0.5 for rounding. ror.w d4,d0 ror.w d4,d1 ;Get bit in correct position. ;NOTE - Change this for different plane width. moveq #40,d4 ;Width of planes in bytes. * To draw the line with no 'holes' in it (and no overlap of points either), * either the x-coord or the y-coord has to be changed by one for each point * plotted, while the other is only changed occationally. cmp.w d2,d3 blt alwaysright ;Jump if x is to be changed always. * Now we have to calculate the 'slant' of the line, ie. what persentage * of the time do we have to increment the x-coord. * Now if only i knew which of divu and divs were faster.... * First, however, avoid division by zero. tst.w d3 beq singlepoint ;Branch to draw only a single point. * Need special case for d2=d3 to avoid overflow. cmp.w d2,d3 beq.s deg45A swap d2 clr.w d2 ;Need to calculate x*2^16/y. divu d3,d2 ;d2 now contains line-slant. jmp jump_tabA(pc,d7.w);Pick right loop from color. deg45A move.l #$FFFF,d2 jmp jump_tabA(pc,d7.w);Pick right loop from color. jump_tabA * This junk just creates 16 branches to labels tightloopA0..15. * The most tricky bit is the 'label\<num>' bit. This expands to a label * with the decimal representation of num at its end, so if num=5, it * gives 'label5'. With my Devpac 2, this works nicely; hopefully it is * also supported by other assemblers (else buy Devpac - it's great!) bradefA MACRO bra tightloopA\<num> ENDM num SET 0 REPT 16 bradefA num SET num+1 ENDR * This creates the 16 loops, one for each color. loopdefA MACRO * This is the loop that needs to be optimized real hard. * It outputs one point with only * always 11 instructions, sometimes 14, occasionally 18. * this is 12 words, 15 words, 19 words. tightloopA\<num> * Handle bit 0. IFEQ (num)&1 and.w d1,(a0) ELSEIF or.w d0,(a0) ENDC * Handle bit 1. IFEQ (num)&2 and.w d1,(a1) ELSEIF or.w d0,(a1) ENDC * Handle bit 2. IFEQ (num)&4 and.w d1,(a2) ELSEIF or.w d0,(a2) ENDC * Handle bit 3. IFEQ (num)&8 and.w d1,(a3) ELSEIF or.w d0,(a3) ENDC add.w d4,a0 add.w d4,a1 add.w d4,a2 add.w d4,a3 ;increase y-coord. add.w d2,d5 bcc.s gonextA\<num> ;Check if x is to be increased. IFNE num-15 ;Clear bit not needed for color15. ror.w #1,d1 ENDC IFNE num ;Set bit not needed for color 0. ror.w #1,d0 bcc.s gonextA\<num> ;Check for word-boundary. ELSEIF bcs.s gonextA\<num> ENDC addq.w #2,a0 ;Need to handle word boundary. addq.w #2,a1 addq.w #2,a2 addq.w #2,a3 gonextA\<num> dbra d3,tightloopA\<num> rts ENDM num SET 0 REPT 16 loopdefA num SET num+1 ENDR * Following are the remaining 3 octants. alwaysright * Now we have to calculate the 'slant' of the line, ie. what persentage * of the time do we have to increment the y-coord. * Now if only i knew which of divu and divs were faster.... * First, however, avoid division by zero. tst.w d2 beq singlepoint ;Branch to draw only a single point. swap d3 clr.w d3 ;Need to calculate y*2^16/x. divu d2,d3 ;d2 now contains line-slant. jmp jump_tabC(pc,d7.w);Pick right loop from color. jump_tabC * This junk just creates 16 branches to labels tightloopC0..15. * The most tricky bit is the 'label\<num>' bit. This expands to a label * with the decimal representation of num at its end, so if num=5, it * gives 'label5'. With my Devpac 2, this works nicely; hopefully it is * also supported by other assemblers (else buy Devpac - it's great!) bradefC MACRO bra tightloopC\<num> ENDM num SET 0 REPT 16 bradefC num SET num+1 ENDR * This creates the 16 loops, one for each color. loopdefC MACRO * This is the loop that needs to be optimized real hard. * It outputs one point with only * always 11 instructions, sometimes 14, occasionally 18. * this is 12 words, 15 words, 19 words. * * This could be optimized by plotting points in 4 registers instead of * going directly to memory. This would improve very flat lines, but would * slow down lines nearer to 45 degrees unless another special case were * added; I guess I was just too lazy. tightloopC\<num> * Handle bit 0. IFEQ (num)&1 and.w d1,(a0) ELSEIF or.w d0,(a0) ENDC * Handle bit 1. IFEQ (num)&2 and.w d1,(a1) ELSEIF or.w d0,(a1) ENDC * Handle bit 2. IFEQ (num)&4 and.w d1,(a2) ELSEIF or.w d0,(a2) ENDC * Handle bit 3. IFEQ (num)&8 and.w d1,(a3) ELSEIF or.w d0,(a3) ENDC IFNE num-15 ;Clear bit not needed for color15. ror.w #1,d1 ENDC IFNE num ;Set bit not needed for color 0. ror.w #1,d0 bcc.s checkC\<num> ;Check for word-boundary. ELSEIF bcs.s checkC\<num> ENDC addq.w #2,a0 ;Need to handle word boundary. addq.w #2,a1 addq.w #2,a2 addq.w #2,a3 checkC\<num> add.w d3,d5 bcc.s gonextC\<num> ;Check if x is to be increased. adda.w d4,a0 adda.w d4,a1 adda.w d4,a2 adda.w d4,a3 ;increase y-coord. gonextC\<num> dbra d2,tightloopC\<num> rts ENDM num SET 0 REPT 16 loopdefC num SET num+1 ENDR lineleft neg.w d2 ;get abs. value of x. * Initialising some values. move.w #$8000,d0 ;Mask for set pixel. move.w #$7fff,d1 ;Mask for clear pixel. * (Yes I know - d0 is not needed for color 0 + similar for d1. Feel free to * optimize further if you use these colors a lot...) move.w d1,d5 ;Store just less than 0.5 for rounding. ror.w d4,d0 ror.w d4,d1 ;Get bit in correct position. ;NOTE - Change this for different plane width. moveq #40,d4 ;Width of planes in bytes. * To draw the line with no 'holes' in it (and no overlap of points either), * either the x-coord or the y-coord has to be changed by one for each point * plotted, while the other is only changed occationally. cmp.w d2,d3 blt alwaysleft ;Jump if x is to be changed always. * Now we have to calculate the 'slant' of the line, ie. what persentage * of the time do we have to increment the x-coord. * Now if only i knew which of divu and divs were faster.... * First, however, avoid division by zero. tst.w d3 beq singlepoint ;Branch to draw only a single point. cmp.w d3,d2 beq.s deg45B swap d2 clr.w d2 ;Need to calculate x*2^16/y. divu d3,d2 ;d2 now contains line-slant. jmp jump_tabB(pc,d7.w);Pick right loop from color. deg45B move.w #$FFFF,d2 jmp jump_tabB(pc,d7.w);Pick right loop from color. jump_tabB * This junk just creates 16 branches to labels tightloopA0..15. * The most tricky bit is the 'label\<num>' bit. This expands to a label * with the decimal representation of num at its end, so if num=5, it * gives 'label5'. With my Devpac 2, this works nicely; hopefully it is * also supported by other assemblers (else buy Devpac - it's great!) bradefB MACRO bra tightloopB\<num> ENDM num SET 0 REPT 16 bradefB num SET num+1 ENDR * This creates the 16 loops, one for each color. loopdefB MACRO * This is the loop that needs to be optimized real hard. * It outputs one point with only * always 11 instructions, sometimes 14, occasionally 18. * this is 12 words, 15 words, 19 words. tightloopB\<num> * Handle bit 0. IFEQ (num)&1 and.w d1,(a0) ELSEIF or.w d0,(a0) ENDC * Handle bit 1. IFEQ (num)&2 and.w d1,(a1) ELSEIF or.w d0,(a1) ENDC * Handle bit 2. IFEQ (num)&4 and.w d1,(a2) ELSEIF or.w d0,(a2) ENDC * Handle bit 3. IFEQ (num)&8 and.w d1,(a3) ELSEIF or.w d0,(a3) ENDC add.w d4,a0 add.w d4,a1 add.w d4,a2 add.w d4,a3 ;increase y-coord. add.w d2,d5 bcc.s gonextB\<num> ;Check if x is to be increased. IFNE num-15 ;Clear bit not needed for color15. rol.w #1,d1 ENDC IFNE num ;Set bit not needed for color 0. rol.w #1,d0 bcc.s gonextB\<num> ;Check for word-boundary. ELSEIF bcs.s gonextB\<num> ENDC subq.w #2,a0 ;Need to handle word boundary. subq.w #2,a1 subq.w #2,a2 subq.w #2,a3 gonextB\<num> dbra d3,tightloopB\<num> rts ENDM num SET 0 REPT 16 loopdefB num SET num+1 ENDR alwaysleft * Now we have to calculate the 'slant' of the line, ie. what persentage * of the time do we have to increment the y-coord. * Now if only i knew which of divu and divs were faster.... * First, however, avoid division by zero. tst.w d2 beq singlepoint ;Branch to draw only a single point. swap d3 clr.w d3 ;Need to calculate y*2^16/x. divu d2,d3 ;d2 now contains line-slant. jmp jump_tabD(pc,d7.w);Pick right loop from color. jump_tabD * This junk just creates 16 branches to labels tightloopD0..15. * The most tricky bit is the 'label\<num>' bit. This expands to a label * with the decimal representation of num at its end, so if num=5, it * gives 'label5'. With my Devpac 2, this works nicely; hopefully it is * also supported by other assemblers (else buy Devpac - it's great!) bradefD MACRO bra tightloopD\<num> ENDM num SET 0 REPT 16 bradefD num SET num+1 ENDR * This creates the 16 loops, one for each color. loopdefD MACRO * This is the loop that needs to be optimized real hard. * It outputs one point with only * always 11 instructions, sometimes 14, occasionally 18. * this is 12 words, 15 words, 19 words. * * This could be optimized by plotting points in 4 registers instead of * going directly to memory. This would improve very flat lines, but would * slow down lines nearer to 45 degrees unless another special case were * added; I guess I was just too lazy. tightloopD\<num> * Handle bit 0. IFEQ (num)&1 and.w d1,(a0) ELSEIF or.w d0,(a0) ENDC * Handle bit 1. IFEQ (num)&2 and.w d1,(a1) ELSEIF or.w d0,(a1) ENDC * Handle bit 2. IFEQ (num)&4 and.w d1,(a2) ELSEIF or.w d0,(a2) ENDC * Handle bit 3. IFEQ (num)&8 and.w d1,(a3) ELSEIF or.w d0,(a3) ENDC IFNE num-15 ;Clear bit not needed for color15. rol.w #1,d1 ENDC IFNE num ;Set bit not needed for color 0. rol.w #1,d0 bcc.s checkD\<num> ;Check for word-boundary. ELSEIF bcs.s checkD\<num> ENDC subq.w #2,a0 ;Need to handle word boundary. subq.w #2,a1 subq.w #2,a2 subq.w #2,a3 checkD\<num> add.w d3,d5 bcc.s gonextD\<num> ;Check if x is to be increased. adda.w d4,a0 adda.w d4,a1 adda.w d4,a2 adda.w d4,a3 ;increase y-coord. gonextD\<num> dbra d2,tightloopD\<num> rts ENDM num SET 0 REPT 16 loopdefD num SET num+1 ENDR * Our (silly) caller passed us the same point for start and end - just give * him the single lousy point he wants. singlepoint jmp jump_tabE(pc,d7.w);Pick right plotrutine from color. jump_tabE * This junk just creates 16 branches to labels plotE0..15. * The most tricky bit is the 'label\<num>' bit. This expands to a label * with the decimal representation of num at its end, so if num=5, it * gives 'label5'. With my Devpac 2, this works nicely; hopefully it is * also supported by other assemblers (else buy Devpac - it's great!) bradefE MACRO bra plotE\<num> ENDM num SET 0 REPT 16 bradefE num SET num+1 ENDR plotdefE MACRO * Plot a point in color num. * Handle bit 0. plotE\<num> IFEQ (num)&1 and.w d1,(a0) ELSEIF or.w d0,(a0) ENDC * Handle bit 1. IFEQ (num)&2 and.w d1,(a1) ELSEIF or.w d0,(a1) ENDC * Handle bit 2. IFEQ (num)&4 and.w d1,(a2) ELSEIF or.w d0,(a2) ENDC * Handle bit 3. IFEQ (num)&8 and.w d1,(a3) ELSEIF or.w d0,(a3) ENDC rts ENDM num SET 0 REPT 16 plotdefE num SET num+1 ENDR
holgerl@amiux.UUCP (Holger Lubitz) (04/06/91)
In article <1991Apr3.085847.14287@mintaka.lcs.mit.edu> rjc@geech.gnu.ai.mit.edu (Ray Cromwell) writes: > I seem to recall someone posted a 'really fast' line drawing >routine a while back. It was highly optmized for speed using the >CPU (not the blitter. It also was very long because loops were unrolled >and it had lots of special cases for different line slopes.) I probably missed it. Could any kind soul who archived it email it to me or simply repost ? Thanks in advance, Holger -- Holger Lubitz | holgerl@amiux.uucp Kl. Drakenburger Str. 24 | holgerl@amiux.han.de D-W-3070 Nienburg | cbmvax.commodore.com!cbmehq!cbmger!amiux!holgerl
baker@wbc.enet.dec.com (04/06/91)
Actually, back in the late 70's or early 80's, I saw a paper by someone who was doing realtime rotations of a 3-D data-plot on an 8-bit micro. Using a table or two, he had reduced the rotation calculations to a couple integer multiplies and a shift. I don't remember the algorithm, but I can probably find the reference. Regards, ********************************************************* * Art Baker | "Read ME, * * baker@wbc.enet.dec.com | Dr Memory..." * * PLINK: A*BAKER | * *********************************************************
jonabbey@cs.utexas.edu (Jonathan David Abbey) (04/06/91)
In article <9554@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: |I wish I knew how to use tables without taking too much memory: I'm using |360 degrees around each angle, so to multiply a coordinate with a sin/cos |(this is one table) value i would need 360 words PER coordinate, I'm now |using coordinates that range from -16K to + 16K. I don't even want to |calculate this ! (22.5 meg !) Offcourse i could use less percision, for |instance only rotating by an even number of degrees (0,2,4...,360) and |don't use such a large a coordinate system, but i don't want to compromise |between speed/presicion just yet ! | Use quanta of rotation 1/256th of a circle. That way you can do rotations with a single indexed offset into your trig tables. Short, sweet, and oh so quick. >+--------------------------------------------+------------------------+ >| Marc Blom | Email: mmblom@cs.vu.nl | What IS the question ? | >| Gondel 14/13 | | | >| 8243 BL Lelystad | Phone: 03200-46233 (NL) | TO OS OR NOT TO OS, | >| The Netherlands | | that is the question ! | >+------------------+-------------------------+------------------------+ -- ------------------------------------------------------------------------------- Jonathan David Abbey \"Take your place on the great Mandela" P,P&M the university of texas at austin \ jonabbey@cs.utexas.edu "Love me, love computer science/math?/psychology? \ (512) 472-2052 my Amiga" -Me
limonce@pilot.njin.net (Tom Limoncelli +1 201 408 5389) (04/06/91)
In article <9529@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: > 'm wondering how fast 3D graphics can really go. I've written some rotation > and perspective routines (in assembly) and profiling shows that 60 % of the > time is spend in my routine RotatePoint. This routine rotates 1 point around > all three axes in only about 900 cycles minimum and about 1100 cycles max. Yow! You're using the wrong algorithm! I recommend "Computer Graphics, principles and practice" by Foley, vanDam, Feiner & Hughes. Chap 5 has some better ways of doing this kind of stuff. It really makes you appreciate those linear algebra classes that were so tempting to sleep through. They bring a big rotate/scale/move routine down to one precomputation then a matrix-multiple for every point in the display. By doing one precomputation you save a LOT of work for all those other points. Also, comp.graphics has a Monthly Posting Of Frequently Asked Questions (and their Answers). It includes *many* good tips. Some of them are actual simple formulas, some are snotty "here's a list of books... go to the library!" but, hey, the net isn't a replacement for good research. Tom -- One thousand, one hundred, seventy five people died of AIDS last week. Did someone mention a war in Iraq?
Alex_Topic@resbbs.UUCP (Alex Topic) (04/06/91)
Well talking about 3d graphics.. Which the BLITTER should be used for this type of thing on the Amiga. Do you have any idea if CBM is working on a new CUSTOM CHIP set.. I mean with a fast blitter, more sprites...ect I know the Amiga is a nice put together computer.. But they could of made it a bit better. It should of had upto 256 colors in Low res, plus the SPRITES could of atleast been alittle more powerfull. ONe system I like is the NEO GEO.. it has 300+ Sprites.. 2048 on screen colors... God knows if they have a BLITTER type chip.. Oh well dont take me as a complainer, I think the Amiga has alot power for the price, but it could have had more abilities.. oh well Cya all around.. A.t.
ben@epmooch.UUCP (Rev. Ben A. Mesander) (04/06/91)
>In article <9576@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: [...] >I think i misunderstood you, I am allready using tables for the sin/cos >values, I thought you meant tables for the multiplies, then the coordinate >system's size is very important, the larger your coordinate system, the >larger the multable. I think often you can reduce the size of the multable by carefully looking at your algorithms. I've done something like this for graphics in Z-80 assembler, where I used properties of the screen addressing to reduce my multable (everything had to fit in 32K) similarly to the trig identities for reducing the sin/cos table. Look at your problem very carefully... >Marc Blom, Vrije Universiteit Amsterdam, Email: mmblom@cs.vu.nl | short .sig -- | ben@epmooch.UUCP (Ben Mesander) | "Cash is more important than | | ben%servalan.UUCP@uokmax.ecn.uoknor.edu | your mother." - Al Shugart, | | !chinet!uokmax!servalan!epmooch!ben | CEO, Seagate Technologies |
mmblom@cs.vu.nl (Marc Blom) (04/07/91)
Newsgroups: comp.sys.amiga.programmer Subject: Re: Fast 3d Graphics Summary: Expires: References: <20338@cbmvax.commodore.com> <9554@star.cs.vu.nl> <1073@cbmger.UUCP> Sender: Followup-To: Distribution: Organization: Fac. Wiskunde & Informatica, VU, Amsterdam Keywords: fast 3d optimizing assembly cycles In article <1073@cbmger.UUCP> peterk@cbmger.UUCP (Peter Kittel GERMANY) writes: >In article <9554@star.cs.vu.nl> mmblom@cs.vu.nl (Marc Blom) writes: >> >> >>I wish I knew how to use tables without taking too much memory: I'm using >>360 degrees around each angle, so to multiply a coordinate with a sin/cos >>(this is one table) value i would need 360 words PER coordinate, > >No, you can cut down this by a factor of 4 by making use of the identities >of sin and cos for offsets of n*90 degrees. This is common technique. > Yes, I know and i am using this, i just forgot it. >> I'm now >>using coordinates that range from -16K to + 16K. I don't even want to >>calculate this ! (22.5 meg !) Offcourse i could use less percision, for >>instance only rotating by an even number of degrees (0,2,4...,360) and >>don't use such a large a coordinate system, but i don't want to compromise >>between speed/presicion just yet ! > >Hmm, why is the size of the coordinate system important? >In first attempt I would only make a table of sin and cos values in some >arbitrary stepping. If you need high precision, then for only one quadrant >in small steps; else for the whole circle in bigger steps. In this system >you save the sin/cos evaluation, but you have still to multiply your >coordinates with these factors. > I think i misunderstood you, I am allready using tables for the sin/cos values, I thought you meant tables for the multiplies, then the coordinate system's size is very important, the larger your coordinate system, the larger the multable. >-- >Best regards, Dr. Peter Kittel // E-Mail to \\ Only my personal opinions... >Commodore Frankfurt, Germany \X/ {uunet|pyramid|rutgers}!cbmvax!cbmger!peterk -- Marc Blom, Vrije Universiteit Amsterdam, Email: mmblom@cs.vu.nl | short .sig
galetti@uservx.afwl.af.mil (04/09/91)
In article <20338@cbmvax.commodore.com>, chrisg@cbmvax.commodore.com (Chris Green) writes: > > Doing a quick cycle count on my routine, it looks like 485 worst case. > 60% is really high to spend on rotation. 10% is more likely for an amiga. > You ARE using a 3x3 matrix and not 3 separate axis rotations, right? > Does your routine rotate about an arbitrary axis or about one of the principle axes? If it rotates about an arbitrary axis, do you assume the axis is fixed or do you allow it to vary? If you allow it to vary, there are a few additional multiplications you have to perform to set up the rotation matrix, not to mention a few additional trig. functions. For an arbitrary axis, 485 cycles sounds awfully low. > -- > *-------------------------------------------*---------------------------* > |Chris Green - Graphics Software Engineer - chrisg@commodore.COM f > | Commodore-Amiga - uunet!cbmvax!chrisg n > |My opinions are my own, and do not - killyouridolssonicdeath o > |necessarily represent those of my employer.- itstheendoftheworld r > *-------------------------------------------*---------------------------d ___________________________________________________________________________ / Ralph Galetti Internet: galetti@uservx.afwl.af.mil \ | PL/LITT Interests: computers, music, computers | | Kirtland AFB, NM 87117-6008 and music, golf, sleep. | \__"No, they couldn't actually prove that it was HIS vomit" - Nigel Tufnel__/