alan@oz.nm.paradyne.com (Alan Lovejoy) (02/23/90)
In article <1990Feb19.174926.7899@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <7404@pdn.paradyne.com> alan@oz.paradyne.com () writes: >>>[small BitBlts] >>>overhead of deciding what BitBlt to do completely swamps the BitBlting >>>itself. (Credit to Pike, once again, for pointing this out.) >> >>Sorry, no. It only takes roughly 50 cycles (on an 88k) to setup for a blit, >>if you aren't generating code on the fly (and for the common cases, you >>don't generate code on the fly). Since a character blit has a copyrect >>whose width is 7 to 15 (typically) and whose height is also 7 to 15, that >>means we need to blit roughly 10 words at roughly 10 cycles per word... > >I'd be interested to know exactly what you include under "setup". Pike's >numbers showed a much greater disparity the other way, counting *all* >the overhead. Consulting my source code reveals the following: Given two pointers to two bitmaps (source and destination) which are structs describing a) the pixel width and height of the bitmaps, b) the pixel depth, and c) the base address for the bits in the bitmap, and given the two origins (in source and destination) of the the copyrect and the extent (width and height) of the copyrect, my code can "set up" for the bitblt loop in 38 cycles on an 88k. Of course, this does not count procedure call overhead (other than LOCAL (as opposed to caller) register save/restore), and it does not account for clipping (which takes about 10 cycles if everything is within proper bounds). Add a small (less than 10) number of cycles for figuring out that one of the precooked blits can be used. Total overhead with clipping and two levels of procedure call (the first is the generic blit function which does clipping, determines which blit to use, perhaps generates the code for it, and then calls it) is very close to 60 cycles. So "roughly 50 cycles" was a slightly understated approximation, but not by much, especially since it is often possible to a) reuse the specific blit function several times once the correct one has been determined (and perhaps generated), and b) avoid clipping. I suspect the major reason for the low cycle count here is the 88k MUL instruction, which only has a four cycle latency, and which is fully pipelined (6 multiplies can be in the pipe at the same time). There are NO PIPELINE STALLS in my code! Since there are 5 multiplies by numbers which cannot be known at translation time, even an R3000 will have much worse cycle counts than the 88k for setup (add roughly 30 NO-OP cycles for an R3000, hundreds of cycles for most other CPUs). Also, I use EXTU and AND to do division and modulus by 32 (the number of bits per word); if one uses DIV, cycle counts will of course explode drastically. ____"Congress shall have the power to prohibit speech offensive to Congress"____ Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL. Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me. Mottos: << Many are cold, but few are frozen. >> << Frigido, ergo sum. >>
henry@utzoo.uucp (Henry Spencer) (02/24/90)
In article <7452@pdn.paradyne.com> alan@oz.paradyne.com (Alan Lovejoy) writes: >>>... It only takes roughly 50 cycles (on an 88k) to setup for a blit... >>I'd be interested to know exactly what you include under "setup". Pike's >>numbers showed a much greater disparity the other way, counting *all* >>the overhead. > >Consulting my source code reveals the following: > >Given two pointers to two bitmaps (source and destination) which are structs >describing... Where do you get those two pointers? For character blitting you'll need font lookup and possibly some other bits of overhead, depending on exactly how the stuff is being invoked by higher levels. The relevant measurement of overhead for character blitting is "end to end", including absolutely everything. However, done well, that's probably not going to add a whole lot more. That *is* an impressively fast setup you've got there. -- "The N in NFS stands for Not, | Henry Spencer at U of Toronto Zoology or Need, or perhaps Nightmare"| uunet!attcan!utzoo!henry henry@zoo.toronto.edu