cs326ag@ux1.cso.uiuc.edu (Loren J. Rittle) (05/12/91)
In article <rw6H$#t1@cs.psu.edu> melling@cs.psu.edu (Michael D Mellinger) writes:
>Wimp, chicken, afraid you might lose... :-) :-)

First off, I agree with Ethan (which is why you have not seen me in this *mess* before today)! Only the better man can restrain himself in these flame wars, or so they say... :-) :-) One point Ethan, zero points Mike.

No really, I'm glad to see that you are starting to ask the `right' questions instead of just flaming, etc.

>Actually, could you answer(or someone who knows) my blitter vs. the
>040 question? I think that it is definitely legitimate. Consider it
>a purely Amiga question. There must be more to the blitter.

OK Mike, I'll assume that you are being true to your word and that you are not trying to start a flame war over this...

Let me state that all information contained within this post was taken from BlitLab v1.3 (ON A FISH DISK!) by Tomas Rokicki of Radical Eye Software. [side plug, if you want to know about the blitter, get this package!] In turn, Tomas took all his information from:

* Hardware Manual
* ROM Kernel Manual
* the ROMs
* lots of empirical testing done by himself.

Tomas wrote (a long while ago, so forgive the fact that he calls graphics memory, CHIP memory :-):

"The blitter comprises part of the Agnus chip in the Amiga, and can only access CHIP memory. (CHIP memory is the lower 512K of memory in the current Amiga models, but this may change.) To the 68000, it appears as a set of approximately twenty sixteen bit write only registers. It can access memory at 7.2 megabytes per second, or twice the bandwidth of the 68000 (although, as we shall see, it doesn't always run this fast.) Any video memory accesses can slow the blitter down, whether for screen refresh or for the 68000. For instance, the standard two bit deep high resolution workbench screen can slow the blitter down by approximately 30\%. A low resolution single bit plane screen can slow it down by about 8\%.
A high resolution four bit plane screen can slow down the blitter by about 60\%. The blitter is so fast, however, that even with this handicap it performs its tasks many times faster than the 68000.

[Please note that the slowdowns Tomas talks about *would* affect any other processor talking with graphics memory... - LJR]

"The blitter uses four DMA channels to perform its work; these are labeled A, B, and C (sources) and D (destination). Any or all of them may be disabled independently. The destination can be calculated from any of 256 possible logical equations on A, B, and C. The A and B sources can be shifted up to 15 bits to the right, and the first and last word in a line from the A source can be masked by a constant. Each of the four channels has its own modulo. The blitter also has an area fill and a line draw mode."

This last paragraph describes the blitter! It is far more than a mere memory mover (the one task where the '030 and '040 can beat the blitter). The blitter can (for a complex example ;-) read from memory, shift and mask the result, read from another memory location, shift that, combine the two with any of the possible two-input logical operations, and write to a final memory location (which could be the same as one of the first two), all in, as you will see below, 6 clock cycles! Granted, the clock cycles are of the 7.18 MHz type, but show me the '030 *or* '040 code that can do the same in less time...

Tomas goes on to say:

"So, all of those fancy operations are fine and dandy, but just how fast is the blitter, anyway? This depends entirely on which DMA channels are turned on. You might be using a DMA channel as a constant, but unless it is turned on, it does not count against you. The minimum blitter cycle is four clocks; the maximum is eight. Use of the A register is always free. Use of the B register always adds two clocks to the blitter cycle. Use of either C or D is free, but use of both adds another two clocks.
Thus, a copy cycle, using A and D, takes four clocks per cycle; a copy cycle using B and D takes six clocks per cycle, and a generalized bit copy using B, C, and D takes eight clocks. When in line mode, each pixel takes eight clocks.

"The clock is the 7.18 MHz system clock. To calculate the total time for the blit in microseconds, after setup, you use the equation

$$t={nHW\over 7.18}$$

where $t$ is the time in microseconds, $n$ is the number of clocks per cycle, and $H$ and $W$ are the height and width of the blit, respectively.

"Actually, this is a minimum time, which is strictly impossible. Display data fetches, 68000 cycles, and other operations can steal cycle bandwidth away from the blitter. One way to eliminate most of this overhead is to call the macro {\tt OFF\_DISPLAY} which turns off the display; this is not a friendly thing to do, however. Don't forget to call {\tt ON\_DISPLAY} after the blit is finished!"

[Again, bear in mind that the 680x0 would be forced to deal with display data fetches, etc. - LJR]

Please forgive the TeX commands in there, I didn't want to modify his words in any way. :-)

>Otherwise, why would Commodore keep investing money in upgrading the
>blitter when they could simply buy a cheap RISC(what's an 030 go for
>these days) and save themselves a whole lot of money.

Interesting point, but the blitter is still not too bad compared to other solutions C= might pick up instead. The value/cost ratio has to be close to infinity, since C= makes the chips themselves in their own NMOS foundries and the original research/development costs have long been covered. Until the blitter is getting beaten by things that cost less, I don't see C= switching, as it would require a large number of upgrades to existing OS software. This does not mean that C= should wait until it is too late; they should be looking into the problem now!
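
For anyone who wants to play with Tomas's timing rules quoted above, here is a minimal Python sketch (mine, not from BlitLab). The function and parameter names are made up for illustration, and I am assuming that the width W in his equation is counted in 16-bit words; the 7.18 MHz figure is simply the one from his text. Like his equation, it gives the best-case time, with no display or CPU contention.

```python
# Rough model of the quoted blitter timing rules (illustrative only).
SYSTEM_CLOCK_MHZ = 7.18  # clock figure taken from the quoted text


def clocks_per_cycle(use_a=False, use_b=False, use_c=False, use_d=False):
    """Clocks per blitter cycle for a given set of enabled DMA channels."""
    clocks = 4                # minimum blitter cycle; channel A is always free
    if use_b:
        clocks += 2           # B always adds two clocks
    if use_c and use_d:
        clocks += 2           # C or D alone is free, both together add two
    return clocks


def blit_time_us(clocks, height, width_words):
    """Best-case blit time in microseconds: t = n*H*W / 7.18.

    Assumption: width_words is the blit width in 16-bit words.
    """
    return clocks * height * width_words / SYSTEM_CLOCK_MHZ
```

For example, `blit_time_us(clocks_per_cycle(use_a=True, use_d=True), 200, 40)` models a plain A-to-D copy of a 200-line, 40-word-wide area and comes out at roughly 4.5 milliseconds. Turning on B as well (the A, B, D case) costs six clocks per cycle instead of four.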

In sum, when doing `normal' operations such as copying memory, the blitter can be put to shame by an '030 [in the A2000 with an '030 board, moves will be about 1.7 times as fast. In the A3000 it's about 2.8 times (from CpuBlit.doc). The A3000 does better because the '030 can access the graphics memory in 32-bit chunks]. But with the *same* memory sub-system, an '040 would not do better than an '030 in either of the above listed cases! An '030 doing the memory copy is already saturating the graphics memory bus, so an '040 can't do better. The old von Neumann bottleneck wins again.

Now, the `complex' operations that the blitter can do *can't* *be* *touched* by an '030 or an '040. It's like comparing a software data encryption package with a custom chip solution. Running the algorithm on a general purpose CPU is NO match for the hand crafted hardware solution. The blitter is basically a highly refined memory mover *and* memory manipulator. The general purpose CPU just can't match it. To boot, it has a built-in line drawing mode and an area-fill-between-lines mode!

The blitter is indeed a well matched co-processor for a graphics based home computer such as the Amiga. The design of the blitter also proves that throwing a lot of money at a problem is not always the best way to solve it.

Let me say a few last words about that last (as yet unsupported) statement. Recently, many other computer companies have tried to solve the same problems that the Amiga blitter solved SIX years ago! Most are using RISC graphics engines with costly high performance memory subsystems to keep up with data fetches and writes and *instruction* fetches. The blitter was designed so that once set up, only data fetches and writes need be considered. These RISC designs try (at more expense) to get around the memory I/O problem by adding an I-cache (and cache controller, etc., all at MORE cost). The blitter approach has stood the test of time.
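
To make the `complex' operations above a bit more concrete: Tomas's "256 possible logical equations on A, B, and C" amounts to an 8-bit truth table, one bit for each combination of A, B, and C. Below is a rough Python model of that idea (my sketch, not blitter code). The function name is hypothetical, and the bit ordering (A as the high bit of the table index) is my recollection of the Hardware Manual convention rather than something stated in the quoted text, so treat it as an assumption. Shifting, masking, and modulos are left out.

```python
def blit_logic(minterm, a, b, c, width=16):
    """Combine source words a, b, c using an 8-bit truth table.

    Assumption: bit (A*4 + B*2 + C) of `minterm` gives the output bit
    for that combination of input bits.
    """
    mask = (1 << width) - 1
    d = 0
    for index in range(8):
        if minterm & (1 << index):
            # Take each source as-is or inverted, depending on the index bits.
            term_a = a if index & 4 else ~a
            term_b = b if index & 2 else ~b
            term_c = c if index & 1 else ~c
            d |= term_a & term_b & term_c
    return d & mask  # clip to the word width (also drops Python's sign bits)
```

With this ordering, a table value of 0xF0 just copies A, and 0xCA gives "B where A is set, C elsewhere", i.e. stamping an image B through a mask A onto a background C. The blitter does this for every word of the blit with no instruction fetches at all, which is the point of the argument above.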

It would be my hope that when the time is right, C= would update the blitter for 32-bit accesses and 4 times the clock rate. This would yield a blitter that would go 8 times the speed of the current blitter (twice the data per access times four times the clock; the magic number needed to get people to see that they should upgrade! :-).

[hell, this is advocacy, thus this last paragraph. :-]

Have a good day,
[Mike, I'd like to see more of this side of you, it's much more pleasant.]
Loren J. Rittle
[send *all* mail replies to l-rittle@uiuc.edu or lrg7030@uxa.cso.uiuc.edu because this account is going away in under two days!]