[comp.sys.amiga.advocacy] Blitter

cs326ag@ux1.cso.uiuc.edu (Loren J. Rittle) (05/12/91)

In article <rw6H$#t1@cs.psu.edu> melling@cs.psu.edu (Michael D Mellinger) writes:
>Wimp, chicken, afraid you might lose... :-) :-)

First off, I agree with Ethan (which is why you have not seen
me in this *mess* before today)!  Only the better man can restrain
himself in these flame wars, or so they say... :-) :-)  One point
Ethan, zero points Mike.  No really, I'm glad to see that you are
starting to ask the `right' questions instead of just flaming, etc.

>Actually, could you answer(or someone who knows) my blitter vs.  the
>040 question?  I think that it is definitely legitimate.  Consider it
>a purely Amiga question.  There must be more to the blitter.

OK Mike, I will assume that you are being true to your word and that
you are not trying to start a flame war over this...

Let me state that all information contained within this post was taken
from BlitLab (ON A FISH DISK!) v1.3 by Tomas Rokicki of Radical Eye
Software.  [side plug, if you want to know about the blitter, get this
package!]  In turn, Tomas took all his information from:
 * Hardware Manual
 * ROM Kernel Manual
 * the ROMs
 * lots of empirical testing done by himself.

Tomas wrote (a long while ago, so forgive the fact that he calls graphics
memory, CHIP memory :-):
"The blitter comprises part of the Agnes chip in the Amiga, and can only
access CHIP memory.  (CHIP memory is the lower 512K of memory in the
current Amiga models, but this may change.)  To the 68000, it appears as a
set of approximately twenty sixteen bit write only registers.  It can
access memory at 7.2 megabytes per second, or twice the bandwidth of
the 68000 (although, as we shall see, it doesn't always run this fast.)
Any video memory accesses can slow the blitter down, whether for
screen refresh or for the 68000.  For instance, the standard two bit
deep high resolution workbench screen can slow the blitter down by
approximately 30\%.  A low resolution single bit plane screen can slow it
down by about 8\%.  A high resolution four bit plane screen can slow down
the blitter by about 60\%.  The blitter is so fast, however, that even
with this handicap it performs its tasks many times faster than the 68000.
[Please note that the slow downs Tomas talks about *would* affect any
other processor talking with graphics memory... - LJR]

"The blitter uses four DMA channels to perform its work; these are labeled
A, B, and C (sources) and D (destination).  Any or all of them may be
disabled independently.  The destination can be calculated from any of
256 possible logical equations on A, B, and C.  The A and B sources can
be shifted up to 15 bits to the right, and the first and last word in a
line from the A source can be masked by a constant.  Each of the four
channels has its own modulo.  The blitter also has an area fill and a
line draw mode."

This last paragraph describes the blitter!  It is far more than a mere
memory mover (the one task where the '030 and '040 can beat the blitter).
The blitter can (for a complex example ;-) read a word from memory, shift
and mask it, read a word from another memory location, shift it, combine the
two with any of the possible two-input logical operations, and write the
result to a final memory location (which could be the same as one of the
first two), all in, as you will see below, 6 clock cycles!  Granted, these
are clock cycles of the 7.18 MHz type, but show me the '030 *or* '040 code
that can do the same in less time...
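
To make the "256 possible logical equations" concrete: three input bits
give eight combinations, and one 8-bit value picks the output for each
combination, hence 2^8 = 256 functions.  Below is a minimal Python sketch
of that idea, not a model of the real hardware.  The name blit_word is
made up for illustration, and the bit numbering of the truth table (A as
the high index bit, C as the low one) is my assumption about the
convention, so check it against the Hardware Manual before relying on it.

```python
def blit_word(a, b, c, minterm):
    """Combine three 16-bit source words using an 8-bit truth table.

    For every bit position, the bits of A, B and C form a 3-bit index
    (assumed order: A high, B middle, C low).  The matching bit of
    `minterm` becomes the output bit for that position.
    """
    d = 0
    for bit in range(16):
        a_bit = (a >> bit) & 1
        b_bit = (b >> bit) & 1
        c_bit = (c >> bit) & 1
        index = (a_bit << 2) | (b_bit << 1) | c_bit
        d |= ((minterm >> index) & 1) << bit
    return d


if __name__ == "__main__":
    # 0xF0 selects every combination where A is 1: a plain copy of A.
    print(hex(blit_word(0x1234, 0x0000, 0x0000, 0xF0)))
    # 0xCA selects "A and B, or not-A and C": A acts as a mask.
    print(hex(blit_word(0xFF00, 0xAAAA, 0x5555, 0xCA)))
```

With this numbering, 0xF0 is a straight copy of A, and 0xCA uses A as a
mask to choose between B and C, which is the sort of thing a CPU needs
several instructions per word to do.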

Tomas goes on to say:
"So, all of those fancy operations are fine and dandy, but just how fast
is the blitter, anyway?  This depends entirely on which DMA channels
are turned on.  You might be using a DMA channel as a constant, but unless
it is turned on, it does not count against you.  The minimum blitter
cycle is four clocks; the maximum is eight.  Use of the A register is
always free.  Use of the B register always adds two clocks to the
blitter cycle.  Use of either C or D is free, but use of both adds
another two clocks.  Thus, a copy cycle, using A and D, takes four
clocks per cycle; a copy cycle using B and D takes six clocks per
cycle, and a generalized bit copy using B, C, and D takes eight clocks.
When in line mode, each pixel takes eight clocks.

"The clock is the 7.18 MHz system clock.  To calculate the total time
for the blit in microseconds, after setup, you use the equation
$$t={nHW\over 7.18}$$
where $t$ is the time in microseconds, $n$ is the number of clocks
per cycle, and $H$ and $W$ are the height and width of the blit,
respectively.

"Actually, this is a minimum time, which is strictly impossible.
Display data fetches, 68000 cycles, and other operations can steal
cycle bandwidth away from the blitter.  One way to eliminate most of
this overhead is to call the macro {\tt OFF\_DISPLAY}
which turns off the display; this is not a friendly thing to do,
however.  Don't forget to call {\tt ON\_DISPLAY} after the blit
is finished!"
[Again, bear in mind that the 680x0 would be forced to deal with
display data fetches, etc - LJR]

Please forgive the TeX command in there, I didn't want to modify
his words in any way. :-)
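
The timing rules and the formula quoted above can be sketched in a few
lines of Python.  This is only my reading of Tomas's text, not code from
BlitLab: the function names are made up, and I am assuming the width is
counted in 16-bit words, which the quote does not state.

```python
CLOCK_MHZ = 7.18  # system clock figure taken from the quoted text


def clocks_per_cycle(use_a=False, use_b=False, use_c=False, use_d=False):
    """Clocks per blitter cycle for a given set of enabled DMA channels."""
    clocks = 4              # minimum blitter cycle; channel A is always free
    if use_b:
        clocks += 2         # B always adds two clocks
    if use_c and use_d:
        clocks += 2         # C or D alone is free, both together add two
    return clocks


def blit_time_us(clocks, height, width):
    """Minimum blit time in microseconds, t = n*H*W / 7.18.

    `width` is assumed to be in 16-bit words (an assumption, see above).
    """
    return clocks * height * width / CLOCK_MHZ


def line_time_us(pixels):
    """Minimum line-draw time in microseconds: eight clocks per pixel."""
    return 8 * pixels / CLOCK_MHZ


if __name__ == "__main__":
    n = clocks_per_cycle(use_a=True, use_d=True)   # plain A to D copy
    print(n, blit_time_us(n, height=200, width=20))
```

For a plain A to D copy of 200 rows by 20 words, this gives four clocks
per cycle and a bit over 2.2 milliseconds.  As Tomas says, that is a
floor, since display fetches and CPU cycles steal bandwidth in practice.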

>Otherwise, why would Commodore keep investing money in upgrading the
>blitter when they could simply buy a cheap RISC(what's an 030 go for
>these days) and save themselves a whole lot of money.

Interesting point, but the blitter is still not too bad as compared to
other solutions C= might pick up instead.  The value/cost ratio has to
be close to infinity due to the fact that C= makes them themselves in
their NMOS foundries and the original research/development costs have
long been covered.  Until the blitter is getting beat by things that
cost less, I don't see C= switching as it would require a large amount
of upgrades to existing OS software.  This does not mean that C= should
not be looking into the problem now rather than waiting until it is too late!

In sum, when doing `normal' operations such as copy memory, the blitter
can be put to shame by an '030 [in the A2000 with an '030 board, moves will
be about 1.7 times as fast.  In the A3000 it's about 2.8 times (from
CpuBlit.doc).  The A3000 does better because the '030 can access the
graphics memory in 32-bit chunks].  But with the *same* memory sub-system,
an '040 would not do better than an '030 in either of the above listed cases!
An '030 doing the memory copy is already saturating the graphics memory bus,
so an '040 can't do better.  The old von Neumann bottleneck wins again.

Now, the `complex' operations that the blitter can do, *can't* *be* *touched*
by an '030 or an '040.  It's like comparing a software data encryption
package with a custom chip solution.  Running the algorithm on a general
purpose CPU is NO match for the hand crafted hardware solution.  The
blitter is basically a highly refined memory mover *and* memory manipulator.
The general purpose CPU just can't match it.  To boot, it has a built-in
line drawing mode and area fill between lines mode!  The blitter is indeed
a well matched co-processor for a graphics based home computer such as the
Amiga.  The design of the blitter also proves that throwing a lot of
money at a problem is not always the best way to solve it.

Let me say a few last words about that last (as yet unsupported) statement.
Recently, many other computer companies have tried to solve the same problems
that the Amiga blitter solved SIX years ago!  Most are using RISC graphic
engines with costly high performance memory subsystems to keep up with data
fetches and writes and *instruction* fetches.  The blitter was designed
so that once setup, only data fetches and writes need be considered.  These
RISC designs try (at more expense) to get around the memory I/O problem
by adding an I-cache (and cache controller, etc all at MORE cost).  The
blitter approach has stood the test of time.  It would be my hope that when
the time is right, C= would update the blitter for 32-bit accesses and 4 times
the clock rate.  This would yield a blitter that would go 8 times the speed of
the current blitter (the magic number needed to get people to see that they
should upgrade! :-).  [hell, this is advocacy, thus this last paragraph. :-]

Have a good day, [Mike, I'd like to see more of this side of you, it's
much more pleasant.]
Loren J. Rittle
[send *all* mail replies to l-rittle@uiuc.edu or lrg7030@uxa.cso.uiuc.edu
 because this account is going away in under two days!]