[comp.sys.amiga.hardware] Bitplanes - good or bad

wayneck@tekig5.PEN.TEK.COM (Wayne C Knapp) (03/29/90)

After countless hours of programming on my Amiga, I'm coming to 
realize that a major shortcoming of the Amiga is the fact that it uses
a bitplane memory design for graphics and any real graphics require the
processor.  I can see many advantages for bitplanes in simple games, but
very few applications outside of games seem to really benefit from bitplanes.
I suppose that there are a lot of examples one can dream up, like putting
an image of a cockpit in some of the front planes and the image of the
outside terrain in the other bitplanes, which would make for a pretty hot
flight type of simulation or game.  However, if you want to do something else,
like just plotting a single pixel, the problem of the bitplanes becomes apparent.

I'm not talking about some simple games or some special simulation; I'm
concerned about the more general problem of the user working on the screen
and producing an image.   I think this is fair since the Amiga is supposed
to be best at animation, video type things and multi-media stuff.  The
most basic operation in much of the graphics for these types of things is
plotting a pixel.  None of the custom hardware aids in this task; in fact,
in high res it works against you!  This may sound shocking, but if you think
about it, it is clear.  

The copper mostly only does setting up of other hardware and syncing of the 
hardware during the display.  It is very flexible at the tasks it is designed
for, but it is also very limited.  Clearly the copper is no aid at all in
actually writing to bitmaps.  Almost every other display system I can think
of has hardware that does what the Amiga copper does.  Of course differently,
since designs vary widely, but still the same functions.   I would say that
the Amiga copper is more flexible than most display controllers, but that 
is basically what it is, a display controller.  It is useful sometimes but not
for the general problem of plotting a pixel.

What about the blitter?  Well, it is okay for moving blocks of memory, has
very useful logical operations and can even be used for simple filling and
line drawing.  Still, for the most part it isn't useful for general problems.
It has several problems.  One, it requires a large amount of setup to use.
Two, it has a very limited number of sources (only three), and for many 
operations this means there are only two sources to use, since one of the 
sources is often the destination.  This limits the blitter to working on
one bitplane at a time and also limits the complexity of the blits.  The titler
program I wrote (Animation:Titler) often uses up to 4 three-source blits to
render a character.  At that point I may be far better off doing the blits 
with the CPU, since I could keep much of the temporary information, such as
masks, in CPU registers between bitplanes and greatly reduce the number of
memory accesses needed to render the character.  (Assuming a 68020 or better
so that the blit code could live in the on-chip cache.)  A lot of work to be
sure, but not more than writing the blitter interface code in the first place. 
The blitter certainly has its place, and it is often useful, but sadly it
doesn't do the job of just plotting a pixel reasonably.   
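
A rough C sketch of the "do the blit with the CPU" idea, for one word of a
character. It assumes the usual three-source model (A, B, C in, D out) with
an 8-bit logic function where bit k of the function gives the output for the
input combination k = (A<<2)|(B<<1)|C. The names blit_word and
cookie_cut_word are made up for this illustration, and the per-bit loop is
written for clarity, not speed. The point is that the mask is fetched once
and reused for every plane.

```c
#include <assert.h>
#include <stdint.h>

/* D = (A & B) | (~A & C): paint B where the mask A is set, keep C elsewhere. */
#define MINTERM_COOKIE_CUT 0xCAu

/* Software version of one word of a three-source blit.
 * Bit k of `minterm` is the output for the input combination
 * k = (A<<2)|(B<<1)|C, evaluated independently for each of the 16 bits. */
static uint16_t blit_word(uint16_t a, uint16_t b, uint16_t c, uint8_t minterm)
{
    uint16_t d = 0;
    for (int bit = 0; bit < 16; bit++) {
        unsigned k = (((a >> bit) & 1u) << 2)
                   | (((b >> bit) & 1u) << 1)
                   |  ((c >> bit) & 1u);
        if ((minterm >> k) & 1u)
            d |= (uint16_t)(1u << bit);
    }
    return d;
}

/* Renders one word of a glyph into every plane.
 * dst[p] points at the destination word in plane p (hypothetical layout).
 * `mask` is passed in once and stays in a local for all planes. */
static void cookie_cut_word(uint16_t *dst[], int depth,
                            uint16_t mask, unsigned color)
{
    assert(depth > 0);
    for (int p = 0; p < depth; p++) {
        /* Source is all ones in planes where the color bit is set. */
        uint16_t src = ((color >> p) & 1u) ? 0xFFFFu : 0x0000u;
        *dst[p] = blit_word(mask, src, *dst[p], MINTERM_COOKIE_CUT);
    }
}
```

In real code the loop in blit_word would collapse to a single expression
such as (mask & src) | (~mask & dst) for this particular logic function.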

So now we are out of special hardware and we still have to plot the pixel.
The CPU is the only choice.  So now one has to do the following just to 
plot a pixel:

   Compute the word offset of the pixel in the bitplane
   Compute a mask for the pixel
   For each bitplane n {
      read the correct data word from the bitplane 
      mask out the bit corresponding to this pixel 
      if the bit in the pixel value corresponding to this bitplane is set
         OR a 1 into the word from the bitplane via the pixel mask
      write the corrected word back to the bitplane
   }

Which in simple terms means at least 2 * (number of planes) memory accesses
for every plotted pixel.  The more bitplanes there are, the slower it goes.
Due to this, I don't really want '24 bitplanes'.  Minimum of 48 memory accesses
per pixel plotted!!!  There would be some programs that would run faster on
C64s!!! 
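
The per-plane loop above can be sketched in C roughly as follows. The
PlanarBitmap struct is a hypothetical stand-in invented for this example
(it is not the operating system's own bitmap structure), and the sketch
assumes 16-bit words with the leftmost pixel in the most significant bit.
The function returns the number of bitplane accesses so the 2 * planes
cost is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmap descriptor for this sketch. */
typedef struct {
    int       width_words;   /* 16-bit words per row in each plane */
    int       height;        /* rows */
    int       depth;         /* number of bitplanes in use */
    uint16_t *plane[24];     /* one pointer per bitplane */
} PlanarBitmap;

/* Plots one pixel and returns how many bitplane memory accesses it took. */
static int plot_planar(PlanarBitmap *bm, int x, int y, uint32_t color)
{
    assert(x >= 0 && x < bm->width_words * 16);
    assert(y >= 0 && y < bm->height);

    /* Word offset and bit mask are computed once, outside the loop. */
    size_t   offset = (size_t)y * (size_t)bm->width_words + (size_t)(x >> 4);
    uint16_t mask   = (uint16_t)(0x8000u >> (x & 15));
    int      accesses = 0;

    for (int n = 0; n < bm->depth; n++) {
        uint16_t w = bm->plane[n][offset];   /* read the data word */
        accesses++;
        w &= (uint16_t)~mask;                /* mask out this pixel's bit */
        if ((color >> n) & 1u)
            w |= mask;                       /* OR in a 1 if the color bit is set */
        bm->plane[n][offset] = w;            /* write the word back */
        accesses++;
    }
    return accesses;
}
```

With depth set to 24 the function performs 48 accesses for a single pixel,
which is the figure quoted above.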

However, just think about a packed pixel format where pixels are on 32 bit
boundaries.  Then plotting one pixel requires only one write.  It would
be a little wasteful of memory if you are only using a few colors, but memory
is getting cheap enough to make it reasonable.  Blitting still works fine
too.  A 512x480x24 bit display would only require 720k of memory at three
bytes per pixel (960k with each pixel padded out to 32 bits).  One could
go with pixels on word boundaries; this would cost only 480k for a display
and would allow the same blitter to be used.  These would be true color
modes to make things simple.
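
For comparison with the bitplane case, here is a minimal C sketch of the
packed case, assuming one 32-bit cell per pixel stored row by row. The
PackedBitmap32 struct and both function names are invented for this
example. The second function just works out frame buffer sizes in units
of 1024 bytes for a given pixel stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed frame buffer: one 32-bit cell per pixel, row-major. */
typedef struct {
    int       width;
    int       height;
    uint32_t *pixels;
} PackedBitmap32;

/* Plotting is a single write; no read, no masking, no per-plane loop. */
static void plot_packed32(PackedBitmap32 *bm, int x, int y, uint32_t rgb)
{
    assert(x >= 0 && x < bm->width && y >= 0 && y < bm->height);
    bm->pixels[(size_t)y * (size_t)bm->width + (size_t)x] = rgb;
}

/* Frame buffer size in K (1024 bytes) for a given number of bytes per pixel. */
static unsigned long framebuffer_kbytes(unsigned long width,
                                        unsigned long height,
                                        unsigned long bytes_per_pixel)
{
    return width * height * bytes_per_pixel / 1024ul;
}
```

For a 512x480 display this gives 480K at 2 bytes per pixel, 720K at 3 bytes
per pixel and 960K at 4 bytes per pixel.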

Clearly this would not be compatible with the current system.  If it were done
on the Amiga it would have to be in addition to the current modes.  However,
a packed pixel mode would have much better performance for many types of
graphics applications.

                                                 Wayne Knapp

hue@netcom.UUCP (Jonathan Hue) (03/29/90)

In article <5917@tekig5.PEN.TEK.COM> wayneck@tekig5.PEN.TEK.COM (Wayne C Knapp) writes:
>
>Clearly this would not be compatible with the current system.  If it were done
>on the Amiga it would have to be in addition to the current modes.  However,
>a packed pixel mode would have much better performance for many types of
>graphics applications.

There's a way to be compatible.  We had a similar problem a few years back
when we were designing a frame buffer.  The software people couldn't decide
if they wanted four adjacent bytes to be RGBA for one pixel, or four adjacent
pixels of R, G, B, or A.  There were good reasons for either type of access.
What we finally did was have the frame buffer map to two separate address
ranges: when you accessed through one range of addresses you got RGBARGBARGBA;
when you accessed through the other you got

			  RRRRRRRRRRRRRRRR...
			  BBBBBBBBBBBBBBBB...
			  GGGGGGGGGGGGGGGG...
			  AAAAAAAAAAAAAAAA...

Going to a bitplane oriented system introduces more complexity, but it could
be done.  One problem is that VRAM (if you decide to use it) is usually 4-bits
wide; 1-bit wide VRAM would make things much simpler.  If you support "frame
buffer anywhere in CHIP RAM", like it is today, you would probably run into
a few more problems, and some alignment restrictions.
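
A minimal C sketch of the two-address-range idea, with the decoding that
the hardware would do written as two functions over one block of storage.
Everything here is an assumption made for illustration: the FrameBuffer
struct, the function names, byte-wide components, and the component order
R, G, B, A in the component-major view. Both views return a pointer into
the same storage, so a write through one view is visible through the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { COMPONENTS = 4 };   /* R, G, B, A (order assumed for this sketch) */

typedef struct {
    size_t   npixels;
    uint8_t *store;        /* npixels * 4 bytes, physically interleaved */
} FrameBuffer;

/* View 1: RGBARGBARGBA... Offsets map straight onto the storage. */
static uint8_t *interleaved_view(FrameBuffer *fb, size_t offset)
{
    assert(offset < fb->npixels * COMPONENTS);
    return &fb->store[offset];
}

/* View 2: all R bytes, then all G, then all B, then all A.
 * The offset is split into (component, pixel) and re-encoded. */
static uint8_t *planar_view(FrameBuffer *fb, size_t offset)
{
    assert(offset < fb->npixels * COMPONENTS);
    size_t component = offset / fb->npixels;
    size_t pixel     = offset % fb->npixels;
    return &fb->store[pixel * COMPONENTS + component];
}
```

Taking this down to single-bit planes means the same kind of re-encoding at
bit granularity, which is where the extra complexity mentioned above comes in.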

-Jonathan

daveh@cbmvax.commodore.com (Dave Haynie) (03/30/90)

In article <5917@tekig5.PEN.TEK.COM> wayneck@tekig5.PEN.TEK.COM (Wayne C Knapp) writes:

>After countless hours of programming on my Amiga, I'm coming to 
>realize that a major shortcoming of the Amiga is the fact that it uses
>a bitplane memory design for graphics and any real graphics require the
>processor.  

You have to define "graphics" considerably more carefully to make that a
true statement.  What you're really talking about is pixel by pixel image
processing.  In that case, you're right.  This is no new discovery, but
very well known -- bitplane architectures are at their worst in the case
where a CPU needs to read a pixel, perform an operation on it, and then
write the pixel back out.  There are quite a few advantages of bitplane
architectures as well; they generally support high-level operations better.
If, for example, you need to draw a line or fill an area, the packed pixel
approach must operate on every single pixel, and that operation may very
well be a masking operation, not just a simple fill.  The bitplane 
architecture need only operate on the affected planes, which in the worst
case means it has as much work to do as the packed-pixel machine.  However,
this is something easily parallelized -- there's nothing preventing all
bitplanes from being operated on simultaneously.  The National Semiconductor
8500 series graphics system does this.
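
To make the fill case concrete, here is a small C sketch of filling one
horizontal span in a planar layout. It assumes 16-bit words with the
leftmost pixel in the most significant bit; the function names and the
"affected" plane bitmask are invented for this example. Each word touched
covers up to 16 pixels, and planes not listed in the bitmask are skipped
entirely. The return value counts words touched (each one a
read-modify-write here).

```c
#include <assert.h>
#include <stdint.h>

/* Sets or clears pixels x0..x1 (inclusive) of one row in a single plane.
 * Returns the number of words touched. */
static int fill_span_plane(uint16_t *row, int x0, int x1, int bit)
{
    assert(0 <= x0 && x0 <= x1);
    int first = x0 >> 4, last = x1 >> 4, touched = 0;

    for (int w = first; w <= last; w++) {
        uint16_t mask = 0xFFFFu;
        if (w == first)   /* drop pixels left of x0 */
            mask &= (uint16_t)(0xFFFFu >> (x0 & 15));
        if (w == last)    /* drop pixels right of x1 */
            mask &= (uint16_t)(0xFFFFu << (15 - (x1 & 15)));
        row[w] = bit ? (uint16_t)(row[w] | mask)
                     : (uint16_t)(row[w] & ~mask);
        touched++;
    }
    return touched;
}

/* Fills the span in every plane whose bit is set in `affected`,
 * using the matching bit of `color` for that plane. */
static int fill_span_planar(uint16_t *rows[], int depth, unsigned affected,
                            int x0, int x1, unsigned color)
{
    int touched = 0;
    for (int p = 0; p < depth; p++)
        if ((affected >> p) & 1u)
            touched += fill_span_plane(rows[p], x0, x1, (int)((color >> p) & 1u));
    return touched;
}
```

For a 32-pixel span from x = 4 to x = 35 with two affected planes this
touches 6 words, where a one-byte-per-pixel packed layout would write 32
pixels. With all planes affected and no parallelism the planar version
loses that advantage, which is the worst case described above.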

You'll notice some of the more advanced graphics chips around have caught
on to the idea that neither packed pixels nor bitplanes are always a win.
It's possible to allow both types of access.  It gets pretty complex when
you let both happen at once, but that may not always be necessary.  In most
cases, the type of pixel addressing that's most useful changes with the
application.  If you're doing lots of image processing on an image in video
memory (personally, I'd move it into fast 32 bit memory if I had lots of
processing to do, anyway), you want deep packed pixels and something on
the order of a PAL/NTSC resolution.  If you're running a 2D CAD or DTP 
application, you want a few bitplanes, but something megapixel.

However, in most of the work done on small computers these days, there's
no big win in either architecture.  The big wins are having something that'll
draw for you in parallel with and/or faster than your main CPU.  The Amiga's
graphics chips are the simplest case of such a system, and that's why with
most of the things most people do on the Amiga, they find it faster at
graphics than other machines in its class (eg, I'll put an A2000 up against
any other 16 bit/~8MHz/GUI-based machine out there in overall graphics
speed).  

>                                                 Wayne Knapp

-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough

cmcmanis@stpeter.Sun.COM (Chuck McManis) (03/30/90)

A pretty decent description from Wayne about bitplanes and their
strengths and weaknesses. 

Yes, bitplanes are better for games. The Amiga was designed as a
game machine originally and this comes through in its architecture.

And, as Wayne empirically determined, using the blitter for things
like writing a pixel is not very efficient. So the question Jeff
Porter (an Amiga guy) might ask himself is "Given this architecture,
how can I make it go faster?" From where I'm sitting, there seem
to be two possible answers. 

One, the blitter wants to work on one bitplane, in part because 
logical operations on multiple bit pixels are tougher. So to 
enhance the speed of the operation, you can enhance the parallelism.
This is sometimes referred to as the "Blitter per Bitplane" approach.
Given four blitters drawing into four bitplanes and sharing all
registers except the source/destination pointers in common, 
operating in this mode would be slightly more than 4 times faster
than the current scheme. The "slightly more" comes from the fact
that in the current system there are some register reload times
which are common in our mythical multiblitter. 

The nice thing about this idea is that it can be "backward" compatible
on a register level with the existing Amiga because older programs
would just start the "one" blitter. Of course for extra power the
chip would run the blitters without common registers as well so that
one could be decoding MFM data from the floppy while the other was
outlining a box on the screen. So far everything looks wonderful, but...

And you knew it was coming. How would this multiheaded blitter talk
to memory? Now the Amiga blitter is a "Word" blitter and not a true
"Bit" blitter. What it does is read in a word (16 bits), operate on
it and write it back out, possibly shifted left or right. The speed
comes from the fact that the memory cycle of the blitter looks something
like: (pseudo timing diagram)

	  Bus Cycle 1          Bus Cycle 2
A1-A11	|--------<-Address->|-------<-Address->|--
Read*	|--------__________-|------------------|--
Write*	|-------------------|--------_________-|--

The only way to utilize the massive parallelism of 4 blitters (or N
blitters) is if they can all read data from memory _at the same time_.
Hmmm, bad news eh? If you have to serialize each blitter's requests then
you are suddenly only marginally faster than a single blitter being
run 4 times. Now this is possible, but it means N-way accessible
memory (or N copies of the memory) and N buses for the blitters to
all simultaneously get to the memory. That of course is a fundamental
change to the way in which the Amiga is built. 
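
A toy cycle-count model of that argument, in C. It is only a sketch: the
BlitJob struct, the function names and any numbers fed into it are
invented for illustration and are not real blitter timings. It compares
one blitter run once per plane, N blitters that share registers but must
take turns on a single memory bus, and N blitters that each have their own
path to memory.

```c
#include <assert.h>

/* Made-up cost parameters for one blit over one plane. */
typedef struct {
    unsigned long setup_cycles;     /* register loads before the blit starts */
    unsigned long words_per_plane;  /* words the blit processes in one plane */
    unsigned long cycles_per_word;  /* bus cycles per word (reads plus write) */
} BlitJob;

/* One blitter, run once per plane, registers reloaded each time. */
static unsigned long cycles_single(const BlitJob *j, unsigned planes)
{
    assert(planes > 0);
    return planes * (j->setup_cycles + j->words_per_plane * j->cycles_per_word);
}

/* One blitter per plane, registers loaded once, but only one memory bus,
 * so every word access still happens one after another. */
static unsigned long cycles_shared_bus(const BlitJob *j, unsigned planes)
{
    assert(planes > 0);
    return j->setup_cycles + planes * j->words_per_plane * j->cycles_per_word;
}

/* One blitter per plane, each with its own bus: planes proceed in parallel. */
static unsigned long cycles_parallel_bus(const BlitJob *j, unsigned planes)
{
    assert(planes > 0);
    return j->setup_cycles + j->words_per_plane * j->cycles_per_word;
}
```

With setup_cycles = 20, words_per_plane = 1000, cycles_per_word = 4 and
four planes, the model gives 16080 cycles for the single blitter, 16020
with a shared bus and 4020 with separate buses. In this model the shared
bus saves only the repeated setup, while separate buses give roughly the
4x speedup.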

The other option is to use a pixel blitter rather than a blitter
that operates on bits. And there are lots of examples of those. 
(The TI 34010 comes to mind) 

Here's an idea for a new system :

	  +----------------------------+
	+-| blitter - 1Meg VRAM - 1 bit| ---+
        | +----------------------------+    |     +-------+
	| +----------------------------+    |     |       +--- Red
	+-| blitter - 1Meg VRAM - 1 bit| -+ +-----+       |
        | +----------------------------+  +-------+ Video +--- Green
	| +----------------------------+  +-------+ RDAC  |
	+-| blitter - 1Meg VRAM - 1 bit| -+ +-----+       +--- Blue
        | +----------------------------+    |     |       |
	| +----------------------------+    |     +----- -+
	+-| blitter - 1Meg VRAM - 1 bit| ---+
        | +----------------------------+
  Common Bus

You could add blitter/bitplane combinations as required. The RDAC
could handle "eight" of them for 256 out of 16M colors. It might
work and it would surely be fun to build :-)

--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: <none>   Internet: cmcmanis@Eng.Sun.COM
These opinions are my own and no one elses, but you knew that didn't you.
"If it didn't have bones in it, it wouldn't be crunchy now would it?!"

dave@cs.arizona.edu (David P. Schaumann) (03/30/90)

In article <133675@sun.Eng.Sun.COM>, cmcmanis@stpeter.Sun.COM (Chuck McManis) writes:
| [ ... ]
| One, the blitter wants to work on one bitplane, in part because 
| logical operations on multiple bit pixels are tougher. So to 
| enhance the speed of the operation, you can enhance the parallelism.
| This is sometimes referred to as the "Blitter per Bitplane" approach.
| Given four blitters drawing into four bitplanes and sharing all
| registers except the source/destination pointers in common, 
| operating in this mode would be slightly more than 4 times faster
| than the current scheme. The "slightly more" comes from the fact
| that in the current system there are some register reload times
| which are common in our mythical multiblitter. 
| 
| [...]
| So far everything looks wonderful, but...
| 
| And you knew it was coming. How would this multiheaded blitter talk
| to memory? Now the Amiga blitter is a "Word" blitter and not a true
| "Bit" blitter. What it does is read in a word (16 bits), operate on
| it and write it back out, possibly shifted left or right. The speed
| comes from the fact that the memory cycle of the blitter looks something
| like: (pseudo timing diagram)
| 
| [ ... ]
| 
| The only way to utilize the massive parallelism of 4 blitters (or N
| blitters) is if they can all read data from memory _at the same time_.
| Hmmm, bad news eh?

Not necessarily.

First, assume: blitter operations are independent.  That is, given a task
  for the blitters, the order they perform their sub-tasks is unimportant.

Second, give the Amiga interleaved memory, and start the blitters on staggered
  words in memory.

I realize there would be a problem in the case where the memory is not 
contiguous, but I think this would occur rarely enough that you would still
get nearly a 4x speed up (assuming 4 blitters.)  Of course, all this would
cost $$$, but you knew that, didn't you?
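
A small C simulation of the staggered-start idea, assuming word-interleaved
memory where consecutive word addresses fall in consecutive banks and each
blitter steps through consecutive words, one per cycle. The function names
and the one-word-per-cycle model are simplifications invented for this
sketch. It counts the cycles in which two or more blitters want the same
bank.

```c
#include <assert.h>

/* Bank serving a word address under nbanks-way word interleaving. */
static unsigned bank_of(unsigned long word_addr, unsigned nbanks)
{
    return (unsigned)(word_addr % nbanks);
}

/* Steps nblit blitters through `steps` consecutive words each, one word
 * per cycle, and returns the number of cycles with a bank conflict. */
static unsigned long count_conflicts(const unsigned long start[], unsigned nblit,
                                     unsigned nbanks, unsigned long steps)
{
    assert(nbanks >= 1 && nbanks <= 32);
    unsigned long conflicts = 0;

    for (unsigned long t = 0; t < steps; t++) {
        unsigned long busy = 0;   /* bitmask of banks claimed this cycle */
        int clash = 0;
        for (unsigned i = 0; i < nblit; i++) {
            unsigned long bit = 1ul << bank_of(start[i] + t, nbanks);
            if (busy & bit)
                clash = 1;        /* another blitter already has this bank */
            busy |= bit;
        }
        conflicts += (unsigned long)clash;
    }
    return conflicts;
}
```

With four banks, start addresses 0, 1000, 2000 and 3000 all land in bank 0
and collide on every cycle, while 0, 1001, 2002 and 3003 begin in four
different banks and never collide as long as each blitter keeps stepping
through contiguous words.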

| 
| --Chuck McManis
| uucp: {anywhere}!sun!cmcmanis   BIX: <none>   Internet: cmcmanis@Eng.Sun.COM
| These opinions are my own and no one elses, but you knew that didn't you.
| "If it didn't have bones in it, it wouldn't be crunchy now would it?!"


Dave Schaumann		| "Constable Parrot ate one of those!"
dave@cs.arizona.edu	|  ;)

Sullivan@cup.portal.com (sullivan - segall) (04/10/90)

>In article <10432@cbmvax.commodore.com>, daveh@cbmvax.commodore.com (Dave Haynie) writes: 
>> In article <5917@tekig5.PEN.TEK.COM> wayneck@tekig5.PEN.TEK.COM (Wayne C Knapp) writes: 
>> 
>> >After countless hours of programming on my Amiga, I'm coming to 
>> >realize that a major shortcoming of the Amiga is the fact that it uses 
>> >a bitplane memory design for graphics and any real graphics require the 
>> >processor.  
>> 
>> You have to define "graphics" considerably more carefully to make that a
>> true statement.  What you're really talking about is pixel by pixel image
>> processing.  
>
>This is a reasonable point.  I didn't do a good job of choosing my words here.
>Of coarse the Amiga is an excellent platform for many kinds of graphics. 
>Everyone must be aware of how nice a job the Amiga does with windowing and
>paint type functions.  What I should have said is, many graphic functions require 
>the processor.   I'm sorry about my poor choice of wording.
>
>However, there are many things happening in the graphics field today that
>were only being touched on in 1985.  When I first read about the Amiga in 
>BYTE and read about the reasons for using bitplanes I was very impressed! 
>I rushed to get an Amiga.  Since then things like raytracing on micros,
>image processing on micros, spline based interfaces, 3D animations, and
>many other things have started to become commonplace.  It is in these areas
>that the extra hardware is often of little use when dealing with bitplanes.
>Perhaps future Amigas will be better suited to these types of problems.  The
>last 5 years have been explosive in graphics and the demands on computers are
>increasing all the time.  
>
>Pixel by pixel image processing is only one of many areas where bitplanes are
>not an aid.   
>
>Currently I'm trying to figure out how to get improved drawing speeds for
>a spline based interface.  It is hard.  I will probably end up living with
>a just barely usable system speed-wise since it is just too hard to get the
>speed I would like out of the bitplanes.  The problem is changing points.  If
>I was only using one or two colors it would be easier since I could assign a
>bitplane per color.  However, I'm at the point where I can't get more speed
>without giving up colors or other things I don't want to.  Since this is
>the user interface I'm forced into dealing with the bitplanes.  Converting an
>image from fast RAM into the bitplanes is just too slow.
>
Wayne, I'm not going to tell you how to write your programs, but it seems 
to me that you are working the problem from the wrong end.  Rather than treating
a bit-plane based system like a pixel based system, why not draw your splines
into an empty plane, and then blit-or it into the planes that should be on, or
blit-nand it into the planes that should be off?  You could even draw into 16
different planes and blit combine them later.  If your splines are really 
short then perhaps it would be better to recalculate them for each plane.
Anyway, as long as you use the former method, the CPU is *NOT* required for
drawing the screen (beyond the initial calculations.)
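
A C sketch of the combine step, with plain CPU loops standing in for the
blits so the logic is easy to follow; on the real machine each pass over a
plane would be one blitter operation. The function name and argument layout
are invented for this example, and "blit-nand" is read here as clearing the
scratch shape out of a plane (destination AND NOT scratch), which is an
interpretation rather than something the post spells out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merges a one-plane scratch drawing into a multi-plane bitmap so that the
 * drawn pixels end up with the given color.
 * planes[p] and scratch each hold nwords 16-bit words. */
static void combine_scratch(uint16_t *planes[], int depth,
                            const uint16_t *scratch, size_t nwords,
                            unsigned color)
{
    assert(depth > 0 && depth <= 16);
    for (int p = 0; p < depth; p++) {
        int on = (int)((color >> p) & 1u);
        for (size_t i = 0; i < nwords; i++) {
            if (on)
                planes[p][i] |= scratch[i];             /* OR the shape in */
            else
                planes[p][i] &= (uint16_t)~scratch[i];  /* clear the shape out */
        }
    }
}
```

The spline is drawn once into the scratch plane, and the cost of getting it
into N planes is N whole-plane passes no matter how many points it has.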


In any case, 'of course' is spelled O-F C-O-U-R-S-E.

Sorry, it hurts my brain to see 'of coarse'. -ss
 
                           -Sullivan Segall
_________________________________________________________________
 
/V\  Sullivan  was the first to learn how to jump  without moving.
 '   Is it not proper that the student should surpass the teacher?
To Quote the immortal Socrates: "I drank what?" -Sullivan
_________________________________________________________________
 
Mail to: ...sun!portal!cup.portal.com!Sullivan or
         Sullivan@cup.portal.com