[comp.sys.amiga] 3000 wishes

IMS103@PSUVM.BITNET (10/06/89)

    I am not a hardware person so I might be wrong on this but,  is there
any reason you could not bring up the speed of the custom chips?  Like
making 14 or 28 MHz versions of the blitter?

    Also, I think a good idea would be to include *two* serial ports *standard*
on the Amiga 3000.  I would go out and buy the ASDG multi-serial port card
right now but I am saving up for a GVP 80 hard-card.  Sigh, so many products
and so little money to buy them with.
-------
+--------------------------------------------------------------+
| "Man, is an endangered species" - Terl   _Battlefield Earth_ |
| IMS103@PSUVM.BITNET (Ian Smith)             L Ron Hubbard    |
+--------------------------------------------------------------+

usenet@cps3xx.UUCP (Usenet file owner) (10/06/89)

Here's an idea for improving the graphics chipset.

Make sprites work in hires, and allow them to be as wide as a playfield;
while you're at it, make them as deep (bitplane wise) too.

Now make it so that you can have at least 32 independent sprites
per scan line.

You know what you just did? Windows in hardware! Make things go
*real* fast.
REAL NAME: Joe Porkka   porkka@frith.egr.msu.edu

cmcmanis%pepper@Sun.COM (Chuck McManis) (10/07/89)

In article <4875@cps3xx.UUCP> porkka@frith.UUCP (Joe Porkka) writes:
> Make sprites work in hires, and allow them to be as wide as a playfield;
> make them as deep (bitplane wise) too... make it so that you can have 32 
> independent sprites per scan line.

I designed one hardware graphics system, and helped with another as
part of a multiperson group that was working on the Intel 82786.
Both had something similar to this, and both attacked the problem in 
different ways. The difficulties that come up are similar though.

Sprites and windows can be thought of as a memory management problem. One
linear space (the viewscreen) may be composed of several discrete chunks
of a larger workspace. On a pixel by pixel basis you get to decide where
that pixel will come from in the workspace. Fortunately, you can make some
optimizations because you know that pixels will be accessed in sequential
order. The problem is access time for the translation tables. Since a scan
line may be as short as 14 microseconds (for a non-interlaced 1K X 1K display
at 66Hz), you need to do pixel translations in as little as 14 nanoseconds.
And if you can do 14 nanosecond translations then you can have an arbitrary
number of windows aligned on arbitrary boundaries on your screen. If, however,
you want to do "sprites" which can be "transparent", you may do your
translation only to find out that the sprite you translated to has a
transparent pixel, and now you have to find the pixel "under" it. If you
used up your 14 nanoseconds getting to the first sprite, you're hosed because
the beam will move on. Anyway, it isn't this bad at NTSC rates. With a
640 X (200/400) screen and a 15kHz scan rate, you only have to map pixels
within 99 nanoseconds. So visualize the following scene at the pixel
multiplexor:

    Beam Position
        X    Y
        |    |
        V    V
	sprite 0 ------\
	sprite 1 ------\\
	sprite 2 ------\\\             +-----+
	sprite 3 -------\\\\ +-----+   |     +---> Red
	sprite 4 ------------+ MUX +---+ DAC +---> Grn
	sprite 5 -------//// +-----+   |     +---> Blu
	sprite 6 -------///            +-----+
	sprite 7 -------//
	playfield ------/

So the MUX or some sort of arbitration circuit has to look up the pixel color
of sprite 0, and if it's transparent fall through to sprite 1, ..., to sprite
7 and then finally pick up the playfield data. All within the 99ns the beam
has to find that information. Common ways to cheat are to "freeze" the values
and start queueing up stuff from memory when HBLANK hits, and while you get 
behind in fetching stuff you started out ahead, so that the beam just catches
up to you when you hit the next HBLANK.
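
The time budget and the fall-through described above can be sketched in a few lines of Python. This is only an illustration, not the circuit itself: it assumes that pixel value 0 means "transparent", that blanking intervals are ignored, and that the "15kHz" NTSC figure is the usual 15.734 kHz line rate (which reproduces the 99 ns number). The function names are made up for the example.

```python
def pixel_budget_ns(line_rate_hz, pixels_per_line):
    """Time available per pixel in nanoseconds, ignoring blanking."""
    return 1e9 / (line_rate_hz * pixels_per_line)


TRANSPARENT = 0  # assumption: pixel value 0 means "see through"


def select_pixel(sprite_pixels, playfield_pixel):
    """Return the first non-transparent sprite pixel, else the playfield.

    sprite_pixels is ordered by priority, sprite 0 first. The hardware has
    to finish this whole search inside one pixel time.
    """
    for value in sprite_pixels:
        if value != TRANSPARENT:
            return value
    return playfield_pixel


# 1K x 1K non-interlaced at 66 Hz: 66 * 1024 lines per second.
hires_1k = pixel_budget_ns(66 * 1024, 1024)   # roughly 14 ns
# 640 pixels per line at the NTSC line rate.
ntsc_640 = pixel_budget_ns(15_734, 640)       # roughly 99 ns

# Sprites 0 and 1 are transparent here, so sprite 2 wins.
winner = select_pixel([0, 0, 5, 7, 0, 0, 0, 0], playfield_pixel=1)
```

A software loop hides the real difficulty, which is that each step of the fall-through costs lookup time the beam does not wait for.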

So an expensive way to do this might be to put each window in the "proper"
place in its own bank of VRAMs. [You might be able to multiplex windows
that didn't overlap like VSprites with clever programming.] Then you 
scan all banks of VRAM simultaneously for data. In the display unit you
simply keep a bunch of address comparators that hold the LeftEdge, TopEdge,
RightEdge, and BottomEdge values, all ANDed together so that they generate
a "1" bit when the beam is in that "window". Since the propagation time on
these comparators is pretty fast (like 10ns) we don't have to worry about
that. If you are clever and want to make them sprite like, you can put 
a "zero" detect in to AND with the comparator output and that would pull
the "we're in this window" bit down if the pixel at that location was zero.
Now you divide the pixel clock into 4 subclocks (each ~25ns in this case)
and time it like this :

	0		1		2		3
	 _______	 _______	 _______	 _______
clock	/	\_______/	\_______/	\_______/	\_______
		________________________________________________________
inwin	-------<________________________________________________________>-
				________________________________________
iszero	-----------------------<________________________________________>-
						________________________
is_top	---------------------------------------<________________________>-
							________________
valid_pixel -------------------------------------------<________________>-

So if you can read my crude timing diagram, everything latches on the falling
edge of C0 (4X pixel clock), with the result that by the rising edge of
phase 3 you can clock the "true" pixel onto the video shifter bus and
then out to the DACs. Note that only on the falling edge of phase 2 will
you have an accurate picture of which pixel is "topmost"; this comes from an
arbitration of priorities between the falling edge of phase 1 and
the falling edge of phase 2. That means you have to arrive at the correct
priority in about 25ns. Given a setup time of 5ns and a settling time of
3 - 4ns, you have to keep those propagation times down. You can probably
do this with an XOR priority encoder scheme. Anyway, for 32 "window/screen/sprites"
you will need 32 banks of VRAM (again this will be the maximum number of windows
on a line; if you can live with fewer windows/line you could reduce that).
Assuming 8 bit pixels, (this is an improvement after all) and a 640 X 200+
screen you will need 512KB of VRAM for each window, leaving you with 
16MB of VRAM for the display. Which is definitely doable but it will get  
a bit expensive. Interestingly enough on a monochrome screen you only need
2MB of VRAM, and that would make for a pretty awesome X terminal or some
such.
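
The comparator scheme above (four edge comparators ANDed together, a zero detect, then a priority pick) can be modelled roughly as follows. This is a sketch under assumptions: the `Window` class, the `bank[y][x]` layout, and the front-to-back list order are invented for the example, and the hardware would evaluate all banks in parallel where this loop checks them one at a time.

```python
from dataclasses import dataclass


@dataclass
class Window:
    left: int
    top: int
    right: int     # inclusive
    bottom: int    # inclusive
    bank: list     # bank[y][x]: this window's own full-screen VRAM image


def in_window(w, x, y):
    """The four edge comparators, ANDed together ("inwin")."""
    return w.left <= x <= w.right and w.top <= y <= w.bottom


def valid_pixel(windows, x, y, background=0):
    """Pick the topmost visible pixel at beam position (x, y).

    windows is ordered front to back. A zero pixel pulls the
    "we're in this window" bit down, which makes windows sprite-like.
    """
    for w in windows:
        pixel = w.bank[y][x]
        if in_window(w, x, y) and pixel != 0:
            return pixel
    return background


def solid_bank(width, height, value):
    return [[value] * width for _ in range(height)]


front = Window(2, 1, 4, 2, solid_bank(8, 4, 7))
front.bank[1][3] = 0                      # a transparent hole in the front window
back = Window(0, 0, 7, 3, solid_bank(8, 4, 2))

inside_front = valid_pixel([front, back], 2, 1)   # front window pixel
through_hole = valid_pixel([front, back], 3, 1)   # falls through to the back
outside_front = valid_pixel([front, back], 0, 0)  # only the back covers this
```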

--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis  ARPAnet: cmcmanis@sun.com
These opinions are my own and no one else's, but you knew that didn't you.
"If I were driving a Macintosh, I'd have to stop before I could turn the wheel."

rico@dehn. (Rico Tudor) (10/07/89)

In article <125964@sun.Eng.Sun.COM> cmcmanis@sun.UUCP (Chuck McManis) writes:
>In article <4875@cps3xx.UUCP> porkka@frith.UUCP (Joe Porkka) writes:
>> Make sprites work in hires, and allow them to be as wide as a playfield;
>> make them as deep (bitplane wise) too... make it so that you can have 32 
>> independent sprites per scan line.
>
>I designed one hardware graphics system, and helped with another as
>part of a multiperson group that was working on the Intel 82786.
>Both had something similar to this, and both attacked the problem in 
>different ways. The difficulties that come up are similar though.
>
>Sprites and windows can be thought of as a memory management problem. One

I commend Chuck McManis for his detailed and insightful article.  Sprites, as
provided by the Amiga, are inexpensive and useful, but have rigid
limitations.  TI's graphics processor implements sprites by providing two
dedicated bitplanes.  However, the most general design would eliminate the
difference between sprites, windows and playfields, using all bitplanes as
"data".  This design is described below, and requires hardware for multiple
bitplanes, a colormap, vertical and horizontal scrolling.  Experts will
notice that Ami already has most of these features.

Step 1: Overlapping Objects
---------------------------
	Imagine there's two windows (it isn't hard to do): a little one in
	front of a big one on the screen.  It might look like this:
		b b l l L
		b b L l L
		B B b b b
	Assume each window is one bit deep.  Then the big window is colored
	'B' and 'b'.  The little window is colored 'L' and 'l'.  Residing in
	VRAM are the complete images of each window, in separate bitplanes:
		b b b b b	- - l l L
		b b B b b	- - L l L	'-' means don't care
		B B b b b	- - - - -
	Another bitplane, the "mask plane", looks like this:
		0 0 1 1 1
		0 0 1 1 1
		0 0 0 0 0
	The idea is to load the colormap so that the pixel will be L/l where
	the mask is '1', and B/b otherwise.  This boolean function is called
	"cookie-cutter" in some circles:
		colormap entry (base 2)		pixel value to screen
		-----------------------		---------------------
		0lb				b
		0lB				B
		0Lb				b
		0LB				B
		1lb				l
		1lB				l
		1Lb				L
		1LB				L
	This example used rectangular objects, but they can be oval, or any
	set of pixels, even disconnected.  By adding bitplanes, any number
	of objects can be displayed:
		maskplanes = roundup( logbase2( objects));
		totalplanes = maskplanes + objects*depth;
	Actually, objects need not have the same depth.
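
The cookie-cutter colormap and the plane-count formula from Step 1 can be checked with a short Python sketch. It assumes the three index bits are ordered (mask, little, big) with the mask most significant, as in the table above, and it fills the "don't care" cells of the little window with 0. The function names are illustrative only.

```python
def planes_needed(objects, depth):
    """maskplanes + objects*depth, with maskplanes = ceil(log2(objects))."""
    mask_planes = (objects - 1).bit_length()   # integer ceil(log2) for objects >= 1
    return mask_planes + objects * depth


def build_colormap(big_colors, little_colors):
    """8-entry colormap: little window where mask is 1, big window otherwise."""
    colormap = []
    for index in range(8):
        mask = (index >> 2) & 1
        little = (index >> 1) & 1
        big = index & 1
        colormap.append(little_colors[little] if mask else big_colors[big])
    return colormap


def compose(mask_plane, little_plane, big_plane, colormap):
    """What the video hardware does per pixel: form an index, look it up."""
    return [
        [colormap[(m << 2) | (lit << 1) | b] for m, lit, b in zip(mrow, lrow, brow)]
        for mrow, lrow, brow in zip(mask_plane, little_plane, big_plane)
    ]


big_plane    = [[0, 0, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 0]]
little_plane = [[0, 0, 0, 0, 1], [0, 0, 1, 0, 1], [0, 0, 0, 0, 0]]
mask_plane   = [[0, 0, 1, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 0, 0]]

colormap = build_colormap(big_colors="bB", little_colors="lL")
screen = ["".join(row) for row in compose(mask_plane, little_plane, big_plane, colormap)]
# screen == ["bbllL", "bbLlL", "BBbbb"], matching the picture at the top of Step 1
```

Note that no VRAM image is modified to get the overlap; only the colormap encodes which object is in front.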

Step 2: Moving an Object
------------------------
	Applications in the previous example can draw into an object by
	changing bits in the VRAM image, without concern for other objects
	that might be obscuring the view: a luxurious programmer environment.
	This luxury is maintained when the object is moved by employing the
	scrolling hardware.

	Each bitplane has independent address pointers (BPLxPT), which step
	through ascending 16-bit words of VRAM.  During vertical blanking,
	an object's address pointer can be set lower or higher in VRAM,
	shifting the object's position on the screen.  The maskplane address
	pointer is also adjusted.  This can deal with vertical scrolls of one
	scanline, but jumps horizontally in increments of 16 pixels (not
	acceptable for replacing sprites).  Since the object's bitmaps are
	not copied in VRAM to effect the move, the user sees smooth motion
	for an object of any "size".
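
The pointer adjustment in Step 2 might look like the sketch below. It is written under assumptions: the base address and the 80 bytes per line (640 one-bit pixels) are hypothetical values, and the sign convention is that starting the fetch lower in VRAM makes the object appear further right or down. The leftover value shows the 16-pixel granularity problem; the pointer alone cannot express it.

```python
WORD_PIXELS = 16   # bitplane data is fetched in 16-bit words
WORD_BYTES = 2


def moved_plane_pointer(base, dx_pixels, dy_lines, bytes_per_line):
    """Return (pointer, leftover_pixels) for an object moved right and down.

    The pointer would be written to the object's BPLxPT during vertical
    blanking; the mask plane pointer gets the same adjustment.
    leftover_pixels is the part of dx that word-sized steps cannot reach.
    """
    words, leftover = divmod(dx_pixels, WORD_PIXELS)
    pointer = base - dy_lines * bytes_per_line - words * WORD_BYTES
    return pointer, leftover


BASE = 0x20000          # hypothetical address of the object's bitplane
BYTES_PER_LINE = 80     # assumption: 640 pixels wide, one bit per pixel

# 32 pixels right, 10 lines down: lands exactly on a word boundary.
aligned = moved_plane_pointer(BASE, 32, 10, BYTES_PER_LINE)
# 20 pixels right: one whole word, with 4 pixels the pointer cannot express.
unaligned = moved_plane_pointer(BASE, 20, 0, BYTES_PER_LINE)
```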

Step 3: Scrolling an Object
---------------------------
	As all CLI users are aware, scrolling a large window is slow.  This
	is because AmigaDOS is copying all the bits in the window.  The
	action is even worse when small windows sit on top: the scrolling
	becomes jagged and irregular.  Solution: don't copy the bits.
	How?  Very much like Step 2, except easier.  The object's bitplane
	address pointer is changed, but not the maskplane address pointer.

I have glossed over a small matter involving maskplane updates when three or
more objects exist.  I have also not discussed the glaring problem of memory
size and bandwidth implied by this design; I am confident that Chuck will
not leave this as it stands.

The problem with full-width sprites, desired by the original poster, lies in
the demand for memory bandwidth, a resource in short supply on the Amiga.
Such a sprite, if full-height, would use as many cycles as two low-res
bitplanes.  With four hi-res bitplanes enabled, there is sufficient bandwidth
in each horizontal scantime for just one such sprite.

My implementation of the Amix windowing system was so tight on memory cycles
to chip ram, I was unable to use the video hardware as a cookie-cutter.
Instead, the function was performed using a 3-source blitter op.
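
For readers unfamiliar with a 3-source cookie-cut, here is a minimal Python model of the logic. The post does not say which blitter channels carried which data; the sketch assumes the conventional assignment of A = mask, B = object, C = background, with the minterm bit index formed as A*4 + B*2 + C. Under that assumption the function D = AB + (not A)C comes out as minterm 0xCA.

```python
def minterm(func):
    """Build the 8-bit minterm byte for a boolean function of (a, b, c)."""
    value = 0
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        if func(a, b, c):
            value |= 1 << index
    return value


def cookie_cut(a, b, c):
    """Object (b) where the mask (a) is set, background (c) elsewhere."""
    return (a & b) | ((1 - a) & c)


def blit_word(mask, source, dest, width=16):
    """Apply the same function to a whole word at once, as the blitter would."""
    full = (1 << width) - 1
    return ((mask & source) | (~mask & dest)) & full


COOKIE_CUT = minterm(cookie_cut)                 # 0xCA
result = blit_word(0x0FF0, 0xAAAA, 0x5555)       # 0x5AA5
```

This is the same boolean function as the colormap trick in Step 1, paid for with blitter cycles and a copy instead of with video fetch bandwidth.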

I desire the following improvements in Amiga graphics.  Firstly, bit
alignment for video data fetch per address pointer (playfield scroll is
insufficient).  Secondly, more DMA bandwidth: a factor of ten would be okay.
Thirdly, more bitplanes and larger colormap.  And dump sprites.

peter@sugar.hackercorp.com (Peter da Silva) (10/07/89)

In article <1221@accuvax.nwu.edu> rico@cbmvax.commodore.com (Rico Tudor) writes:
>I desire the following improvements in Amiga graphics.  Firstly, bit
>alignment for video data fetch per address pointer (playfield scroll is
>insufficient).  Secondly, more DMA bandwidth: a factor of ten would be okay.
>Thirdly, more bitplanes and larger colormap.  And dump sprites.

As you said earlier in this article, sprites are cheap. And sprites are also
very useful for certain things, at least one of which is really important...
the mouse pointer. Now I know that other windowing systems get away with
no hardware pointer, but you can always tell... the pointer flickers, is jumpy,
and occasionally glitches, leaving little pointer-shaped turds on the
screen. This is becoming less common, but I remember seeing it referred to
in comp.windows.x in the past 6 months so it still happens even there.

With a hardware pointer the programmer can ignore the stupid thing.
-- 
Peter "Have you hugged your wolf today" da Silva      `-_-'
...texbell!sugar!peter, or peter@sugar.hackercorp.com  'U`
``Back off dude! I'm a topologist!''
	-- Andrew Molitor <amolitor@eagle.wesleyan.edu>