[comp.arch] Dynamic Display Architecture

ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (04/15/91)

I've been talking with some friends about different display architectures,
notably contrasting the hardware-intensive approach of the Commodore Amiga
with the software-intensive one of the Apple Macintosh.

Now, raster displays are neat, but it takes a lot of memory accesses
to move those bits around, like when you're doing animation. The Amiga's
sprites and other capabilities are powerful, but I can't help thinking there
are too many arbitrary numbers hard-wired into that architecture for it to
be a long-term solution.

But one of the interesting bits of hardware in that machine is a processor
called the "Copper": its sole job is to watch the video beam, and poke
various machine registers (that you specify) when the beam gets to particular
positions on the screen. For example, you could change to a different
frame buffer halfway down the screen, or redefine a set of colour table
registers. It's true the CPU can do all this as well, but it makes
sense to pass as much of the load as possible to the Copper.
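
To make the idea concrete, here is a toy model of a beam-synchronised register poker. It is only a sketch: the register names and the (scanline, register, value) list format are invented for illustration and are not the real Copper instruction encoding.

```python
# Toy model of a beam-synchronised coprocessor: a list of
# (scanline, register, value) entries, applied when the beam reaches each line.
# Register names and the list format are invented for illustration.

def run_frame(copper_list, registers, num_lines):
    """Return a per-scanline snapshot of the registers for one frame."""
    pending = sorted(copper_list, key=lambda entry: entry[0])
    regs = dict(registers)
    snapshots = []
    next_entry = 0
    for line in range(num_lines):
        # Apply every write scheduled for this line (or earlier).
        while next_entry < len(pending) and pending[next_entry][0] <= line:
            _, name, value = pending[next_entry]
            regs[name] = value
            next_entry += 1
        snapshots.append(dict(regs))
    return snapshots

initial = {"framebuffer": 0x10000, "colour0": 0x000}
copper_list = [
    (100, "framebuffer", 0x20000),  # switch frame buffers halfway down
    (100, "colour0", 0x00F),        # and redefine a colour register
]
frame = run_frame(copper_list, initial, num_lines=200)
print(hex(frame[99]["framebuffer"]), hex(frame[100]["framebuffer"]))
```

The CPU only has to build the list once per frame (or less often); the per-line timing is then handled by whatever walks the list.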

This set me to thinking: what if you had a very fast, reasonably
general-purpose processor, whose sole job was to feed a stream of pixel
data to the video beam? In other words, it would be directly controlling
the intensities of the R, G and B components of the beam as it traced
out the raster. In the simplest case, this processor could be reading
data from a frame buffer. It could even use the data it reads as an
index into a separate "colour table" array before feeding the results
to the beam.

But, depending on how much processing time you have, it could get
much more fancy than this. You could read from several different
areas of memory, producing the equivalent of any number of "sprites".
And that's just the beginning.
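
As a rough sketch of what such a processor's inner loop might look like in software, the following generates one scanline from a frame buffer of colour-table indices and overlays "sprites" read from other areas of memory. The data layouts and names here are assumptions made for illustration only.

```python
# Sketch of a "pixel-feeding" processor generating one scanline in software.
# The frame buffer holds colour-table indices; sprites are small index arrays
# overlaid at a given position. All names and layouts are assumptions.

TRANSPARENT = None  # sprite pixels with this value let the background show

def scanline(framebuffer, colour_table, sprites, y):
    """Return the list of (r, g, b) values for row y."""
    row = list(framebuffer[y])                 # background indices
    for sprite in sprites:                     # later sprites draw on top
        sy = y - sprite["y"]
        if 0 <= sy < len(sprite["pixels"]):
            for sx, index in enumerate(sprite["pixels"][sy]):
                x = sprite["x"] + sx
                if index is not TRANSPARENT and 0 <= x < len(row):
                    row[x] = index
    return [colour_table[index] for index in row]

colour_table = [(0, 0, 0), (255, 0, 0), (0, 255, 0)]
framebuffer = [[0] * 8 for _ in range(4)]
sprites = [{"x": 2, "y": 1, "pixels": [[1, TRANSPARENT, 2]]}]
print(scanline(framebuffer, colour_table, sprites, 1))
```

Nothing limits the number of sprites except the time available per scanline, which is the point of the question that follows.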

Assuming (just looking at the machine I'm running now) a 640 * 480
display refreshed at 67Hz, you're looking at generating about 20
million pixels per second. Is this practical?

Is the current generation of RISC machines up to it?
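
For a rough sense of scale, the arithmetic can be sketched as follows. This is a back-of-envelope estimate only: it ignores blanking intervals, and the clock rates in the loop are illustrative assumptions, not figures for any particular machine.

```python
# Back-of-envelope pixel budget for a 640 x 480 display refreshed at 67 Hz.
# Blanking intervals are ignored, so the real per-pixel budget is tighter.

def pixel_rate(width, height, refresh_hz):
    """Visible pixels that must be generated per second."""
    return width * height * refresh_hz

def ns_per_pixel(rate):
    """Average time available per pixel, in nanoseconds."""
    return 1e9 / rate

rate = pixel_rate(640, 480, 67)
print(f"{rate:,} pixels/s, {ns_per_pixel(rate):.1f} ns per pixel")

# Hypothetical processor clocks (assumed for illustration only).
for clock_mhz in (25, 40, 100):
    cycles = clock_mhz * 1e6 / rate
    print(f"{clock_mhz} MHz clock: about {cycles:.2f} cycles per pixel")
```

At these rates a processor gets only a handful of cycles per pixel at best, so anything beyond a straight fetch would have to work on several pixels at once.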

Lawrence D'Oliveiro                       fone: +64-71-562-889
Computer Services Dept                     fax: +64-71-384-066
University of Waikato            electric mail: ldo@waikato.ac.nz
Hamilton, New Zealand    37^ 47' 26" S, 175^ 19' 7" E, GMT+12:00
"...so she tried to break into the father bear's computer, but it was
too hard. Then she tried to break into the mother bear's computer, but
that was too easy..."

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (04/15/91)

In article <1991Apr15.200955.3438@waikato.ac.nz> ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:

| But one of the interesting bits of hardware in that machine is a processor
| called the "Copper": its sole job is to watch the video beam, and poke
| various machine registers (that you specify) when the beam gets to particular
| positions on the screen. For example, you could change to a different
| frame buffer halfway down the screen, or redefine a set of colour table
| registers. It's true the CPU can do all this as well, but it makes
| sense to pass as much of the load as possible to the Copper.

  This sounds like the Intel video controller, part number forgotten.
This is the one they announced, dropped, then said they'd make because
people had committed to it.

  I believe the ability to have a list of frame buffers, with the size
and starting pixel of each, was one of the features. It sounded like the
perfect display chip, but the TI came out at the same time and had a lot
of other useful features, and the Intel didn't sell well.

  This from memory, and on a Monday morning, too.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

jesup@cbmvax.commodore.com (Randell Jesup) (04/16/91)

In article <3340@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>  This sounds like the Intel video controller, part number forgotten.
>This is the one they announced, dropped, then said they'd make because
>people had committed to it.
>
>  I believe the ability to have a list of frame buffers, with the size
>and starting pixel of each, was one of the features. It sounded like the
>perfect display chip, but the TI came out at the same time and had a lot
>of other useful features, and the Intel didn't sell well.

	I think that was a "hardware windows" chip - a fixed number of hardware
windows, each with their own display memory ptrs, modes, color tables, etc.
The problem with this approach is "what happens when I need N+1 windows?"
Another thing the copper gets you is an arbitrary number of virtual screens
that can be slid over each other, flipped between, etc, all very fast.  Then
there are tricks you can play with a copper like scrolling a screen without
moving any bits (Amiga Unix terminal screens use this trick).
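
A minimal sketch of scrolling without moving bits: treat the display buffer as a ring and change only the fetch start. The class and its names are illustrative assumptions; on real hardware the wrap-around point would presumably need a mid-screen pointer change of the kind a copper provides.

```python
# Scrolling by changing where display fetch starts instead of copying pixels.
# The buffer is treated as a ring of text lines; names are illustrative.

class ScrollingScreen:
    def __init__(self, visible_lines):
        self.lines = [""] * visible_lines
        self.top = 0  # index of the line shown at the top of the screen

    def append_line(self, text):
        """Scroll up by one line: overwrite the old top line, move the start."""
        self.lines[self.top] = text
        self.top = (self.top + 1) % len(self.lines)

    def displayed(self):
        """Lines in the order the display hardware would fetch them."""
        n = len(self.lines)
        return [self.lines[(self.top + i) % n] for i in range(n)]

screen = ScrollingScreen(3)
for text in ("one", "two", "three", "four"):
    screen.append_line(text)
print(screen.displayed())
```

Each scroll step writes one new line and updates one index, regardless of screen size.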

	A "general purpose" display coprocessor can do far more arbitrary
operations, even a quite dumb one.  The reason you don't want the main CPU
doing this is latency issues (though a dedicated GP CPU combined with the
right display hardware might do as well, though perhaps at the cost of 
dual-porting registers or making a separate bus for it) and bandwidth issues.
Also, some operations may take a larger number of cycles for a general
purpose CPU, and when modifying an active display you need fixed, fast
response times (it's almost closer to the requirements for a DSP than a
general purpose CPU).  The worst thing possible is for the timing to be
non-fixed (ala caches).

	Of course, the current Amiga Copper is old technology (released in
1985).  Given more and faster silicon (the original custom chips are in 3u
NMOS) there are many enhancements and additions one would make to them.
Some are quite obvious, like bulk color register loads.  Some are more
esoteric.  As to what enhancements are or will be made, I'm afraid I can't
talk about that subject.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
Disclaimer: Nothing I say is anything other than my personal opinion.
Thus spake the Master Ninjei: "To program a million-line operating system
is easy, to change a man's temperament is more difficult."
(From "The Zen of Programming")  ;-)

jallen@libserv1.ic.sunysb.edu (Joseph Allen) (04/17/91)

In article <20670@cbmvax.commodore.com> jesup@cbmvax.commodore.com (Randell Jesup) writes:
>In article <3340@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>>  This sounds like the Intel video controller, part number forgotten.

>>  I believe the ability to have a list of frame buffers, with the size
>>and starting pixel of each, was one of the features

>	I think that was a "hardware windows" chip - a fixed number of hardware
>windows, each with their own display memory ptrs, modes, color tables, etc.
>The problem with this approach is "what happens when I need N+1 windows?"
>Another thing the copper gets you is an arbitrary number of virtual screens
>that can be slid over each other, flipped between, etc, all very fast.  Then
>there are tricks you can play with a copper like scrolling a screen without
>moving any bits (Amiga Unix terminal screens use this trick).

I had an idea for a hardware windowing circuit once which was very simple and
which would eliminate most of the problems windows have today.  All you do is
break up the screen into small (maybe character sized) blocks.  Then for each
block you have a pointer to where in memory the actual data is.  It's really as
simple as normal character refresh memory,  but with no font chip and with
wider (32-bits instead of 8) refresh memory.

Then:
	Moving windows is easy, just move pointers around.  Since you only have
(say) 4096 pointers instead of an entire video screen to move it's very fast.

	Processes can simply write to their "local" screens instead of having
to go through software to calculate the window addresses.  If you have a
machine in which the main memory is also the video memory, then each process
can appear to have its own video memory.

	Higher-level video chips could be used on the "local" screens directly
instead of having to know about windows.

	When a window overlaps another, pointers are simply not made to the
hidden parts- and the program in the window doesn't have to know that it has
been overlapped.

	The only real disadvantage is that the window sizes are quantized on
the block sizes instead of on pixels.  I feel this is a very small price to
pay compared to having to calculate window overlaps every time you write to the
screen or any of the other perversions done in current window technology.
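
The scheme above can be sketched in software as a table of block pointers. The structures and names below are assumptions made for illustration; a real circuit would hold memory addresses rather than tuples.

```python
# Sketch of a block-pointer display: the screen is a grid of blocks, and each
# grid cell holds a reference to (window buffer, block column, block row).
# Structures and names are assumptions made for illustration.

def place_window(table, window, col, row):
    """Point the screen blocks under the window at the window's own blocks."""
    for by in range(window["rows"]):
        for bx in range(window["cols"]):
            x, y = col + bx, row + by
            if 0 <= y < len(table) and 0 <= x < len(table[0]):
                table[y][x] = (window["name"], bx, by)

def rebuild(screen_cols, screen_rows, windows_back_to_front):
    """Recompute every pointer; front windows overwrite hidden ones."""
    table = [[None] * screen_cols for _ in range(screen_rows)]
    for window, col, row in windows_back_to_front:
        place_window(table, window, col, row)
    return table

a = {"name": "A", "cols": 3, "rows": 2}
b = {"name": "B", "cols": 2, "rows": 2}
table = rebuild(6, 4, [(a, 0, 0), (b, 2, 1)])
# Moving window B only rewrites pointers; no pixel data is copied.
moved = rebuild(6, 4, [(a, 0, 0), (b, 3, 1)])
print(table[1][2], moved[1][2])
```

Neither window's own buffer is touched when B moves: the block of A that B was hiding reappears simply because its pointer is restored.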

So.. has anyone used/seen/proposed something like this?  Why are all modern
screens simple flat Mac-like clones (I suppose portability is one issue)?

>	Of course, the current Amiga Copper is old technology (released in
>1985).  Given more and faster silicon (the original custom chips are in 3u
>NMOS) there are many enhancements and additions one would make to them.
>Some are quite obvious, like bulk color register loads.  Some are more
>esoteric.  As to what enhancements are or will be made, I'm afraid I can't
>talk about that subject.

Aw...

>Randell Jesup, Keeper of AmigaDos, Commodore Engineering.

Ah-Ha! So now we see who's responsible for AmigaDOS not being a UNIX port/clone
as it should have been...  (whoever made ':' mean "start the path in root" for
AmigaDOS path strings should be shot along with the person who made the shell
wildcard '#*' (or whatever) and not just '*' along with whoever decided to
copy the MS-DOS device: convention etc., etc., etc.. :-)

(well at least it's better than MAC-OS)

--
#define h 23 /* Height */         /* jallen@ic.sunysb.edu (129.49.12.74) */
#define w 79 /* Width */                       /* Amazing */
int i,r,b[]={-w,w,1,-1},d,a[w*h];m(p){a[p]=2;while(d=(p>2*w?!a[p-w-w]?1:0:0)|(
p<w*(h-2)?!a[p+w+w]?2:0:0)|(p%w!=w-2?!a[p+2]?4:0:0)|(p%w!=1?!a[p-2]?8:0:0)){do
i=3&(r=(r*57+1))/d;while(!(d&(1<<i)));a[p+b[i]]=2;m(p+2*b[i]);}}main(){r=time(
0L);m(w+1);for(i=0;i%w?0:printf("\n"),i!=w*h;i++)printf("#\0 "+a[i]);}

zik@dec19.cs.monash.edu.au (Michael Saleeba) (04/17/91)

In <1991Apr17.051746.15592@sbcs.sunysb.edu> jallen@libserv1.ic.sunysb.edu (Joseph Allen) writes:

>I had an idea for a hardware windowing circuit once which was very simple and
>which would eliminate most of the problems windows have today.  All you do is
>break up the screen into small (maybe character sized) blocks.  Then for each
>block you have a pointer to where in memory the actual data is.  It's really as
>simple as normal character refresh memory,  but with no font chip and with
>wider (32-bits instead of 8) refresh memory.

One device which did exactly this was the Texas Instruments TMS9929A (and
others in the same family). This was a rather low-end device by today's
standards (256*192 graphics), but was the graphics engine for at least
two major machines, the Texas Instruments 9900 (?) and the Japanese MSX.
It also had hardware sprites and some other goodies. Quite a nice system 
in a limited sort of way, and it certainly made eight-pixel scrolling quick 
as you point out. Unfortunately standard bit-scrolling was slower than ever
since you had to keep track of all those pointers (actually character-
generator blocks). There were some tricks to get around this, but basically
it ended up slow and particularly awkward to program for, if you took advantage
of all the features.

 ______      _
|___  /  _  | | __	"I don't want the world - I just want your half."
   / /  |_| | |/ /
  / /    _  |   / 		Name:		Michael Saleeba
 / /__  | | |   \ 		At:		Monash University
/_____| |_| |_|\_\		E-mail:		zik@bruce.oz.au

paul@taniwha.UUCP (Paul Campbell) (04/18/91)

In article <1991Apr17.051746.15592@sbcs.sunysb.edu> jallen@libserv1.ic.sunysb.edu (Joseph Allen) writes:
>I had an idea for a hardware windowing circuit once which was very simple and
>which would eliminate most of the problems windows have today.  All you do is
>break up the screen into small (maybe character sized) blocks.  Then for each
>block you have a pointer to where in memory the actual data is.  It's really as
>simple as normal character refresh memory,  but with no font chip and with
>wider (32-bits instead of 8) refresh memory.

Of course this isn't a new idea, let's look at why it's hard ....

Let's assume you are using VRAMs (for performance) and you have an 8-bit
display. Since the VRAM's max clock frequency is ~40MHz (25nS), to get
the 100MHz pixel rate you need for a 1M pixel display @ 75Hz you
need a 4:1 interleave on the video side. This means you have a 32-bit
(4x8-bit pixels) data path. OK, so far so good.

You are clocking your (4) pixels @100MHz/4 = 25MHz = 40nS per 4-pixel word. To switch to
a new chunk you need to do the VRAM read transfer to change the memory
address, this cycle takes (minimum) 180nS, also assume that you want to be
able to do at least one framestore access (or refresh etc etc) from the host
at the same rate (otherwise your rendering will be TERRIBLE!) then you are
going to have to leave another 180ns available each cycle (remember the read
transfer cycles are 'real-time' so you have to schedule all other cycles in
between). This means that you can only do a read transfer every 
4*(180+180)/40 = 36 pixels - assuming you want to do this on power of two
boundaries, your chunks have to be at least 64 pixels wide. Of
course you also have to get information on where the next pixel will start,
if you fetch it from the same framestore then you get 4*(180+180+180)/40 = 54
pixels (still within your 64-pixels) - but your graphics performance (rendering
rate) just went down again, this time to 1/3.
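
The arithmetic above can be checked with a few lines. The constants are the ones quoted in this post; the function names are just for illustration.

```python
# Reproduces the arithmetic above: how many pixels must pass between
# VRAM read transfers, given the cycle times quoted in the post.

PIXELS_PER_WORD = 4      # 4:1 interleave of 8-bit pixels
WORD_TIME_NS = 40        # 100 MHz pixel rate / 4 = 25 MHz word rate
CYCLE_NS = 180           # minimum read-transfer (and host access) cycle

def min_chunk_pixels(cycles_per_chunk):
    """Pixels shifted out while the given number of 180 ns cycles complete."""
    return PIXELS_PER_WORD * cycles_per_chunk * CYCLE_NS // WORD_TIME_NS

def next_power_of_two(n):
    """Smallest power of two that is not less than n."""
    power = 1
    while power < n:
        power *= 2
    return power

for cycles, label in ((2, "read transfer + host access"),
                      (3, "plus pointer fetch")):
    pixels = min_chunk_pixels(cycles)
    print(f"{label}: {pixels} pixels, chunk width {next_power_of_two(pixels)}")
```

Both cases round up to the same 64-pixel chunk width, but in the second only one memory cycle in three is left for the host.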

Of course if you are using 1-bit pixels then the numbers are much different
(and more practical) - but not scalable. 

This is not to say you can't build such a system - lots of expensive SRAM
and a big fan initially come to mind and there are other trickier ways to
do it - all of them require throwing lots of expensive silicon at the problem.

Oh BTW, I know the guy who got the patent on your idea :-)

	Paul
-- 
Paul Campbell    UUCP: ..!mtxinu!taniwha!paul     AppleLink: CAMPBELL.P

"But don't we all deserve.
 More than a kinder and gentler fuck" - Two Nice Girls, "For the Inauguration"

ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (04/18/91)

Do any of these hardware windowing systems support anything resembling
the Macintosh Palette Manager? This seems to me the only reasonable way
to manage colour-table devices in a multi-tasking, multi-display
environment.

The way it works is that, instead of applications having to manipulate
device colour tables directly, you instead attach a "palette" of requested
colours to each window. The Palette Manager looks at the frontmost window,
looks at the hardware capabilities of the screens it's on, and sets
up the colour hardware as appropriate to optimize the display of that
window. If there are any colour registers left over, it then assigns some
to the next window from the front, and so on.

Note that a window might straddle multiple screens, with different
hardware capabilities: one screen might only support black-and-white,
another might have 8 bits per pixel, and yet another might be a 24-bit
direct-RGB display with no colour table at all. You really don't want the
headache of having to worry about all this, not to mention trying
to arbitrate access to display hardware among multiple applications running
at once. And then have to redo it all when the user changes the screen
mode while your program is running.

There are lots of options to give the application fine control, and
access to special functions. In the simplest case, you can adjust
the tolerance of a palette entry, so that you can tell the system that
you would be satisfied with a less-than-exact colour match. This could
let you share colour registers with other windows on the same screen.
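
To illustrate the idea (and only the idea), here is a hypothetical front-to-back allocator with per-entry tolerances. This is not the Palette Manager API: the function names, data layout, and distance metric are all assumptions.

```python
# Hypothetical front-to-back colour register allocation with tolerances.
# This is not the Palette Manager API; names and the distance metric are
# assumptions for illustration.

def colour_distance(a, b):
    """Largest per-channel difference between two (r, g, b) colours."""
    return max(abs(x - y) for x, y in zip(a, b))

def allocate(windows_front_to_back, num_registers):
    """Return (registers, mapping), where mapping[(window, i)] is a register."""
    registers = []
    mapping = {}
    for name, palette in windows_front_to_back:
        for i, (colour, tolerance) in enumerate(palette):
            # Reuse an existing register if it is within tolerance.
            for reg, existing in enumerate(registers):
                if colour_distance(colour, existing) <= tolerance:
                    mapping[(name, i)] = reg
                    break
            else:
                if len(registers) < num_registers:
                    registers.append(colour)
                    mapping[(name, i)] = len(registers) - 1
                # Otherwise the request goes unsatisfied in this sketch.
    return registers, mapping

front = ("front", [((255, 0, 0), 0), ((0, 0, 255), 0)])
back = ("back", [((250, 5, 0), 10), ((0, 255, 0), 0)])
registers, mapping = allocate([front, back], num_registers=3)
print(registers, mapping)
```

Here the back window's near-red entry shares the front window's red register because its tolerance allows it, which leaves the third register free for green.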

There are even functions specifically to allow you to do colour-table
animation, including animating a previously-generated image (Of course,
these only work on a display *with* a colour table). And you can
specify that certain palette entries are only active at certain
screen depths (or only on black-and-white displays, or only on colour
ones), which lets you hand-tune the display to look the best,
no matter what kind of display hardware you're running on.

Is there any windowing hardware which will support this, or, at least,
not prevent the OS from supporting something like this?

Lawrence D'Oliveiro                       fone: +64-71-562-889
Computer Services Dept                     fax: +64-71-384-066
University of Waikato            electric mail: ldo@waikato.ac.nz
Hamilton, New Zealand    37^ 47' 26" S, 175^ 19' 7" E, GMT+12:00
The rest of this message is printed in that special font they use
for the expiry dates on food packages.

gd@geovision.gvc.com (Gord Deinstadt) (04/18/91)

jallen@libserv1.ic.sunysb.edu (Joseph Allen) writes:

>I had an idea for a hardware windowing circuit once which was very simple and
>which would eliminate most of the problems windows have today.  All you do is
>break up the screen into small (maybe character sized) blocks.  Then for each
>block you have a pointer to where in memory the actual data is.  It's really as
>simple as normal character refresh memory,  but with no font chip and with
>wider (32-bits instead of 8) refresh memory.

Hey!  That's my idea! :-)

>So.. has anyone used/seen/proposed something like this?  Why are all modern
>screens simple flat Mac-like clones (I suppose portability is one issue)?

I even prototyped the code.  WITHOUT hardware support, just implementing this
in software, you get a really fast windowing system because there are no
shifts in your display update - just block moves.  And it is dead easy to
figure out which pieces of the screen you have to update.  The most difficult
part is mapping pixel coordinates to memory locations - so line drawing slows
down 10 to 20%.  But it made windowing useable on a 4.77 MHz PC.  I never got
a chance to use it in a product, alas.

>--
That's my .sig lead-in too!  Are you sure you aren't me?-)
--
Gord Deinstadt  gdeinstadt@geovision.UUCP
Ask me about my fast polygon fill, ideally suited for hardware.

gd@geovision.gvc.com (Gord Deinstadt) (04/18/91)

zik@dec19.cs.monash.edu.au (Michael Saleeba) writes:
[Block-based windowing]
>One device which did exactly this was the Texas Instruments TMS9929A (and
>others in the same family). [...]
>It also had hardware sprites and some other goodies. Quite a nice system 
>in a limited sort of way, and it certainly made eight-pixel scrolling quick 
>as you point out. Unfortunately standard bit-scrolling was slower than ever
>since you had to keep track of all those pointers (actually character-
>generator blocks).

The idea (well, as I saw it) is to NOT DO standard bit-scrolling - force
windows to fall on boundaries that are convenient to the hardware.  This
makes more sense on cheap H/W than trying to be totally general and taking
forever to do anything (ie. trying to deliver everything and delivering
nothing).  Of course an application may still have to bit-scroll part
of an image within a window, but this is the application's problem; it
is slower, but the slowness "feels" better because the computer is actually
"doing" something at the time.  It was only marginally slower anyway -
as I had it set up, each window was a regular rectangular array in memory.
The additional processing was caused by the awkward ordering of pixels
within each block or tile.
The pointers were only used by the windowing system as a fast index for
repainting.  Essentially, that's all this is - an inverted index for display
memory.
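
A sketch of the extra address arithmetic that a tiled pixel ordering implies, compared with a flat row-major buffer. The 8x8 tile size, one storage unit per pixel, and the function names are assumptions for illustration, not the layout of the prototype described above.

```python
# Mapping a pixel coordinate to an offset in a tiled (block-ordered) buffer,
# compared with plain row-major order. Tile size and layout are assumptions.

TILE_W = 8
TILE_H = 8

def linear_offset(x, y, width):
    """Row-major offset, as in a conventional flat frame buffer."""
    return y * width + x

def tiled_offset(x, y, width):
    """Offset when pixels are stored tile by tile, row-major inside each tile.
    Assumes width is a multiple of TILE_W."""
    tiles_per_row = width // TILE_W
    tile_index = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    within_tile = (y % TILE_H) * TILE_W + (x % TILE_W)
    return tile_index * TILE_W * TILE_H + within_tile

print(linear_offset(9, 1, 32), tiled_offset(9, 1, 32))
```

With power-of-two tile sizes the divisions and remainders reduce to shifts and masks, which fits the modest line-drawing slowdown reported earlier in the thread.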
--
Gord Deinstadt  gdeinstadt@geovision.UUCP

jesup@cbmvax.commodore.com (Randell Jesup) (04/21/91)

In article <1991Apr17.051746.15592@sbcs.sunysb.edu> jallen@libserv1.ic.sunysb.edu (Joseph Allen) writes:
>I had an idea for a hardware windowing circuit once which was very simple and
>which would eliminate most of the problems windows have today.  All you do is
>break up the screen into small (maybe character sized) blocks.  Then for each
>block you have a pointer to where in memory the actual data is.  It's really as
>simple as normal character refresh memory,  but with no font chip and with
>wider (32-bits instead of 8) refresh memory.

	Sounds a lot like "character graphics" - define a custom font, and
store character numbers for the screen.  If you use 16 bit "characters", and
reasonable sized "characters", you can put up anything arbitrarily.  There
are some significant drawbacks to this approach, and a few advantages.

>	When a window overlaps another, pointers are simply not made to the
>hidden parts- and the program in the window doesn't have to know that it has
>been overlapped.

	This is (effectively) a very limited version of a graphics copper.
It loses all the bandwidth advantages from making use of the fact that the
pixels are sequential within the window (you are effectively executing a
copper instruction for each "character" on the display).  You could get the
same sort of effects (and many more interesting ones) with a reasonably
functional and fast copper (people have made hires windows in the middle of
lores screens using tricks like this, I'm told).

>	The only real disadvantage is that the window sizes are quantized on
>the block sizes instead of on pixels.  I feel this is a very small price to
>pay compared to having to calculate window overlaps every time you write to the
>screen or any of the other perversions done in current window technology.

	Small is in the eye of the beholder.  Agreed, overlaps are a pain,
but most of the work is done when windows are created/moved/etc.  It's not
cheap, though.

>>Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
>
>Ah-Ha! So now we see who's responsible for AmigaDOS not being a UNIX port/clone
>as it should have been...  (whoever made ':' mean "start the path in root" for
>AmigaDOS path strings should be shot along with the person who made the shell
>wildcard '#*' (or whatever) and not just '*' along with whoever decided to
>copy the MS-DOS device: convention etc., etc., etc.. :-)

	Don't talk to me, talk to some grad students and profs at Cambridge,
the birthplace of Tripos.  In 2.0, you can make * == #? by flipping a bit.
As for ':', etc, it's different, but different does not mean bad.  Be careful
about getting Unixitis (OS likes/dislikes are almost as religious as editors).
At least it's in C/asm now instead of BCPL.  It even has some things Unix
doesn't nowadays.  AmigaOS 2.0 is a major change - I took over AmigaDos
(dos.library) in ~June 1989.

	It wasn't ever going to be a Unix clone (on a 68000 with 256K memory
in 1985).  The Exec kernel is similar to Xinu, though, to this day.  We do
have Amiga Unix, though (see comp.unix.amiga).

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
Disclaimer: Nothing I say is anything other than my personal opinion.
Thus spake the Master Ninjei: "To program a million-line operating system
is easy, to change a man's temperament is more difficult."
(From "The Zen of Programming")  ;-)