[comp.arch] PMAX/DS3100 frame buffers

david@sun.com (Do you, Peanut Butter, take this Jelly...) (03/15/89)

Does anyone know for sure if the PMAX frame buffers are cached?  It would
also be nice to know the frame buffer access cycle time if that's public
info.

Any comments on the pros and cons of caching frame buffers?

-- 
David DiGiacomo, Sun Microsystems, Mt. View, CA  sun!david david@sun.com

keith@mit-vax.LCS.MIT.EDU (Keith Packard) (03/22/89)

In article <93849@sun.Eng.Sun.COM> david@sun.com (Do you, Peanut Butter, take this Jelly...) writes:
>Does anyone know for sure if the PMAX frame buffers are cached?

The PMAX frame buffers are not cached.  Naturally, the hardware would 
allow this, but it is disabled.

>Any comments on the pros and cons of caching frame buffers?

I've been convinced that caching the frame buffer is a bad idea.  The reason
is obvious in the 8-bit color frame buffer mode.  In 8-bit mode, the machine
is almost never reading data back from the frame buffer for drawing.  When
blting, you almost always read a tremendous amount of sequential data, so
the amount of useful data in the cache is probably zero.  Worse yet, you're
ending up with a data cache full of data you'll never need again (i.e. the
bits just blted).
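Keith's cache-pollution argument can be checked with a toy simulation. The sketch below assumes a 64 KB direct-mapped data cache with hypothetical 4-byte lines, a hypothetical frame buffer at physical address 0x0FC00000, and an application working set at 0x00100000. These numbers are illustrative, not PMAX specifications. The sketch warms the working set, streams a blt read through the cache, and reports how much of the working set survives and how many blt reads hit.

```python
# Toy model: how a cached blt evicts an application's working set.
# Assumed (hypothetical) parameters: 64 KB direct-mapped cache, 4-byte lines.
CACHE_BYTES = 64 * 1024
LINE_BYTES = 4
NUM_LINES = CACHE_BYTES // LINE_BYTES


class DirectMappedCache:
    def __init__(self, num_lines=NUM_LINES, line_bytes=LINE_BYTES):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # one tag per set; None means invalid

    def _split(self, addr):
        line = addr // self.line_bytes
        return line % self.num_lines, line // self.num_lines  # (index, tag)

    def access(self, addr):
        """Touch addr; return True on a hit. A miss fills the line."""
        index, tag = self._split(addr)
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit

    def resident(self, addr):
        index, tag = self._split(addr)
        return self.tags[index] == tag


def surviving_fraction(ws_base, ws_bytes, blt_base, blt_bytes):
    """Return (fraction of working set still cached, blt read hits)."""
    cache = DirectMappedCache()
    ws = range(ws_base, ws_base + ws_bytes, LINE_BYTES)
    for addr in ws:
        cache.access(addr)  # the application warms its own data
    blt_hits = sum(cache.access(a)
                   for a in range(blt_base, blt_base + blt_bytes, LINE_BYTES))
    return sum(cache.resident(a) for a in ws) / len(ws), blt_hits


print(surviving_fraction(0x00100000, 16 * 1024, 0x0FC00000, 256 * 1024))  # (0.0, 0)
```

Under these assumptions a 256 KB blt leaves none of the 16 KB working set cached and never hits on its own data. Even a 4 KB blt evicts a quarter of the working set.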

The monochrome case is less clear; I've been told that benchmarks on the
PMAX tend to favor running uncached.

Now, if the cache were a bit fancier and did burst mode for filling
a long run of data on a read miss, then blt might be faster with
the cache turned on.
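A simple cycle count can estimate the effect of burst fills on a sequential blt read. The costs below are hypothetical placeholders, not measured PMAX numbers: 5 cycles for the first word of a fill, 1 cycle per additional burst word, and 1 cycle per cache hit. One word per line approximates the non-burst case.

```python
def sequential_read_cycles(n_words, words_per_line,
                           first_word=5, burst_word=1, hit=1):
    """Cycles to read n_words sequentially through a cache (toy model).

    Each line costs one miss: first_word cycles for the word that missed,
    plus burst_word cycles for each further word streamed into the line.
    The remaining words of each line are then read as cache hits.
    """
    lines = -(-n_words // words_per_line)  # ceiling division: one miss per line
    fill = first_word + (words_per_line - 1) * burst_word
    return lines * fill + (n_words - lines) * hit


for wpl in (1, 4, 8):
    print(wpl, sequential_read_cycles(1024, wpl))
```

With these made-up costs, 4-word bursts cut a 1024-word read from 5120 to 2816 cycles. This is the kind of gain Keith suggests a fancier cache might deliver.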

I'd rather see special blt hardware though.  I can imagine all sorts
of fun uses (like system/user data copy) for a fast piece of general
memory copy hardware.

keith packard
keith@expo.lcs.mit.edu

henry@utzoo.uucp (Henry Spencer) (03/23/89)

In article <5845@mit-vax.LCS.MIT.EDU> keith@mit-vax.UUCP (Keith Packard) writes:
>I'd rather see special blt hardware though.  I can imagine all sorts
>of fun uses (like system/user data copy) for a fast piece of general
>memory copy hardware.

Before asking for copy hardware, find out whether your CPU is good enough
(when programmed carefully) to saturate the memory bandwidth doing copying.
Many modern CPUs are, which means that simple data copying cannot possibly
be speeded up with a hardware copier.  To see real benefits, you either
need a memory system that the CPU cannot fully exploit, or some additional
complexity in the operation to be done that slows the CPU down.
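Henry's test can be written as back-of-the-envelope arithmetic. The sketch assumes that reads and writes share one memory bus, so each copied byte crosses it twice. The clock, loop timing, and bandwidth figures are hypothetical.

```python
def copy_throughput(clock_mhz, bytes_per_iter, cycles_per_iter, mem_bw_mb_s):
    """Return (achieved copy MB/s, memory_bound) for a software copy loop.

    Every byte copied crosses the memory bus twice (one read, one write),
    so the memory system caps the copy rate at mem_bw_mb_s / 2.
    """
    cpu_rate = clock_mhz * bytes_per_iter / cycles_per_iter  # MB/s the loop can issue
    mem_cap = mem_bw_mb_s / 2
    return min(cpu_rate, mem_cap), cpu_rate >= mem_cap


# Unrolled loop: 16 bytes per 8 cycles on a 16 MHz CPU, 40 MB/s memory.
print(copy_throughput(16, 16, 8, 40))
# Naive loop: 4 bytes per 8 cycles on the same machine.
print(copy_throughput(16, 4, 8, 40))
```

In the first case the carefully programmed loop already saturates memory, so a hardware copier cannot speed it up. In the second case the CPU leaves bandwidth unused, which is where copy hardware could help.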
-- 
Welcome to Mars!  Your         |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade?    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jrg@Apple.COM (John R. Galloway) (03/23/89)

In article <1989Mar22.175616.1420@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <5845@mit-vax.LCS.MIT.EDU> keith@mit-vax.UUCP (Keith Packard) writes:
>>I'd rather see special blt hardware though.  I can imagine all sorts
>>of fun uses (like system/user data copy) for a fast piece of general
>>memory copy hardware.
>
>Before asking for copy hardware, find out whether your CPU is good enough
>(when programmed carefully) to saturate the memory bandwidth doing copying.

It's not quite that simple, since if the copy hardware runs directly out
of memory (or has its own cache) then the CPU can be running in its own
cache and doing other useful work while the copy is going on, until the
copy is interrupted for the CPU to get another cache line.
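John's overlap argument can be sketched as a toy timing model. The assumption is that each CPU cache miss taken while the copy engine is busy stalls for a fixed extra penalty while it waits for the bus. All time units are arbitrary and hypothetical.

```python
def elapsed_serial(cpu_work, copy_time):
    """The CPU performs the copy itself, then does its other work."""
    return cpu_work + copy_time


def elapsed_overlapped(cpu_work, copy_time, cpu_misses, contention_penalty):
    """A copy engine runs while the CPU works out of its own cache.

    Toy assumption: each CPU cache miss taken while the engine is busy
    stalls for contention_penalty extra time units waiting for the bus.
    """
    return max(cpu_work + cpu_misses * contention_penalty, copy_time)


print(elapsed_serial(1000, 800), elapsed_overlapped(1000, 800, 50, 2))
print(elapsed_serial(300, 800), elapsed_overlapped(300, 800, 400, 2))
```

When the CPU rarely misses, the copy is nearly free. When it misses constantly, the stalls eat the whole advantage of overlapping.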
apple!jrg	John R. Galloway, Jr.       contract programmer, San Jose, Ca

These are my views, NOT Apple's, I am a GUEST here, not an employee!!

jg@crltrx.crl.dec.com (Jim Gettys) (03/24/89)

In article <27738@apple.Apple.COM> jrg@Apple.COM (John R. Galloway) writes:
>
>It's not quite that simple, since if the copy hardware runs directly out
>of memory (or has its own cache) then the CPU can be running in its own
>cache and doing other useful work while the copy is going on, until the
>copy is interrupted for the CPU to get another cache line.


It is interesting to note that, with the exception of scrolling, the PMAX
(DECstation 3100) color frame buffer code ends up outperforming the GPX
hardware on uVAXes, sometimes by radical amounts.
(Or at least, that will be true in the next release of the software; most
operations are already faster in the first DECstation release, and more
work has been done since.)

So as usual, general purpose hardware catches up with the special purpose
hardware of the previous generation.  And for many operations, the graphics
hardware in the GPX was more of a hindrance than a help.

Many operations are completely bound by memory bandwidth between
the processor and the frame buffer.
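Bandwidth sets a hard floor on the time for a memory-bound operation. The sketch assumes a hypothetical 1024x864, 8-bit display and a hypothetical 10 MB/s processor-to-frame-buffer bandwidth. Neither figure comes from the post.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Bytes in one screenful of pixels."""
    return width * height * bits_per_pixel // 8


def transfer_ms(nbytes, bytes_per_sec):
    """Lower bound on the time to move nbytes at the given bandwidth, in ms."""
    return nbytes / bytes_per_sec * 1000.0


screen = frame_bytes(1024, 864, 8)    # hypothetical 8-bit display
print(transfer_ms(screen, 10e6))      # fill: every byte written once
print(transfer_ms(2 * screen, 10e6))  # full-screen copy: read + write
```

No amount of CPU cleverness gets a fill under that floor. A copy that reads and writes through the same path pays it twice.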

Also note that most frame buffer operations are WRITES, not READS, so
with most caches (for example, the write-through cache on the PMAX)
you quickly end up being limited by the memory subsystem.  And graphics
operations generally go romping through substantial chunks of memory quickly,
flushing the data from the cache.
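A toy count of memory writes shows why a write-through cache does nothing for frame buffer stores. The sketch assumes a 64 KB direct-mapped cache with one-word (4-byte) lines. For contrast, it includes a hypothetical write-back cache with write-allocate.

```python
WORD = 4
NUM_LINES = 64 * 1024 // WORD  # assumed 64 KB direct-mapped, one-word lines


def memory_writes(store_addrs, policy):
    """Count words written to memory for a sequence of word stores."""
    if policy == "write-through":
        return len(store_addrs)  # every store goes to memory, hit or miss
    if policy != "write-back":
        raise ValueError(policy)
    dirty = {}  # cache index -> tag of the dirty line held there
    writes = 0
    for addr in store_addrs:
        line = addr // WORD
        index, tag = line % NUM_LINES, line // NUM_LINES
        if index in dirty and dirty[index] != tag:
            writes += 1  # evicting a different dirty line writes it back
        dirty[index] = tag
    return writes + len(dirty)  # final flush of everything still dirty


span = [0x1000, 0x1004, 0x1008, 0x100C] * 100                # redraw a tiny span
fill = list(range(0x0FC00000, 0x0FC00000 + 256 * 1024, 4))   # sweep 256 KB
for policy in ("write-through", "write-back"):
    print(policy, memory_writes(span, policy), memory_writes(fill, policy))
```

Write-back only helps when the same pixels are rewritten while still cached. For a sweep through a large region, both policies send every word to memory.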

Running the display cached would also interfere with other programs on the
machine (like the X server, or your favorite application), since frame buffer
traffic would evict their data from the cache.

Preliminary experiments showed that running cached did not help color, and
may have helped monochrome somewhat, but the mono case was far from obvious,
particularly when you consider interference with other programs running on the
machine.  So at this instant, we are running the frame buffers uncached.

At some point it would be interesting to rerun the tests; the frame buffer
can be run either cached or uncached with a simple driver change.
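On the MIPS R2000 used in the DECstation 3100, the same physical memory is visible through two kernel windows: a cached one (kseg0, at 0x80000000) and an uncached one (kseg1, at 0xA0000000). User-space TLB mappings also carry a per-page noncacheable (N) bit in EntryLo. A driver could plausibly switch between cached and uncached access with either mechanism, and the sketch below shows the address and EntryLo arithmetic. Whether the PMAX driver actually works this way is an assumption, and the frame buffer address 0x0FC00000 is hypothetical.

```python
# MIPS R2000/R3000 unmapped kernel segments.
KSEG0_BASE = 0x80000000  # cached window onto physical 0..512 MB
KSEG1_BASE = 0xA0000000  # uncached window onto the same physical range
PHYS_LIMIT = 0x20000000  # kseg0/kseg1 only reach the low 512 MB

# R2000/R3000 EntryLo bits used when mapping a page through the TLB.
ENTRYLO_N = 1 << 11  # noncacheable
ENTRYLO_D = 1 << 10  # "dirty": page is writable
ENTRYLO_V = 1 << 9   # valid


def kernel_alias(phys, cached):
    """Kernel virtual address of phys via kseg0 (cached) or kseg1 (uncached)."""
    if not 0 <= phys < PHYS_LIMIT:
        raise ValueError("address not reachable through kseg0/kseg1")
    return (KSEG0_BASE if cached else KSEG1_BASE) | phys


def entrylo(phys, cached, writable=True):
    """EntryLo value mapping the 4 KB page containing phys."""
    value = (phys & 0xFFFFF000) | ENTRYLO_V
    if writable:
        value |= ENTRYLO_D
    if not cached:
        value |= ENTRYLO_N
    return value


FB_PHYS = 0x0FC00000  # hypothetical frame buffer physical address
print(hex(kernel_alias(FB_PHYS, cached=True)))   # 0x8fc00000
print(hex(kernel_alias(FB_PHYS, cached=False)))  # 0xafc00000
print(hex(entrylo(FB_PHYS, cached=False)))       # 0xfc00e00
```

Either way, the choice comes down to one address base or one bit per page, which is consistent with "a simple driver change."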

All this will have to be re-examined as cache sizes and organizations change;
for example, if the cache were 10 times larger, the probability of hitting
the cache for the scrolling case would be very much higher (with ten times
the 64k D cache on the PMAX, a large window on an 8-bit frame buffer
would have a good chance of still being in the cache the next time you
scrolled one line).
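A toy residency model illustrates how cache size affects scrolling. It assumes a direct-mapped cache with 4-byte lines indexed by simple modulo, a hypothetical frame buffer row pitch of 1024 bytes at 8 bits per pixel, and a window at the top-left corner. The model touches every word of the window once, as a one-line scroll does, and reports the fraction still cached for the next scroll.

```python
LINE_BYTES = 4
PITCH = 1024  # hypothetical bytes per frame buffer row (8-bit pixels)


def resident_after_scroll(width, height, cache_bytes):
    """Fraction of a width x height window (in bytes) still cached after a scroll."""
    if width > PITCH:
        raise ValueError("window wider than a frame buffer row")
    num_lines = cache_bytes // LINE_BYTES
    owner = {}   # cache index -> line currently held there
    lines = []
    for row in range(height):
        for col in range(0, width, LINE_BYTES):
            line = (row * PITCH + col) // LINE_BYTES
            lines.append(line)
            owner[line % num_lines] = line  # direct-mapped: last line wins the set
    still_cached = sum(owner[l % num_lines] == l for l in lines)
    return still_cached / len(lines)


for w, h in ((800, 60), (400, 100), (800, 200)):
    print(w, h, resident_after_scroll(w, h, 64 * 1024),
          resident_after_scroll(w, h, 640 * 1024))
```

In this model, an 800x200 window keeps only 32% of its lines in 64 KB but all of them in 640 KB. A 400x100 window fits in 64 KB by size yet keeps only 64%, because rows 64 apart collide when the pitch exceeds the window width.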
				- Jim Gettys

jrg@Apple.COM (John R. Galloway) (03/25/89)

>
>
>It is interesting to note, that with the exception of scrolling, the PMAX
>(DECstation 3100) color frame buffer code ends up outperforming the GPX 
>hardware on uVAXes, sometimes by radical amounts.

Gosh, I sure would like to have one of these beasts.

>Also note that most frame buffer operations are WRITES, and not READS, so

That's interesting.  I would have thought that there would be a lot of
bit blit type activity (scrolling, window movement, or expose events handled
from backing store, or was that save-unders), with pretty heavy read/write
traffic, plus an additional amount of plain writes (new stuff from the
application).  If current systems can completely use up the memory bandwidth,
perhaps it's time for smart frame buffers that can do multiple concurrent bit
blits internally (without using up external bandwidth unless there is a
conflict), but then if writes really dominate this wouldn't help.  Where does
the data being written come from (i.e. is this a bit blit with X in the middle,
or is it really writing new data)?  So what's the next step for more
performance?  (I assume that there are graphics operations on the 3100 that
are not as fast as you would like.)
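John's question can be made concrete by counting external bus traffic. The sketch assumes a hypothetical "smart" frame buffer that performs copies internally, so a copy costs nothing on the processor bus, while newly drawn pixels must still be written across it. The operation mixes are made up.

```python
def bus_bytes(ops, smart_fb):
    """External bus bytes for a list of (kind, nbytes) frame buffer operations."""
    total = 0
    for kind, nbytes in ops:
        if kind == "copy":
            # CPU blt: read and write each byte over the bus; smart FB: internal.
            total += 0 if smart_fb else 2 * nbytes
        elif kind == "draw":
            total += nbytes  # new pixels must cross the bus either way
        else:
            raise ValueError(kind)
    return total


write_heavy = [("copy", 100_000), ("draw", 400_000)]
copy_heavy = [("copy", 400_000), ("draw", 100_000)]
for mix in (write_heavy, copy_heavy):
    print(bus_bytes(mix, smart_fb=False), bus_bytes(mix, smart_fb=True))
```

Internal blits cut traffic ninefold for the copy-heavy mix but only by a third when new writes dominate, which is John's point.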

apple!jrg	John R. Galloway, Jr.       contract programmer, San Jose, Ca

These are my views, NOT Apple's, I am a GUEST here, not an employee!!