david@sun.com (Do you, Peanut Butter, take this Jelly...) (03/15/89)
Does anyone know for sure if the PMAX frame buffers are cached? It would also be nice to know the frame buffer access cycle time if that's public info. Any comments on the pros and cons of caching frame buffers?
--
David DiGiacomo, Sun Microsystems, Mt. View, CA   sun!david david@sun.com
keith@mit-vax.LCS.MIT.EDU (Keith Packard) (03/22/89)
In article <93849@sun.Eng.Sun.COM> david@sun.com (Do you, Peanut Butter, take this Jelly...) writes:
>Does anyone know for sure if the PMAX frame buffers are cached?

The PMAX frame buffers are not cached. Naturally, the hardware would allow this, but it is disabled.

>Any comments on the pros and cons of caching frame buffers?

I've been convinced that caching the frame buffer is a bad idea. The reason is obvious in the 8-bit color frame buffer mode. In 8-bit mode, the machine almost never reads data back from the frame buffer for drawing. When blting, you almost always read a tremendous amount of sequential data, so the amount of useful data left in the cache is probably zero. Worse yet, you end up with a data cache full of data you'll never need again (i.e. the bits just blted).

A monochrome system is less clear; I've been told that benchmarks on the PMAX tend to favor an uncached system.

Now, if the cache were a bit fancier and did burst mode for filling a long run of data on a read miss, then blt might be faster with the cache turned on. I'd rather see special blt hardware though. I can imagine all sorts of fun uses (like system/user data copy) for a fast piece of general memory copy hardware.

keith packard
keith@expo.lcs.mit.edu
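Keith's cache-pollution argument can be illustrated with a toy model, offered as a sketch only: a direct-mapped data cache with hypothetical parameters (64 KB, 4-byte lines). The names `dm_cache`, `cache_access` and `touch_range`, and every address used, are made up for illustration and are not PMAX specifics. An application touches a small working set, a blt then reads a frame-buffer region several times larger than the cache, and the working set is touched again.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cache geometry, loosely modeled on a 64 KB direct-mapped D cache. */
#define CACHE_BYTES (64u * 1024u)
#define LINE_BYTES  4u
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

typedef struct {
    uint32_t tag[NUM_LINES];
    bool     valid[NUM_LINES];
} dm_cache;

void cache_reset(dm_cache *c) {
    memset(c, 0, sizeof *c);
}

/* Look up one address. On a miss the line is filled (read-allocate),
   evicting whatever previously mapped to the same index. Returns true on a hit. */
bool cache_access(dm_cache *c, uint32_t addr) {
    uint32_t line  = addr / LINE_BYTES;
    uint32_t index = line % NUM_LINES;
    uint32_t tag   = line / NUM_LINES;
    if (c->valid[index] && c->tag[index] == tag)
        return true;
    c->valid[index] = true;
    c->tag[index]   = tag;
    return false;
}

/* Touch [base, base + len) one 32-bit word at a time; return the number of hits. */
uint32_t touch_range(dm_cache *c, uint32_t base, uint32_t len) {
    uint32_t hits = 0;
    for (uint32_t a = base; a < base + len; a += 4)
        hits += cache_access(c, a);
    return hits;
}
```

With these numbers, a second pass over a 16 KB working set at a hypothetical address 0x00100000 hits on all 4096 words when nothing intervenes, and on none of them after a 256 KB sequential read starting at a hypothetical frame-buffer address 0x10000000: the cache ends up "full of data you'll never need again". A larger, burst-filled line would change how many words each miss brings in, but the blt would still evict the working set.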
henry@utzoo.uucp (Henry Spencer) (03/23/89)
In article <5845@mit-vax.LCS.MIT.EDU> keith@mit-vax.UUCP (Keith Packard) writes:
>I'd rather see special blt hardware though. I can imagine all sorts
>of fun uses (like system/user data copy) for a fast piece of general
>memory copy hardware.

Before asking for copy hardware, find out whether your CPU is good enough (when programmed carefully) to saturate the memory bandwidth doing copying. Many modern CPUs are, which means that simple data copying cannot possibly be speeded up with a hardware copier. To see real benefits, you either need a memory system that the CPU cannot fully exploit, or some additional complexity in the operation to be done that slows the CPU down.
--
Welcome to Mars! Your        | Henry Spencer at U of Toronto Zoology
passport and visa, comrade?  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
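Henry's test can be sketched as a small benchmark: time a carefully written copy loop and compare the resulting rate with the memory system's rated bandwidth. The function names and the 4-way unrolling are illustrative choices, not anything from the thread, and `clock()` measures processor time, which is only a rough stand-in for elapsed time in a single-threaded loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Copy n 32-bit words with a simple 4-way unrolled loop, standing in for a
   "carefully programmed" CPU copy. Buffers must not overlap. */
void copy_words(uint32_t *restrict dst, const uint32_t *restrict src, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < n; i++)          /* leftover words */
        dst[i] = src[i];
}

/* Approximate copy rate in MB/s (bytes copied, not bus traffic) over `reps`
   copies. Returns 0.0 if the run was too short for clock() to resolve. */
double copy_bandwidth_mb_s(uint32_t *dst, const uint32_t *src, size_t n, int reps) {
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        copy_words(dst, src, n);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return (double)n * sizeof(uint32_t) * reps / (secs * 1e6);
}
```

To measure memory rather than cache bandwidth, call `copy_bandwidth_mb_s` with buffers much larger than the cache. If the reported rate is close to what the memory system can deliver, a hardware copier has little left to win on plain copies, which is Henry's point; if it falls well short, you have a memory system the CPU cannot fully exploit.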
jrg@Apple.COM (John R. Galloway) (03/23/89)
In article <1989Mar22.175616.1420@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <5845@mit-vax.LCS.MIT.EDU> keith@mit-vax.UUCP (Keith Packard) writes:
>>I'd rather see special blt hardware though. I can imagine all sorts
>>of fun uses (like system/user data copy) for a fast piece of general
>>memory copy hardware.
>
>Before asking for copy hardware, find out whether your CPU is good enough
>(when programmed carefully) to saturate the memory bandwidth doing copying.

It's not quite that simple, since if the copy hardware runs directly out of memory (or has its own cache) then the CPU can be running in its own cache and doing other useful work while the copy is going on, until the copy is interrupted for the CPU to get another cache line.

apple!jrg    John R. Galloway, Jr.    contract programmer, San Jose, Ca
These are my views, NOT Apple's, I am a GUEST here, not an employee!!
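John's objection can be put as a back-of-the-envelope model. It is an assumed model, not a measurement: a CPU copying on its own keeps the bus saturated (Henry's premise) and only then does its other work, while a copy engine uses every bus cycle except those spent servicing the CPU's cache misses, which take priority. The struct and function names are hypothetical.

```c
/* Inputs to a deliberately simple timing model; all values are in cycles. */
typedef struct {
    unsigned long copy_cycles;  /* bus cycles needed to move the data */
    unsigned long work_cycles;  /* CPU cycles of other useful work, misses included */
    unsigned long misses;       /* cache misses taken during that work */
    unsigned long miss_cycles;  /* bus cycles to service one miss */
} overlap_model;

/* CPU does the copy itself, then the other work. */
unsigned long serial_cycles(const overlap_model *m) {
    return m->copy_cycles + m->work_cycles;
}

/* Copy engine runs alongside the CPU; it is stalled whenever the CPU
   needs the bus for a miss, so its finish time grows by the miss traffic.
   Elapsed time is whichever of the two finishes last. */
unsigned long overlapped_cycles(const overlap_model *m) {
    unsigned long copy_done = m->copy_cycles + m->misses * m->miss_cycles;
    return copy_done > m->work_cycles ? copy_done : m->work_cycles;
}
```

For example, with 800 bus cycles of copying, 1000 cycles of other work and 10 misses of 10 cycles each, the serial case takes 1800 cycles and the overlapped case 1000, because the copy finishes at cycle 900 while the CPU is still busy. With no other work to overlap, the two figures coincide, which is exactly the case Henry's argument covers.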
jg@crltrx.crl.dec.com (Jim Gettys) (03/24/89)
In article <27738@apple.Apple.COM> jrg@Apple.COM (John R. Galloway) writes:
>
>It's not quite that simple, since if the copy hardware runs directly out
>of memory (or has its own cache) then the CPU can be running in its own
>cache and doing other useful work while the copy is going on, until the
>copy is interrupted for the CPU to get another cache line.
>apple!jrg    John R. Galloway, Jr.    contract programmer, San Jose, Ca
>
>These are my views, NOT Apple's, I am a GUEST here, not an employee!!

It is interesting to note that, with the exception of scrolling, the PMAX (DECstation 3100) color frame buffer code ends up outperforming the GPX hardware on uVAXes, sometimes by radical amounts. (Or at least, the statement will be true in the next release of the software; most operations already do in the first release on the DECstation, but more work has been done since.) So, as usual, general-purpose hardware catches up with the special-purpose hardware of the previous generation. And for many operations, the graphics hardware in the GPX was more of a hindrance than a help. Many operations are completely bound by memory bandwidth between the processor and the frame buffer.

Also note that most frame buffer operations are WRITES, and not READS, so with most caches (for example, the write-through cache on the PMAX) you quickly end up being limited by the memory subsystem. And graphics operations generally go romping through substantial chunks of memory quickly, flushing the data from the cache. Running the display cached would interfere with other programs on the machine (like the X server, or your favorite application).

Preliminary experiments showed that running cached did not help color, and may have helped monochrome somewhat, but the mono case was far from obvious, particularly when you consider interference with other programs running on the machine. So at this instant, we are running the frame buffers uncached.
At some point it would be interesting to rerun the tests; the frame buffer can be run either cached or uncached with a simple driver change. All this will have to be re-examined as cache sizes and organizations change. For example, if the cache were 10 times larger, the probability of hitting the cache for the scrolling case would be very much higher: the PMAX has a 64k D cache, and with ten times that, a large window on an 8-bit frame buffer would have a good chance of still being in the cache the next time you scrolled one line.

- Jim Gettys
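The cache-size point is mostly arithmetic: at 8 bits per pixel a window occupies width x height bytes, and that footprint has to fit in the data cache before a one-line scroll can find the window still resident. The sketch below uses hypothetical function names and a hypothetical 800 x 600 window; it is a capacity check only and ignores conflict misses in a direct-mapped cache as well as the competing use by other programs that Jim mentions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bytes occupied by a w x h window at the given depth, rounded up to whole bytes. */
size_t window_bytes(size_t w, size_t h, size_t bits_per_pixel) {
    return (w * h * bits_per_pixel + 7) / 8;
}

/* Optimistic test of whether the whole window could stay resident in a
   cache of cache_bytes: capacity only, no conflict misses, no other users. */
bool window_fits_cache(size_t w, size_t h, size_t bits_per_pixel, size_t cache_bytes) {
    return window_bytes(w, h, bits_per_pixel) <= cache_bytes;
}
```

An 800 x 600 window at 8 bits needs 480,000 bytes, more than seven times a 64 KB cache but comfortably under 640 KB; the same window at 1 bit per pixel needs 60,000 bytes, which is one reason the monochrome case is less clear-cut.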
jrg@Apple.COM (John R. Galloway) (03/25/89)
>It is interesting to note that, with the exception of scrolling, the PMAX
>(DECstation 3100) color frame buffer code ends up outperforming the GPX
>hardware on uVAXes, sometimes by radical amounts.

Gosh, I sure would like to have one of these beasts.

>Also note that most frame buffer operations are WRITES, and not READS, so

That's interesting. I would have thought that there would be a lot of bit-blit-type activity (scrolling or window movement, or expose events (with backing store, or was that save under)) with pretty heavy read/write traffic, and then an additional amount of plain writes (new stuff from the application). If current systems can completely use up the memory bandwidth, perhaps it's time for smart frame buffers that can do multiple concurrent bit blits internally (without using up external bandwidth unless there is a conflict), but if writes really dominate, this wouldn't help. Where did the data being written come from (i.e. is this a bit blit with X in the middle, or is it really writing new data)?

So what's the next step for more performance? (I assume there are graphics operations on the 3100 that are not as fast as you would like.)

apple!jrg    John R. Galloway, Jr.    contract programmer, San Jose, Ca
These are my views, NOT Apple's, I am a GUEST here, not an employee!!
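John's question about the read/write mix can be made concrete with a sketch of scrolling an 8-bit frame buffer. Every pixel that stays on screen is read from one row and written to another, while the row exposed at the bottom (like any new text from the application) is write-only. The `framebuf` layout, the `traffic` counter and `scroll_up` are hypothetical; a real server would also clip to a window rather than scroll the whole screen.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 8-bit frame buffer: one byte per pixel, rows `stride` bytes apart. */
typedef struct {
    unsigned char *pixels;
    size_t width, height, stride;
} framebuf;

/* Frame buffer bytes read and written by an operation. */
typedef struct {
    size_t bytes_read, bytes_written;
} traffic;

/* Scroll the whole buffer up by `rows` pixel rows and fill the rows exposed
   at the bottom with `bg`. Returns the frame buffer traffic this implies. */
traffic scroll_up(framebuf *fb, size_t rows, unsigned char bg) {
    traffic t = {0, 0};
    if (rows > fb->height)
        rows = fb->height;
    size_t keep = fb->height - rows;
    for (size_t y = 0; y < keep; y++) {
        /* Blit: each kept row is read from `rows` rows below and written here. */
        memmove(fb->pixels + y * fb->stride,
                fb->pixels + (y + rows) * fb->stride, fb->width);
        t.bytes_read += fb->width;
        t.bytes_written += fb->width;
    }
    for (size_t y = keep; y < fb->height; y++) {
        /* Exposed rows are pure writes. */
        memset(fb->pixels + y * fb->stride, bg, fb->width);
        t.bytes_written += fb->width;
    }
    return t;
}
```

Scrolling a 3-row, 4-pixel-wide buffer by one row reads 8 bytes and writes 12. In general a scroll never writes less than it reads, and anything drawn fresh adds writes only, which is consistent with Jim's observation that writes dominate.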