[comp.arch] Blitters and design philosophy

henry@utzoo.uucp (Henry Spencer) (08/01/88)

In article <401@ma.diab.se> pf@ma.UUCP (Per Fogelström) writes:
>... Do that with the CPU. Even with external hardware the CPU will not
>be able to generate the addresses fast enough. (You have to generate a new
>address each 50-100ns).

Have you, pray tell, seen the manual for the AMD 29000?  I have.  Please
read it before proclaiming this performance to be beyond that of a CPU.
(The 29000 is available today, although it's not yet cheap.)  Or the
manual of any modern RISC-based machine, for that matter.

To sort of paraphrase a comment the Mips people have made about memory:
think twice before building specialized hardware to do something a
general-purpose CPU can do, because many people are putting enormous
resources into making the g-p CPUs better and faster, and they may well
catch up with you -- probably sooner than you think.  Exploiting mass-
market products can work better than trying to compete with them.
-- 
MSDOS is not dead, it just     |     Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu

pf@diab.se (Per Fogelström) (08/03/88)

In article <1988Aug1.062659.25971@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <401@ma.diab.se> pf@ma.UUCP (Per Fogelström) writes:
>>... Do that with the CPU. Even with external hardware the CPU will not
>>be able to generate the addresses fast enough. (You have to generate a new
>>address each 50-100ns).
>
>Have you, pray tell, seen the manual for the AMD 29000?  I have.  Please
>read it before proclaiming this performance to be beyond that of a CPU.

As a matter of fact, I do have an Am29000 manual right here beside me, and
I have read it. I suggest that you read the NS8500 series manuals. If you
do, you will find that the 8500 RGP has three ALUs working in parallel:
one for addresses, one for data (error computation), and one performing
clipping and picking operations. A line pattern computation is also going
on at the same time. All this gives a drawing rate of 2 clock cycles per
pixel, or 10 Mpixel/s. My graphics memory glows in the dark :-).
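
An illustrative sketch, not from the original post: the address and error
computations mentioned above are the two halves of an ordinary integer line
algorithm. The code below is textbook Bresenham, not the RGP's actual
microarchitecture; the linear frame-buffer layout (`pitch` pixels per
scanline) and both function names are assumptions made for the example. A
CPU runs the address and error steps one after the other for every pixel,
which is the work the RGP is said to overlap in separate ALUs.

```python
def bresenham_addresses(x0, y0, x1, y1, pitch):
    """Yield the linear frame-buffer address of each pixel on a line.

    Textbook integer Bresenham. `pitch` (pixels per scanline) is an
    assumed layout, not taken from any RGP documentation.
    """
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy                  # running error term
    x, y = x0, y0
    while True:
        yield y * pitch + x        # address computation for this pixel
        if x == x1 and y == y1:
            return
        e2 = 2 * err               # error computation decides the next step
        if e2 > -dy:               # step along x
            err -= dy
            x += sx
        if e2 < dx:                # step along y
            err += dx
            y += sy


def ns_per_pixel(pixels_per_second):
    """Time budget per pixel, in nanoseconds, at a sustained drawing rate."""
    return 1e9 / pixels_per_second


print(list(bresenham_addresses(0, 0, 3, 1, pitch=10)))
print(ns_per_pixel(10e6), "ns per pixel at 10 Mpixel/s")
```

At 10 Mpixel/s the whole loop body has to fit in 100 ns per pixel, which is
the upper end of the 50-100 ns address rate quoted at the top of the thread.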

And Henry, even if a CPU is generating addresses that often, is the result
usable? It's like looking at the clock frequency of a flip-flop without
taking feedback time into consideration.

Perhaps Tim Olsen (at AMD) can clear this discussion up. I believe that
he can present some figures on how fast the 29000 does graphics, and I'll
try to match them with the figures from my own NS8500 design.
And if I'm wrong, I will of course admit that!

>(The 29000 is available today, although it's not yet cheap.)

At last, something important. The specialized processor (NS8500 RGP) I am
using costs less than $100. This processor needs an NS8511 BPU for each
bitplane, plus some buffers. Total cost for these chips is about $300 for an
8-bitplane system, excluding memory and video DACs. But okay, when the
Am29000 is below $100 I might consider it. But I would still have to do
something to replace the BPUs so that I can process all bitplanes at once.
And I hope the code for doing the things the 8500 can do comes with it.
>
>To sort of paraphrase a comment the Mips people have made about memory:
>think twice before building specialized hardware to do something a
>general-purpose CPU can do, because many people are putting enormous
>resources into making the g-p CPUs better and faster, and they may well
>catch up with you -- probably sooner than you think.  Exploiting mass-
>market products can work better than trying to compete with them.
>-- 
What I gather from this statement is that Henry needs to look around
a little. You don't have to build your blitter yourself. There is good
silicon on the shelves today, and if it makes your product better, what's
wrong with using it? Especially when it's cheap. I'm not saying that one
should break one's neck to achieve something that a CPU can do better,
because it's not worth it. BUT I don't believe that RISC processors are the
"ultimate solution" to everything.
I believe that a CPU like the 29000, for example, will do a good job on a
1024x768 screen, but what about 2600x2048? Even a specialized processor will
have problems making that look really good, though mostly because we must
think about new ways to access the frame buffer. So I think it's best to put
things in context: use whatever solves your problem best in the current
situation.
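
An illustrative back-of-the-envelope calculation, not from the original
post: the two screen sizes above differ by a factor of roughly 6.8 in pixel
count (786,432 versus 5,324,800). The sketch below also estimates how long
it takes to touch every pixel once at the 10 Mpixel/s drawing rate quoted
earlier in this post. The function names are made up for the example, and
the model ignores memory refresh, video fetch and setup overhead.

```python
def pixel_count(width, height):
    """Number of pixels in one frame."""
    return width * height


def full_screen_ms(width, height, pixels_per_second=10e6):
    """Milliseconds needed to touch every pixel once at a sustained rate."""
    return pixel_count(width, height) / pixels_per_second * 1000.0


for w, h in [(1024, 768), (2600, 2048)]:
    print(f"{w}x{h}: {pixel_count(w, h)} pixels, "
          f"{full_screen_ms(w, h):.1f} ms per full-screen pass")
```

Under these assumptions a full pass over the larger screen takes over half a
second, which is one way to read the remark that the frame buffer access
itself becomes the problem.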

henry@utzoo.uucp (Henry Spencer) (08/06/88)

In article <409@ma.diab.se> pf@ma.UUCP (Per Fogelström) writes:
>And Henry, even if a cpu is generating addresses often, is it usable? ...

Well, you'd be surprised what can be done with a cpu if you invest a level
of effort comparable to that needed to design a custom chip...  I've never
said that getting fast graphics out of a cpu could be done without a lot of
thought and experimentation.  Most code, running on most cpus, is "first
cut" code -- nobody has ever sat down and seriously thought about it for
a while to see if it could be made faster or better.  It is truly amazing
what can be accomplished if you really do this seriously.

But I admit that the point needs study beforehand.  Note, "study", not
preconceived notions that it has to be hardware because software is too slow.
-- 
MSDOS is not dead, it just     |     Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

mcdonald@uxe.cso.uiuc.edu (08/06/88)

I have always doubted the value of special purpose hardware, though 
I have been responsible for some real doozies myself. In the vast 
majority of cases, general is better. A recent example from my own
experience: a friend wanted to port a TeX screen previewer I had
written for the (very general-purpose) IBM-PC to his whizz-bang
super-dooper graphics Iris 4. One problem: image copies from memory
to the graphics display are real losers on this beast, because they have to
be done essentially one bit at a time. The hardware prevents doing serious
block copies!
There is a function call to do a row at a time, but we suspect it actually
does it pixel by pixel.

news@hoover.UUCP (news) (08/07/88)

in article <409@ma.diab.se>, pf@diab.se (Per Fogelström) says:
> 
> In article <1988Aug1.062659.25971@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>In article <401@ma.diab.se> pf@ma.UUCP (Per Fogelström) writes:
>>
>>Have you, pray tell, seen the manual for the AMD 29000?  I have.  Please
> 
> Perhaps Tim Olsen (at AMD) can clear this discussion up. I believe that
> he can present some figures on how fast the 29000 does graphics, and I'll
> try to match them with the figures from my own NS8500 design.
> And if I'm wrong, I will of course admit that!
> 
>>(The 29000 is available today, although it's not yet cheap.)

Hi Folks,

   It seems that this argument is beginning to smack of ridiculosity(?), 
approaching the proverbial apples vs. oranges war. The AM29000, while being 
a processor of reasonable performance, really can't hold a candle to the 
RGP (DP8500) in terms of graphics (i.e. line drawing or blitting, filling,
etc.) performance (then again, neither can the QPDM).
   I was at Siggraph last week, and surprisingly, NSC and AMD were next door
to each other... AMD did NOT offer the 29000 as a graphics engine, but rather
as a front end host/xform processor. They also had a rather cute "dancing 
trio" of wireframe figures, Temptations style, on the 29000. The QPDM was 
shown as a graphics engine. We (NSC) showed the RGP family, and seemed to have
quite a flock of attendees hot on the RGP concept in comparison to the other
merchant semi companies' offerings, no surprise owing to the provocative ads
seen in most of the trades lately. There are a good number of companies signed
up for the 8500 family, and if the show is any indication, we'll have quite a
few more shortly. Simply put, for any interesting (i.e. >= 8 planes)
applications for chipset-type solutions, the 8500 family will blow away the others
in "traditional" graphics functions. Check out the pixel ports on the BPU for
some wild shading ideas, as well...
   Regarding availability and costs, come and get 'em! Production parts! The
cost is also quite reasonable, with an RGP and 8 BPUs running at $98 in 10k
qty. The RGP family really is not aimed at mere PC VGA-type apps.
   Looking forward to some great flames...

Mike Gehl, Graphics Architecture, National Semiconductor Denver

bcase@cup.portal.com (08/08/88)

|At last, something important. The specialized processor (NS8500 RGP) I am
|using costs less than $100. This processor needs an NS8511 BPU for each
|bitplane, plus some buffers. Total cost for these chips is about $300 for an
|8-bitplane system, excluding memory and video DACs. But okay, when the
|Am29000 is below $100 I might consider it. But I would still have to do
|something to replace the BPUs so that I can process all bitplanes at once.
|And I hope the code for doing the things the 8500 can do comes with it.

You see, this is exactly the problem with special purpose anythings, CISC
instructions included:  they might not be usable if they don't fit exactly
your situation.  WRT the NS8500 stuff, what if I don't want to use a
planar frame buffer because I *must* be backward compatible with some other
organization?  Of course, if the special-purpose something fits like a glove,
then great!

msf@prandtl.nas.nasa.gov.nas.nasa.gov (Michael S. Fischbein) (08/08/88)

In article <46500023@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>
>I have always doubted the value of special purpose hardware, though 
>I have been responsible for some real doozies myself. In the vast 
>majority of cases, general is better. A recent example from my own
>experience: a friend wanted to port a TeX screen previewer I had 
>written for the  (very general-purpose) IBM-PC to his whizz-bang
>super-dooper graphics Iris 4.

Of course, a better way to do this would be to translate the required TeX
fonts to the Iris format and simply print them on the screen, using the
Iris special purpose hardware instead of fighting it.  Of course, this would
require rewriting the TeX screen previewer from your (small-memory addressing,
small file space capability, severely limited processing power, no hardware
display scaling or clipping, etc) IBM-PC for the whizz-bang super-dooper
(optimized for real time animation) Iris.


		mike


Michael Fischbein                 msf@ames-nas.nas.nasa.gov
                                  ...!seismo!decuac!csmunix!icase!msf
These are my opinions and not necessarily official views of any
organization.

pf@diab.se (Per Fogelström) (08/09/88)

In article <7984@cup.portal.com> bcase@cup.portal.com writes:
>|At last, something important. The specialized processor (NS8500 RGP) I am
>|using costs less than $100. ....................
>
>................ WRT the NS8500 stuff, what if I don't want to use a
>planar frame buffer because I *must* be backward compatible with some other
>organization?  Of course, if the special-purpose something fits like a glove,
>then great!

Well, in that case I would use the TI 320?0 graphics processor :-). But I don't
think you are completely right. We are planning to replace a Hitachi 63483
design with the RGP, and the Hitachi is by no means "planar". However, we are
not using the "replace if less than" etc. operations, and that helps ;-).
The reason for using a planar architecture is SPEED and FLEXIBILITY. You will
have the same speed in a monochrome frame buffer as in one 64+ planes deep.
And now to the really nice thing: it doesn't cost anything until I actually
expand! Expansion is achieved by adding a bitplane ALU (BPU), the frame
buffer memory chips, and a shift register. And I don't have to rewrite my
software. The same software works for both monochrome AND color! (I don't
need to care about the X Window mfb/cfb nightmare.)
By the way, for raster ops, planar or pixel organization doesn't matter much
unless I have to do ops involving more than one plane (such as "replace if
less or equal"). And if I have to do that, the DP8500 can handle it too.
However, it's not as fast as one could wish (about 5 us to process a pixel).
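
An illustrative sketch, not from the original post, of why the two kinds of
operation differ in a planar buffer. Each plane is modeled as a Python
integer in which bit i belongs to pixel i. A plain raster op (XOR here)
handles each plane independently, many pixels per word, whereas a compare
has to gather each pixel's bits from every plane first. All function names
are made up for the example, and the meaning assumed for "replace if less or
equal" (write the source pixel where source <= destination) is a guess; the
DP8500's actual definition is not given in the thread.

```python
def planar_xor(dst_planes, src_planes):
    """Per-plane raster op: no plane needs to know about any other."""
    return [d ^ s for d, s in zip(dst_planes, src_planes)]


def pixel_value(planes, i):
    """Gather pixel i's value by picking bit i out of every plane."""
    return sum(((plane >> i) & 1) << p for p, plane in enumerate(planes))


def replace_if_le(dst_planes, src_planes, width):
    """Cross-plane op under the assumed semantics: src replaces dst
    wherever src <= dst. Works one pixel at a time."""
    out = [0] * len(dst_planes)
    for i in range(width):
        s = pixel_value(src_planes, i)
        d = pixel_value(dst_planes, i)
        v = s if s <= d else d
        for p in range(len(out)):          # scatter the result back
            out[p] |= ((v >> p) & 1) << i
    return out


# Two planes, four pixels: dst pixels are 0,1,2,3 and src pixels are 3,3,0,0.
dst = [0b1010, 0b1100]
src = [0b0011, 0b0011]
print(planar_xor(dst, src))
print(replace_if_le(dst, src, width=4))
```

The XOR touches each plane word once regardless of depth, while the compare
does work proportional to pixels times planes, which fits the remark that
the multi-plane operations are the slow ones.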

aglew@urbsdc.Urbana.Gould.COM (08/09/88)

>>I have always doubted the value of special purpose hardware, though 
>>I have been responsible for some real doozies myself. In the vast 
>>majority of cases, general is better. A recent example from my own
>>experience: a friend wanted to port a TeX screen previewer I had 
>>written for the  (very general-purpose) IBM-PC to his whizz-bang
>>super-dooper graphics Iris 4.
>
>Of course, a better way to do this would be to translate the required TeX
>fonts to the Iris format and simply print them on the screen, using the
>Iris special purpose hardware instead of fighting it.  Of course, this would
>require rewriting the TeX screen previewer from your (small-memory addressing,
>small file space capability, severely limited processing power, no hardware
>display scaling or clipping, etc) IBM-PC for the whizz-bang super-dooper
>(optimized for real time animation) Iris.

So, a working program, with acceptable performance on an already existing
processor, has to be completely reworked in order to even barely run on
a "high performance" graphics system?

This, of course, doesn't even consider that it may be impossible to get
TeX semantics using an existing font generator on the Iris - in the same
way that most troff output to Postscript printers positions each character
individually.

daveh@cbmvax.UUCP (Dave Haynie) (08/11/88)

in article <1988Aug1.062659.25971@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) says:

> In article <401@ma.diab.se> pf@ma.UUCP (Per Fogelström) writes:
>>... Do that with the CPU. Even with external hardware the CPU will not
>>be able to generate the addresses fast enough. (You have to generate a new
>>address each 50-100ns).

> Have you, pray tell, seen the manual for the AMD 29000?  I have.  Please
> read it before proclaiming this performance to be beyond that of a CPU.
> (The 29000 is available today, although it's not yet cheap.)  Or the
> manual of any modern RISC-based machine, for that matter.

Not even just in RISC chips, either.  This new 33MHz 68030 I just got in (also
not at all cheap) can run normal memory cycles in 60ns, burst cycles for cache
fills in 30ns.  That's certainly in the aforementioned ballpark.

Of course, this part is also very "not yet cheap".  It would make a killer
graphics engine, as would an AMD 29K or just about any CPU at that speed.  But
if you're going to that kind of high end to begin with, you can for the most
part get much better graphic performance out of some special purpose graphic
chip set like the National 8500 series.  For quite a bit less, too.  No reason
you can't make the host CPU a hot micro, and use it for anything that it can
do faster than the custom display processors.  But no matter how you slice it,
an 8500 system with 24 bitplanes is scrolling 24 planes in parallel, while the
'030 or 29K will have to scroll 24 planes one-at-a-time.  
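
An illustrative estimate, not from the original post, of the parallel versus
one-at-a-time point. It assumes a planar 1024x768 buffer (a size mentioned
elsewhere in the thread), 32-bit memory words, one read and one write cycle
per word, and the 60 ns cycle quoted above for both cases. The function name
and every parameter default are assumptions made for the example; real
hardware would differ in cycle time and overhead.

```python
def scroll_time_ms(width, height, planes, word_bits=32, cycle_ns=60,
                   planes_in_parallel=False):
    """Crude estimate of the time to scroll a planar frame buffer.

    Counts one read plus one write memory cycle per word and nothing else.
    """
    words_per_plane = width * height // word_bits
    passes = 1 if planes_in_parallel else planes
    cycles = 2 * words_per_plane * passes      # read + write per word
    return cycles * cycle_ns / 1e6             # nanoseconds -> milliseconds


serial = scroll_time_ms(1024, 768, planes=24)
parallel = scroll_time_ms(1024, 768, planes=24, planes_in_parallel=True)
print(f"one plane at a time: {serial:.1f} ms")
print(f"all planes at once:  {parallel:.1f} ms")
```

With equal cycle times the gap is simply the plane count, a factor of 24
here. A packed-pixel buffer would change the picture for the CPU, so this
only models the planar case described in the post.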

-- 
Dave Haynie  "The 32 Bit Guy"     Commodore-Amiga  "The Crew That Never Rests"
   {ihnp4|uunet|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
		"I can't relax, 'cause I'm a Boinger!"

rminnich@super.ORG (Ronald G Minnich) (08/11/88)

In article <28200187@urbsdc> aglew@urbsdc.Urbana.Gould.COM writes:
>So, a working program, with acceptable performance on an already existing
>processor, has to be completely reworked in order to even barely run on
>a "high performance" graphics system?
   Well, OK, but does that say that we shouldn't have high performance
graphics? Or does that say, maybe, that our support tools are quickly
running out of gas? I think the latter. Looking at "Hello, world" for
X11 written in C (see 'going for baroque' in a recent Unix Review)
strengthens the impression. 
Lessee, where do the followups go? Not here ...
ron
P.S. 'Acceptable performance'. Gee, I never knew there was such a thing.
Seems it's always too slow to me :-)