[sci.virtual-worlds] Cray & The Connection Machine: Making a Marriage in Heaven

kilian@poplar.cray.com (Alan Kilian) (04/01/91)

From: herbt@apollo.sarnoff.com (Herbert H Taylor III)

Now we've got something to talk about. Yippeeee!

O.K. Herb thinks that VR is mostly if not entirely integer math.

> For one thing VW would not appear to be as much of a GIGA-FLOP problem as a
> GIGA-OP problem.
>
> With the exception of polygonally rendered virtual worlds there do not appear
> to be that many places where floating point operation is critical. And while
> polygon rendering does entail floating point such operations are better
> accommodated with special dedicated processors at the "point" of need
> (as in the SGI).
>
> Other approaches forego polygon rendering entirely and hence possess little
> or no floating point.
>
> Certainly, input and display processing is primarily fixed point digital
> signal processing.

What we have here is a difference in application. What I would like to do with
VR is to be able to have a running CFD (Computational Fluid Dynamics) lab and
introduce objects into the fluid flow. For example I'd pull a wing from the
"wing library" and stick it onto an airplane body from the "body library"
and I would immediately see the fluid flow around the wing/body structure.
I could resize the wing by grabbing the tip and pulling. I could change the
angle of attack by grabbing the nose or tail of the body and pulling.
This obviously takes a ton of floating-point arithmetic. How much depends on
the application and the grid sizes, but it's a lot in any case.

How about a running quantum chemistry application doing enzyme-catalyzed
reactions? I could grab a protein from the library and plotch it down and then
grab an enzyme and push it toward the protein. The chemistry package would
do the analysis and I could see the enzyme deform the protein and then watch
the protein be cleaved into two parts. Floats floats everywhere you look.

Some other floating-point-intensive applications: car crash simulation, weather
modeling and prediction, structural optimization, electrical circuit
simulation, oil reservoir simulation, aircraft (or any) control simulation,
heat transfer modeling, injection molding, nuclear plant simulation. The list
just goes on and on and on. And on.
Basically, everything in "science" is floating point.

So, the moral of the story is that we need gigaflops.

> A second question concerns whether the Cray's essentially "single
> processor" model is right for VW.
>
> First, consider a VW display at 1Kx1Kx30fps (they'll be here sooner than we
> think...). The total number of data points which must be processed is
> 3x10**7 (x2 for stereo headset). Assume in the simplest case that these data
> points correspond to pixels. How many operations per pixel are required?

Edited. The point is that if you have to compute a lot on a per-pixel basis
then you can't do it.
O.K. That means that raytracing is out as a rendering stage. Now if we use
polygon rendering then maybe we only have to compute a few (maybe 20) values
per polygon, and a polygon might cover a few hundred pixels. Then we can get
some work done.
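
Here's the budget as a quick back-of-envelope program. The one-gigaflop
machine rate and the 300 pixels per polygon are numbers I made up for
illustration, so don't quote them at me:

    #include <stdio.h>

    /* Back-of-envelope operation budgets for a 1Kx1K stereo display.
       The 1-gigaflop rate and 300 pixels/polygon are made-up numbers. */
    int main(void)
    {
        double ops    = 1.0e9;                  /* assumed machine rate  */
        double pixels = 1024.0 * 1024.0 * 2.0;  /* 1Kx1K, x2 for stereo  */
        double fps    = 30.0;

        double per_pixel   = ops / (pixels * fps);
        double per_polygon = per_pixel * 300.0; /* pixels per polygon    */

        printf("%.1f ops/pixel, %.0f ops/polygon\n", per_pixel, per_polygon);
        return 0;
    }

That prints about 16 operations per pixel and nearly 5000 per polygon, which
is exactly the point: no raytracing, but plenty of room for polygon setup.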
 
>   Massively Parallel architectures do in fact appear to be a more
> natural and cost effective fit for high end VW processing.

Oooooohhhhhh, I don't think that's a "fact". Let's try the next statement.

> 2048 moderately powerful processors running at 20 MHz is 41 GigaOPS - or
> one operation effectively every 24 picoseconds. As long as each of the
> world components can be cast in an appropriate data parallel model and
> can sustain continuous I/O then complex, real-time virtual worlds can
> be managed. 

O.K. If you have 20 megaOPERATIONS per processor, one object per processor,
and 60 frames per second, then each object can compute about a third of a
million OPERATIONS per frame. This seems reasonable, but I don't think the
FLOATING-POINT computations are nearly this fast, so if we do all of our
floating point in software the rate probably goes down to maybe 3000
floating-point operations per frame. This seems too slow. Did you see how
I used "probably" and "maybe" in that last sentence? Did ya? That's called
keeping your ass covered just in case this guy comes up with a 41-gigaflop
machine.

Now this is too slow for 1 or 2048 objects. The point is that you have to have
simple objects, but you can have a lot of them. I am assuming that each object
needs to be computed atomically, so that you cannot have more than one
processor working on an object at a time. Whether that holds simply depends on
your objects.

With a one-gigaflop CPU, as you add objects to the world the whole thing slows
down: the per-object budget is about 16 million floating-point operations per
frame divided by the number of objects. Under the guessed rates above it
doesn't drop to the CM's 3000 per object until you're out past a few thousand
objects, and up to 100 objects it's many times faster.
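
Here's that comparison as a sketch, using my guessed rates from above (one
gigaflop for the single CPU, 3000 software floating-point operations per
frame for each CM processor with one object apiece):

    #include <stdio.h>

    /* Per-object floating-point budgets per frame, guessed rates only. */
    int main(void)
    {
        double fps    = 60.0;
        double cray   = 1.0e9;   /* one gigaflop, shared by all objects  */
        double cm_obj = 3000.0;  /* guessed flops/frame per CM processor */
        int n;

        for (n = 1; n <= 2048; n *= 8)
            printf("%4d objects: Cray %8.0f flops/object/frame, CM %4.0f\n",
                   n, cray / fps / n, cm_obj);
        return 0;
    }

It's all in the guessed software-float rate, so if this guy really does come
up with a 41-gigaflop machine the table flips right over.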

>   Now there is all that I/O to consider - either to the display
> subsystem or to the network.

>                        In a VR system each component must perform its
> functions continuously with LITTLE OR NO LATENCY.

No, this is simply not true. The latency from head motion to display generation
on the HIT Lab's VPL-based VR system is from 1 to 4 seconds. This is a long time
in computer lives, and it's arguably the best VR system in production.
You definitely want the latency to be small, but to require "LITTLE OR NO" is
simply silly. You do need to have each machine running asynchronously,
independent of its predecessor. If the simulation can't keep up for one frame,
or if a packet gets lost, you can't have the image "jerk". It can be slightly
wrong for that frame, but it can't simply stop.

> A lost packet
> becomes a visible display defect in the continuous real-time virtual
> world unless there is sufficient overhead to permit retransmissions.

No again. It's not overhead; it's how fast you can fully compute the world.
If you can fully compute the world in 1/1000 second then you have a lazy .016
seconds to get the data to somewhere else. If it takes you 1/61 second
to compute the world then you only have about .0003 second to get the data out
of here. It's all in how fast the floating-point simulations run.
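
The same budget in code, assuming a 60 frame-per-second display:

    #include <stdio.h>

    /* Whatever the simulation doesn't use of the 1/60 second frame,
       the network gets. */
    int main(void)
    {
        double frame = 1.0 / 60.0;
        double sim;

        for (sim = 0.001; sim < frame; sim *= 4.0)
            printf("simulate in %.4f s -> %.4f s left for I/O\n",
                   sim, frame - sim);
        return 0;
    }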

> If multiple virtual worlds share a common interaction space then all
> "round trips" must also meet this criteria.  This applies to VW
> systems regardless of whether they employ a head mounted display or
> data glove technology.  The VW display and input processors must be
> able to provide some form of continuous visual feedback to virtual
> world participants - so that individual responses to multiple
> processed inputs can be in continuous real-time.

Right on.

>   None of the previous discussion of world processing considered the
> cost of transferring processed world data to a display or I/O facility.
> In systems with video RAMs this might be transparent.  For example, in
> the Princeton Engine [Chin, et al] up to seven entire scanlines are
> simultaneously copied out once every horizontal line time, both
> continuously and transparently. (NTSC = 63 usecs) However, a single
> processor would seem to suffer a significant memory I/O bottleneck
> unless a significant "back door" is in place. Presumably systems such
> as Alan describes on the Cray provide an equivalent facility.

I don't understand this, but I'll comment on it anyway. (What the hell, it
never stops anyone else.)

I think I should have tried to describe my theoretical VR system first.
I'm not suggesting that a small number of supercomputer CPUs can do all the
work in a VR system. I'm suggesting the following:

Start with a simple system to read the input hardware. This would be the
head location and orientation, the hand location and orientation, and the
finger positions in a VPL system. This system would pack this information
up and urp it onto a network. This data is small and does not occur very
frequently, like 60 times per second or so. Not a very high packet rate for
Ethernet, for example.
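
A hypothetical packet layout makes the point. The field names and sizes here
are my own guesses, not VPL's actual format:

    /* Hypothetical tracker packet - my guess, not VPL's real format. */
    struct tracker_packet {
        float head_pos[3];     /* x, y, z                            */
        float head_orient[3];  /* roll, pitch, yaw                   */
        float hand_pos[3];
        float hand_orient[3];
        float finger_flex[10]; /* a couple of flex values per finger */
    };
    /* 22 floats = 88 bytes, 60 times a second: about 5 kbytes/s. */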

Now send this data to a supercomputer which would do the world simulation.
Here, depending on your application, is where you bolt a Cray and a Connection
Machine together to do the physical simulation of the world.

Then send the world state to a pair of graphics machines. These machines would
be in charge of taking the "world", generating pixels, and displaying them
on the head-mounted display.

The state of the world might be position and velocity vectors for moving
objects, with the graphics machines holding the object descriptions, or it
might be polygons for an isosurface of a flow field. It just depends.
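
For the moving-objects case, here's a guess at the network load. The 24-byte
state record (position + velocity, 3 floats each, 4 bytes a float) is made
up, and plain 10 Mbit/s Ethernet moves a bit over a megabyte a second at best:

    #include <stdio.h>

    /* Bytes/second to ship guessed 24-byte object states at 60 Hz. */
    int main(void)
    {
        double bytes_per_object = 6.0 * 4.0;
        double fps = 60.0;
        int n;

        for (n = 10; n <= 10000; n *= 10)
            printf("%5d objects -> %9.0f bytes/s\n",
                   n, n * bytes_per_object * fps);
        return 0;
    }

So scalar state fits on Ethernet up to several hundred objects; polygon soup
for an isosurface will want something much fatter.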

> 
>   One other sidenote on I/O. We believe that for Virtual Worlds frame
> rates will ultimately need to be higher than 30 fps.

Again, right on. And more than 640 x 480 x 256 colors.

>  Although high speed disk I/O may be crucial to virtual world data 
> archives, it is not clear how it impacts the management of the world
> itself.  World data originates from real-time sources such as cameras
> or data gloves, must be processed continuously in real-time and is
> displayed on real-time display systems.

No, I don't agree with this at all. World data originates from the world. It is
a continuously running simulation of something. User data comes asynchronously
and slowly from real-time sources.

AGAIN, it depends on your application.

> Next, Alan brags a little about MAIN memory:
>    In all fairness to the CM2, at least two that I am aware of have
> been loaded with 8 gigabytes... (Insert shameless Princeton Engine Plug:
> One Princeton Engine has been configured with 1 Gigabyte of video rate
> memory - that's 1000 real-time temporal frames...)

I don't know what "real-time temporal frames" means, but 640 x 480 pixels
x 3 bytes x 1000 frames comes to about 0.9 gigabyte, so the arithmetic
checks out.

> Now, Seymour, if you were out in your field and 2048 Pit Bulls
> suddenly came at you from all directions would you rather have 2048
> hunters with shotguns or a couple of Cruise Missiles?
>   -herb taylor (on massively parallel machines)

Just one Cruise missile would work fine, thank you. The odds of getting blasted
by one of the hunters are a little too high for my tastes.

I liked the reply:
  If you were out in a field and had birdseed to pick up, which would you
  rather have: 1024 chickens or 2 strong oxen?

I'd like to thank Herb for his views. I really appreciate it when someone can
express HIS (Hi Bob ;-)) ideas via Email and not get snotty. It's a thing we
should all emulate. Thanks again Herb.

I think we'd all drool over a Cray Y-MP bolted to a Connection Machine bolted
to two Evans & Sutherland workstations and a bit of VPL here and there.

Oh, and some software to boot.

 -Alan Kilian kilian@cray.com                  612.683.5499
  Cray Research, Inc.           | If you were plowing a field what would you
  655 F Lone Oak Drive          | rather use? 2 strong oxen or 1024 chickens?
  Eagan  MN,     55121          | -Seymour Cray (On massively parallel machines)

-- 

uselton@nas.nasa.gov (Samuel P. Uselton) (04/03/91)

[MODERATOR'S NOTE:  This is an appropriate coda to the Cray-Connection debate.
Thanks! -- Bob J.]

From: kilian@poplar.cray.com (Alan Kilian)

>What we have here is a difference in application. What I would like to do with
>VR is to be able to have a running CFD (Computational Fluid Dynamics) lab and
>introduce objects into the fluid flow. For example I'd pull a wing from the
>"wing library" and stick it onto an airplane body from the "body library"
>and I would immediately see the fluid flow around the wing/body structure.
>I could resize the wing by grabbing the tip and pulling. I could change the
>angle of attack by grabbing the nose or tail of the body and pulling.
>This obviously takes a ton of floating-point arithmetic. How much depends on
>the application and the grid sizes, but it's a lot in any case.

Alan, I don't think you realize how many tons of floating-point arithmetic you
are specifying.  We (NAS Systems Division, NASA Ames) have a high-end Cray
Y-MP/8 and some very advanced code for doing CFD simulation.  To calculate
the flow over a wing in isolation takes hours.  That doesn't count the prep 
time deciding how to grid the volume surrounding the wing.  Designing grids
for complex surfaces (like aircraft bodies) takes man-months to man-years.
So I think it will be quite some time before you can 
>"pull a wing from the
>"wing library" and stick it onto an airplane body from the "body library".
A stated "Grand Challenge" of the Aerodynamics side of NASA (and of the NAS
project in particular) is to get such CFD flow solutions running at interactive
speeds by the year 2000.  Just being able to change the angle of attack or
lower the flaps is EXTREMELY demanding.  I don't think you will see aircraft
designers
>resize the wing by grabbing the tip and pulling
because they have a lot more analytical requirements; they don't "eyeball" 
designs.

>So, the moral of the story is that we need gigaflops.

Teraflops are required!  ANY current machine is going to have to evolve 
quite a bit to get there.
I think that ALL vendors who try to stay near the high end of supercomputing
performance are going to need both many processors AND powerful floating-point
processors.  How about MANY, POWERFUL floating-point processors?
Rumblings I hear say that BOTH Crays are working on systems in which the
number of processors goes up.  And with the recent upgrade to our CM (oh yes,
we have one of them too - 32K processors, 1K Weiteks,...) the "new programming
model" (at least for us in the CFD world) is to think of the machine as
1K Weiteks rather than 32K one-bit processors.

But what does this all have to do with VR?  Really?  Only that there is a
high-end set of "worlds" that people would like to explore, that can absorb
as much resource as is available.  And people with supercomputers can probably
afford VR hardware, software, development,....

For now we are computing our CFD simulations off-line (on Crays, CMs and 
Intel i860 hypercubes) and developing VR technology to explore the results
interactively.

>
>>   Now there is all that I/O to consider - either to the display
>> subsystem or to the network.
>
>>                        In a VR system each component must perform its
>> functions continuously with LITTLE OR NO LATENCY.
>
>No, this is simply not true. The latency from head motion to display generation
>on the HIT Lab's VPL-based VR system is from 1 to 4 seconds. This is a long time
>in computer lives.

In MY experience, head-tracking latency greater than 1/2 second is really
distracting, and becomes uncomfortable quickly.  The lag has two parts:
(a) how long it takes to get the new head position info and transform it
into a useful shape;
(b) once you know the new view, how long it takes to redraw the picture.
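
(A tiny sketch of that budget; the tracking time comes from our own boom
numbers below, and the 0.10 second redraw is made up:)

    #include <stdio.h>

    /* Total lag = (a) tracking + (b) redraw; redraw time is made up. */
    int main(void)
    {
        double track  = 1.0 / 60.0; /* (a) read + transform head position */
        double redraw = 0.10;       /* (b) hypothetical redraw time       */

        printf("total lag %.3f s, against my 1/2 second comfort limit\n",
               track + redraw);
        return 0;
    }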

>and it's arguably the best VR system in production.
VPL may have the most press, have been doing production longest, and have the
greatest market penetration and visibility.
I would never accept something with 1 to 4 second lag in head tracking.
(1) I suspect VPL does better "most of the time".
(2) I like our boom-mounted head-tracking viewer for "real work", although
I don't think it gives quite as flashy demos:
  - higher resolution,
  - no Polhemus to get distorted by metal, video signals, ...,
  - head tracking in well under 1/60th of a second, including conversion
    to a useful form.



I also agree with much of what the other poster (whose message I failed to
copy) said about special-purpose rendering hardware.  We drive our VR with an
SGI 340/VGX, soon to be upgraded to a 380/VGX, because we want to put more
visualization primitives into the flow field, and update them.  (The
project is designed for exploring UNSTEADY flows.)