[sci.virtual-worlds] Last words on Cray-Connection Machine debate?

herbt@apollo.sarnoff.com (Herbert H Taylor III) (03/28/91)

 
  As I very much want to avoid stepping into a flame "war in progress,"
I would like to respond specifically to those of Alan's points that are
relevant to sci.virtual-worlds readers - namely, the computational
requirements for sustaining complex and interesting Virtual Worlds.
Having striven (along with several of my colleagues) for almost ten
years to build a supercomputer specifically oriented toward video
processing, I remain in awe of those giants who blazed the trail. My
personal SPEED Hall of Fame goes something like this: Nolan Ryan,
Seymour Cray, Danny Hillis...

After clarifying Cray's cooling strategy, Alan declares:

** Oh how about 8 CPUs capable of 2.2 Gigaflops? (That's 2.2G for all 8, not
** for each processor; and this is a demonstrated number for a real science
** application, not a matrix multiply loop)

  No one argues that a Cray is NOT a very, very fast machine - but
that does not mean that it is an effective VW machine. For one thing,
VW would appear to be not so much a GIGA-FLOP problem as a GIGA-OP
problem. If one decomposes a virtual world into simple computational
elements such as "input" processors and world processors, as well as
network and "output" display processors, then one can gain a sense of
the performance requirements through the entire pipeline. With the
exception of polygonally rendered virtual worlds there do not appear
to be many places where floating-point operations are critical. And
while polygon rendering does entail floating point, such operations
are better accommodated with special dedicated processors at the
"point" of need (as in the SGI). Other approaches forgo polygon
rendering entirely and hence require little or no floating point.
Certainly, input and display processing is primarily fixed-point
digital signal processing.
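
  To make the fixed-point claim concrete, here is a minimal sketch
(my illustration, not code from any system mentioned here) of a
typical display-side pixel adjustment done entirely in integer
arithmetic:

    # Hypothetical contrast/brightness adjust in Q8 fixed point --
    # multiply, shift, add, clamp: all integer OPs, no FLOPs needed.
    def adjust_pixel(p, gain_q8=288, offset=16):
        out = (p * gain_q8 >> 8) + offset   # gain of 288/256 = 1.125
        return min(max(out, 0), 255)        # clamp to the 8-bit range

    print(adjust_pixel(128))                # -> 160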

  A second question concerns whether the Cray's essentially "single
processor" model is right for VW. First, consider a VW display at
1Kx1Kx30fps (they'll be here sooner than we think...). The total
number of data points which must be processed is 3x10**7 per second
(x2 for a stereo headset). Assume in the simplest case that these
data points correspond to pixels. How many operations per pixel are
required? A simple television-like decoder would require hundreds of
operations per pixel. If the data stream included MPEG-like
compression then there could be >> 1000 operations per pixel. (In
fact, an 8x8 DCT alone takes about 800 16-bit operations per pixel.)
Dr. Fukinuki of Hitachi has placed the number for HDTV resolution at
10**4 operations per pixel [Kahaner]. Remember, if you want to use a
high-resolution (HDTV) head-mounted display you will have to do a lot
more processing than is done in present displays. We believe that for
virtual worlds to approach PROCESSED HDTV resolutions, between 1,000
and 10,000 operations per pixel will be required. Using the lower
figure we have 3x10**10 operations per second. For a single-processor
model that translates to 3.3x10**-11 seconds per operation, or 33
picoseconds. Perhaps Alan could advise us as to when the "Cray 33"
will be extant? Even assuming one could effectively utilize all
processors of a 64-processor Cray with a 2 ns instruction clock and
matched interprocessor communication, there would barely be enough
time to complete the required processing. And this is a minimum
system.
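
  For anyone who wants to vary the assumptions, here is the
arithmetic above restated as a short Python sketch (the 1,000
ops/pixel figure is our lower bound, not a measured number):

    pixels_per_sec = 1024 * 1024 * 30    # 1K x 1K at 30 fps, one eye
    ops_per_pixel  = 1000                # lower bound, processed HDTV
    ops_per_sec    = pixels_per_sec * ops_per_pixel
    print(ops_per_sec)                   # ~3.1e10 operations/second
    print(1.0 / ops_per_sec)             # ~3.2e-11 s, i.e. ~32 ps/op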

  Massively Parallel architectures do in fact appear to be a more
natural and cost-effective fit for high-end VW processing: 2048
moderately powerful processors running at 20 MHz deliver 41 GigaOPS -
effectively one operation every 24 picoseconds. As long as each of
the world components can be cast in an appropriate data-parallel
model and can sustain continuous I/O, complex, real-time virtual
worlds can be managed.
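
  A quick comparison of that delivered rate against the display
budget computed above (a sketch; it assumes one operation per
processor per clock and full utilization):

    required  = 3.1e10               # ops/s from the 1Kx1Kx30fps budget
    delivered = 2048 * 20e6          # ~4.1e10 ops/s for a 2048-PE array
    print(delivered / required)      # ~1.3x -- modest but real headroom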

Alan also points out:

** You want to try virtual reality with REALLY complicated worlds and tons
** of I/O? How about demonstrated TCP/IP performance, Cray to Cray, of over
** 300 Megabits per second! What? 300 Megabits with TCP/IP? No way!!
** Yep. We do it every day.

  Now there is all that I/O to consider - either to the display
subsystem or to the network. The data pipe Alan describes on the Cray
is awesome when applied to scientific data visualization or file
transfer protocols, but how does it impact Virtual Worlds mapped
across multiple Crays? In a VR system each component must perform its
functions continuously with LITTLE OR NO LATENCY. A lost packet
becomes a visible display defect in the continuous real-time virtual
world unless there is sufficient overhead to permit retransmission.
If multiple virtual worlds share a common interaction space then all
"round trips" must also meet this criterion. This applies to VW
systems regardless of whether they employ head-mounted display or
data-glove technology. The VW display and input processors must be
able to provide some form of continuous visual feedback to virtual
world participants, so that individual responses to multiple
processed inputs occur in continuous real time.
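
  As a rough feel for how tight this is: at 30 fps the entire round
trip must fit within one frame time. A sketch, with the network delay
purely my assumption for illustration:

    frame_time = 1.0 / 30            # ~33 ms end-to-end budget
    rtt        = 0.005               # assumed 5 ms network round trip
    print(rtt / frame_time)          # ~15% of the budget gone per trip,
                                     # before any world processing at all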

  None of the previous discussion of world processing considered the
cost of transferring processed world data to a display or I/O
facility. In systems with video RAMs this might be transparent. For
example, in the Princeton Engine [Chin, et al] up to seven entire
scanlines are simultaneously copied out once every horizontal line
time (63 microseconds for NTSC), both continuously and transparently.
A single processor, however, would seem to suffer a significant
memory I/O bottleneck unless a substantial "back door" is in place.
Presumably systems such as the one Alan describes on the Cray provide
an equivalent facility.
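
  To put a number on that bottleneck, consider just the sustained
bandwidth needed to move finished frames out to the display (the
stereo and 3-bytes-per-pixel figures are my assumptions):

    bytes_per_sec = 1024 * 1024 * 30 * 2 * 3
    print(bytes_per_sec / 1e6)       # ~189 MB/s, continuously --
                                     # against which a 100 MB/s channel
                                     # is already oversubscribed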

  One other side note on I/O. We believe that for Virtual Worlds
frame rates will ultimately need to be higher than 30 fps. For
example, to project images from a remotely sensed combustion
experiment will require hundreds of frames per second of acquisition
- but in a burst of only a few seconds' duration. In order to walk
through the data set we must have considerable flexibility in the
playback frame rate. Ultimately one would like to trade off spatial
resolution against frame rate - providing the video equivalent of
multiple exposures spread across a high-resolution image which can be
"played" in a virtual window.
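
  One way to picture that trade-off, assuming the pipeline sustains a
fixed pixel throughput (the same 1Kx1Kx30fps budget as before):

    budget = 1024 * 1024 * 30        # pixels/second the pipeline moves
    for fps in (30, 120, 240, 480):
        print(fps, budget // fps)    # pixels left per frame; e.g. at
                                     # 240 fps, ~131K pixels (~512x256)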

Alan then describes extraordinary disk I/O rates:

** And you get up to 2 100 MegaBYTE per second channels, many, many, many
** 12.5 MegaBYTE per second disk drive channels (which can run simultaneously).

 Although high-speed disk I/O may be crucial to virtual world data
archives, it is not clear how it impacts the management of the world
itself. World data originates from real-time sources such as cameras
or data gloves, must be processed continuously in real time, and is
displayed on real-time display systems.

Next, Alan brags a little about MAIN memory:

** Hey, how about 4294967000 Bytes of MAIN memory? (Yep that's 4.2 _billion._)

   In all fairness to the CM2, at least two machines that I am aware
of have been loaded with 8 gigabytes... (Insert shameless Princeton
Engine plug: one Princeton Engine has been configured with 1 gigabyte
of video-rate memory - that's 1000 real-time temporal frames...)
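
  A quick sanity check on that figure, assuming roughly one megabyte
per stored frame (say, ~1Kx1K at one byte per pixel - my assumption,
not the Engine's actual frame format):

    print((1 * 2**30) // 2**20)      # 1024, i.e. ~1000 frames/gigabyte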
 
**   As an added bonus you get real live UNIX. And those pesky "ps" commands
**   run on a real supercomputer cpu, not a teensy weensy processor.

  Again, to be fair, the CM2 is best thought of as containing NOT
65,536 "teensy weensy" processors but several thousand 32- or 64-bit
Weitek FPUs. This has been true for quite a few years now.

Finally, Alan waxes whimsical:

**   | If you were plowing a field what would you
**   | rather use? 2 strong oxen or 1024 chickens?
**   | -Seymour Cray (on massively parallel machines)

Now, Seymour, if you were out in your field and 2048 Pit Bulls
suddenly came at you from all directions, would you rather have 2048
hunters with shotguns or a couple of Cruise Missiles?
  -herb taylor (on massively parallel machines)

References:

"Kahaner Report: Parallel Computing in Japan (Part 3)" David Kahaner.
Available by anonymous FTP from host cs.arizona.edu.  The reports
can be found in directory "japan/kahaner.reports".

"The Princeton Engine: A Real-time Video System Simulator" Chin,
Taylor, et al. IEEE Transactions on Consumer Electronics, Vol 34, No
2, 1988.


--