herbt@apollo.sarnoff.com (Herbert H Taylor III) (03/28/91)
As I very much want to avoid stepping into a flame "war in progress," I would like to respond specifically to some of Alan's points that are relevant to sci.virtual-worlds readers - namely, the computational requirements for sustaining complex and interesting virtual worlds. Having strived (along with several of my colleagues) for almost ten years to build a supercomputer specifically oriented toward video processing, I remain in awe of those giants who blazed the trail. My personal SPEED Hall of Fame goes something like this: Nolan Ryan, Seymour Cray, Danny Hillis...

After clarifying Cray's cooling strategy, Alan declares:

** Oh how about 8 CPUs capable of 2.2 Gigaflops? (That's 2.2G for all 8, not
** for each processor; and this is a demonstrated number for a real science
** application, not a matrix multiply loop)

No one argues that a Cray is NOT a very, very fast machine - but that does not mean that it is an effective VW machine. For one thing, VW appears to be not so much a GIGA-FLOP problem as a GIGA-OP problem. If one decomposes a virtual world into simple computational elements such as "input" processors and world processors, as well as network and "output" display processors, then one can gain a sense of the performance requirements through the entire pipeline. With the exception of polygonally rendered virtual worlds, there do not appear to be many places where floating point is critical. And while polygon rendering does entail floating point, such operations are better accommodated with special dedicated processors at the "point" of need (as in the SGI). Other approaches forego polygon rendering entirely and hence involve little or no floating point. Certainly, input and display processing is primarily fixed-point digital signal processing.

A second question concerns whether the Cray's essentially "single processor" model is right for VW. First, consider a VW display at 1Kx1Kx30fps (they'll be here sooner than we think...).
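The 1Kx1Kx30fps budget can be checked directly. A minimal sketch in present-day Python, using only the figures from this post (1,000 operations per pixel is the low end of the 1,000-10,000 range argued below; the variable names are mine):

```python
# Sanity check of the virtual-world display budget.
# Figures from the post: 1K x 1K display, 30 frames/s, and an assumed
# low-end estimate of 1,000 operations per pixel for processed HDTV.

pixels_per_sec = 1000 * 1000 * 30           # 3x10**7 data points/s (one eye)
ops_per_pixel = 1000                        # low end of the 1,000-10,000 range
ops_per_sec = pixels_per_sec * ops_per_pixel  # 3x10**10 operations/s

# Single-processor model: time available per operation
time_per_op_s = 1.0 / ops_per_sec           # ~3.3x10**-11 s, i.e. ~33 picoseconds

# Massively parallel model: 2048 processors at 20 MHz
parallel_ops_per_sec = 2048 * 20e6          # ~4.1x10**10 OPS ("41 GigaOPS")
parallel_time_per_op_s = 1.0 / parallel_ops_per_sec  # ~24 ps, effectively

print(f"single CPU: {ops_per_sec:.1e} ops/s, {time_per_op_s * 1e12:.0f} ps per op")
print(f"2048 x 20 MHz: {parallel_ops_per_sec:.1e} OPS, "
      f"{parallel_time_per_op_s * 1e12:.0f} ps per op (effective)")
```

Doubling everything for a stereo headset tightens the single-processor budget to roughly 17 picoseconds per operation, which only sharpens the argument.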
The total number of data points which must be processed is 3x10**7 per second (x2 for a stereo headset). Assume in the simplest case that these data points correspond to pixels. How many operations per pixel are required? A simple television-like decoder would require hundreds of operations per pixel. If the data stream included MPEG-like compression then there could be >> 1000 operations per pixel. (In fact, an 8x8 DCT alone takes about 800 16-bit operations per pixel.) Dr. Fukinuki of Hitachi has placed the number for HDTV resolution at 10**4 operations per pixel [Kahaner]. Remember, if you want to use a high-resolution (HDTV) head-mounted display, you will have to do a lot more processing than is done in present displays. We believe that for virtual worlds to approach PROCESSED HDTV resolution, between 1,000 and 10,000 operations per pixel will be required. Using the lower figure, we have 3x10**10 operations per second. For a single-processor model that translates to 3.3x10**-11 seconds per operation, or 33 picoseconds. Perhaps Alan could advise us as to when the "Cray 33" will be extant? Even assuming one could effectively utilize all processors of a 64-processor Cray with a 2ns instruction clock and matched interprocessor communication, there would barely be enough time to complete the required processing. And this is a minimum system.

Massively parallel architectures do in fact appear to be a more natural and cost-effective fit for high-end VW processing. 2048 moderately powerful processors running at 20 MHz is 41 GigaOPS - or one operation effectively every 24 picoseconds. As long as each of the world components can be cast in an appropriate data-parallel model and can sustain continuous I/O, complex, real-time virtual worlds can be managed.

Alan also points out:

** You want to try virtual reality with REALLY complicated worlds and tons
** of I/O? How about demonstrated TCP/IP performance, Cray to Cray, of over
** 300 Megabits per second! What?
** 300 Megabits with TCP/IP? No way!!
** Yep. We do it every day.

Now there is all that I/O to consider - either to the display subsystem or to the network. The data pipe Alan describes on the Cray is awesome, perhaps, when applied to scientific data visualization or file transfer protocols, but how does it impact virtual worlds mapped across multiple Crays? In a VR system each component must perform its functions continuously with LITTLE OR NO LATENCY. A lost packet becomes a visible display defect in the continuous real-time virtual world unless there is sufficient overhead to permit retransmission. If multiple virtual worlds share a common interaction space then all "round trips" must also meet this criterion. This applies to VW systems regardless of whether they employ head-mounted display or data glove technology. The VW display and input processors must be able to provide some form of continuous visual feedback to virtual world participants - so that individual responses to multiple processed inputs can occur in continuous real time.

None of the previous discussion of world processing considered the cost of transferring processed world data to a display or I/O facility. In systems with video RAMs this might be transparent. For example, in the Princeton Engine [Chin, et al] up to seven entire scanlines are simultaneously copied out once every horizontal line time (NTSC = 63 usecs), both continuously and transparently. However, a single processor would seem to suffer a significant memory I/O bottleneck unless a significant "back door" is in place. Presumably, systems such as Alan describes on the Cray provide an equivalent facility.

One other side note on I/O. We believe that for virtual worlds, frame rates will ultimately need to be higher than 30 fps. For example, to project images from a remotely sensed combustion experiment will require hundreds of frames per second of acquisition - but in a burst of only a few seconds' duration.
In order to walk through the data set we must have considerable flexibility in the playback frame rate. Ultimately one would like to trade off spatial resolution against frame rate - providing the video equivalent of multiple exposures spread across a high-resolution image which can be "played" in a virtual window.

Alan then describes extraordinary disk I/O rates:

** And you get up to 2 100 MegaBYTE per second channels, many, many, many
** 12.5 MegaBYTE per second disk drive channels (which can run simultaneously).

Although high-speed disk I/O may be crucial to virtual world data archives, it is not clear how it impacts the management of the world itself. World data originates from real-time sources such as cameras or data gloves, must be processed continuously in real time, and is displayed on real-time display systems.

Next, Alan brags a little about MAIN memory:

** Hey, how about 4294967000 Bytes of MAIN memory? (Yep that's 4.2 _billion._)

In all fairness to the CM2, at least two that I am aware of have been loaded with 8 gigabytes... (Insert shameless Princeton Engine plug: one Princeton Engine has been configured with 1 gigabyte of video-rate memory - that's 1000 real-time temporal frames...)

** As an added bonus you get real live UNIX. And those pesky "ps" commands
** run on a real supercomputer cpu, not a teensy weensy processor.

Again, to be fair, the CM2 is best thought of as containing not 65,536 "teensy weensy" processors but several thousand 32- or 64-bit Weitek FPUs. This has been true for quite a few years now.

Finally, Alan waxes whimsical:

** | If you were plowing a field what would you
** | rather use? 2 strong oxen or 1024 chickens?
** | -Seymour Cray (on massively parallel machines)

Now, Seymour, if you were out in your field and 2048 Pit Bulls suddenly came at you from all directions, would you rather have 2048 hunters with shotguns or a couple of Cruise Missiles?
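As a footnote to the LITTLE OR NO LATENCY point above: whether a lost packet can be retransmitted invisibly comes down to a frame-time budget. A minimal sketch - the 5 ms round-trip and 20 ms processing figures are illustrative assumptions of mine, not measurements from this post:

```python
# Rough latency budget for a continuous real-time virtual world.
# A lost packet can only be retransmitted invisibly if the retry
# completes within the slack remaining in the current frame interval.

frame_rate_hz = 30
frame_period_ms = 1000.0 / frame_rate_hz      # ~33.3 ms per frame at 30 fps

# Hypothetical figures, for illustration only:
round_trip_ms = 5.0     # assumed network round trip, e.g. Cray to Cray
processing_ms = 20.0    # assumed time already spent producing the frame

slack_ms = frame_period_ms - processing_ms    # headroom left in the frame
retries_possible = int(slack_ms // round_trip_ms)

print(f"frame period: {frame_period_ms:.1f} ms, slack: {slack_ms:.1f} ms, "
      f"retransmissions that fit: {retries_possible}")
```

Under those assumptions only a couple of retransmissions fit per frame, and at the higher frame rates argued for above the slack shrinks proportionally - which is why round trips across shared interaction spaces must meet the same budget.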
-herb taylor (on massively parallel machines)

References:

"Kahaner Report: Parallel Computing in Japan (Part 3)," David Kahaner. Available by anonymous FTP from host cs.arizona.edu, in directory "japan/kahaner.reports".

"The Princeton Engine: A Real-time Video System Simulator," Chin, Taylor, et al. IEEE Transactions on Consumer Electronics, Vol. 34, No. 2, 1988.

--