hht@sarnoff.com (Herbert H. Taylor x2733) (03/07/91)
We are interested in exploring NOW what VR might be like for the general computing world in another ten years, much as Xerox PARC gave us a vision in the 70's of what computing would be like in the 80's and 90's. Although the system we describe here is very expensive, we believe that in another ten years it will be a typical system. In fact, someone at SIGGRAPH pointed out that the typical computer of the year 2000 will have 1G of RAM, perform 1 GOPS, have 1G of I/O, etc.; our system exceeds that performance now. We have developed a Video Supercomputer (aka the Princeton Engine) which can continuously process multiple simultaneous streams of video input and output. When we originally conceived the machine in 1983, we intended it to be used for simulating proposed digital television receivers in continuous real time. However, it has since found a happy home in a number of research fields, including algorithms for HDTV, data compression, neural nets, image pyramids, scientific data and volume visualization and, hopefully, VR. For example, we would like to combine the processing power of the Princeton Engine with high frame rate, high resolution displays to create and manage a virtual world built from "real" elements. To date, all applications on the Princeton Engine exploit real-time input or output in some way, and usually both. The following ascii-gram summarizes the architecture.
     8Bit A to D's                                      9 Bit DAC's
    (48 bits input)  ______________________________   (64 bits output)
    Video In 1 ---->|      The Princeton Engine    |----> Video Out 1 (R)
                    |  2048x16BIT SIMD/DSP Procs   |----> Video Out 2 (G)
    Video In 2 ---->|                              |----> Video Out 3 (B)
                    |  o Processor Architecture    |
    Video In 3 ---->|    - Seven data paths        |----> Video Out 4 (R)
  (Optional D1/D2)  |    - Mpy and Alu             |----> Video Out 5 (G)
                    |    - NN & Cut Through IP Comm|----> Video Out 6 (B)
    Video In 4 ---->|    - 144 Bit Wide Inst Word  |        /|\
                    |    - 64 3-Port Register File |         |
    Video In 5 ---->|    - 1GigaByte Video Rate Ram|       OUTPUT
                    |    - Hardware LUT            | Clocked At 28, 56 MHZ
    Video In 6 ---->|______________________________|----> Video Out 7 (D1/D2)
                        /|\         /|\        /|\
                       INPUT         |          |
          Sampled at 14,28,56,81MHZ  |          |
                   D1/D2 Clocked at 13.5/14MHZ  |
                                   Instruction Clock at 14MHZ

The Princeton Engine is a SIMD architecture (a la the CM-2 and MasPar) composed of up to 2048 16-bit DSP processors. It differs from those machines in several respects, including the ability to continuously perform video-rate I/O, flowing the video transparently through the array of processors. The "front-end" consists of six analog-to-digital converters, while the "back-end" consists of seven D to A's. Alternatively, any of the analog inputs or outputs can be replaced with a digital D1/D2 interface. All 13 video data streams are independent of the instruction stream. With very little overhead, any or all of the six video input streams can be directed to frame buffers in processor-local memory. The video streams can then be "fused" or processed individually.

"Video" Data Glove
------------------

By positioning cameras (including IR cameras) spatially around the virtual participant, it will be possible to achieve a "whole body" interaction with the virtual world which is not possible with a physical data glove.
To our knowledge, this concept has never been tried in VR because of the inordinate amount of video processing required, but it can be done using the Princeton Engine's unique video processing power. In the Princeton Engine, up to six simultaneous real-time video input streams are possible. Processing multiple video streams, or "fusing" them with artificial world data to give the "real" impression of a hand or body within the virtual world, adds very little computational overhead. We would like to hear opinions on how such a whole body interface would affect the design of the physical data glove. Is the data glove still required? If so, how will it differ from present designs? Having the "interface" in a sampled video format means that image processing algorithms such as filters and edge and motion detectors can be applied, making the fusion of "real" elements into the virtual world more transparent.

The Princeton Engine provides a degree of interaction with scientific data in the HDTV framework which is not possible with other computing resources. In fact, it should be possible to "walk through" complex data without any perception of the latency found in present systems. This walk-through world will likely include a variety of high resolution rendered objects in data views with which scientists, mission planners and commanders can directly interact. It should be possible to virtually "grab hold" of critical data, much as one uses a marking pen to highlight text in a reference document, and perhaps to perform the "virtual" equivalent of cut and paste.

Video Windows
-------------

One could further envision within the "Virtual World" a 2D high resolution display, or perhaps a window onto the "real" world: for example, cameras at strategic remote locations could send live video back to the Princeton Engine host. This "live" video is then projected into the virtual world participant's window.
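Both the "fusion" of real imagery into the virtual world and the video window come down to per-pixel operations on frame buffers. The following is a minimal sketch, not Princeton Engine software: it assumes 8-bit grey-level frames held as Python lists and uses hard keying as the fusion rule (one of several possible choices). A frame-difference motion detector builds the mask, `fuse` keys the live pixels over the virtual frame, and `paste_window` drops a live frame into a rectangle. All names are hypothetical.

```python
# Hypothetical sketch: a frame is a list of rows of 8-bit grey levels (0-255).
# Hard keying is one simple choice for "fusion"; an alpha blend would also work.

def motion_mask(prev, curr, threshold=20):
    """Frame-difference motion detector: 1 where a pixel changed by more than threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(p_row, c_row)]
            for p_row, c_row in zip(prev, curr)]


def fuse(virtual, live, mask):
    """Keyed fusion: show the live ("real") pixel where mask is 1, else the virtual one."""
    return [[live_px if m else v_px for v_px, live_px, m in zip(v_row, l_row, m_row)]
            for v_row, l_row, m_row in zip(virtual, live, mask)]


def paste_window(virtual, live, top, left):
    """Video window: copy a live frame into a rectangle of the virtual frame, clipped at its edges."""
    out = [row[:] for row in virtual]  # copy so the input frame is not modified
    for r, live_row in enumerate(live):
        for c, pixel in enumerate(live_row):
            rr, cc = top + r, left + c
            if 0 <= rr < len(out) and 0 <= cc < len(out[rr]):
                out[rr][cc] = pixel
    return out


background = [[10, 10, 10, 10], [10, 10, 10, 10]]  # previous camera frame
camera = [[10, 90, 90, 10], [10, 10, 10, 10]]      # a "hand" enters the view
virtual = [[50, 50, 50, 50], [50, 50, 50, 50]]     # rendered virtual frame

mask = motion_mask(background, camera)
print(fuse(virtual, camera, mask))            # → [[50, 90, 90, 50], [50, 50, 50, 50]]
print(paste_window(virtual, [[7, 7]], 1, 3))  # → [[50, 50, 50, 50], [50, 50, 50, 7]]
```

On a SIMD machine such as the one described, each of these per-pixel loops would presumably become one instruction sequence applied to all pixels in parallel; the Python version only shows the per-pixel logic.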
The "live" video window might be coupled to a lower frame rate networked video communication channel. Alternatively, one could envision a "television" within the virtual world which VR participants can "switch" to a variety of channels. This ability to integrate video into the virtual world will be valuable to a number of applications.

Status
------

Two Princeton Engines have been in operation since 1988. This spring three more will be added, one of which will be placed at NIST under DARPA sponsorship for the High Definition systems program. Although the VR program at DSRC is just getting started at a minimal level (but with several PE's to play with...), we still hope to demonstrate some of the major ideas using the present video environment this year. We have already demonstrated, for example, scenarios with multiple video I/O channels, "fusing" an IR source with a monochrome source while driving multiple high resolution displays.

P.S. Last Thursday was the 100th birthday of our founder, David Sarnoff.
brucec%phoebus.labs.tek.com@RELAY.CS.NET (Bruce Cohen;;50-662;LP=A;) (03/09/91)
The Princeton Engine sounds fascinating. Where can I get more information on the design, and its applications? Have you published any papers?

> By positioning camaras (including IR camaras) spatially around the
> virtual participant it will be possible to achieve a "whole body" to
> virtual world interaction which is not possible with a physical data
> glove. To our knowledge, this concept has never been tried in VR
> because of the inordinate amount of video processing required - but it
> can be done utilizing the Princeton Engines unique video processing
> power.

There was some discussion of the feasibility of tracking body motions with multiple cameras in this newsgroup back in October. The biggest objections were the computational requirements, which will be solved by new and faster hardware, as you point out, and the amount of space required for the cameras and the volume they watch, which is a problem for small rooms. The room problem would be most acute in homes, schools, and offices where the VR system is ancillary to the major business of the place, and has to share the volume with other, possibly higher priority uses.

> Is the data glove still required? If so, how will it
> differ from present designs. By having the "interface" in a sampled
> video format, image processi

Using a glove with distinctive markings could increase the effective signal-to-noise ratio of the tracking system by making the hand and finger position and orientation easier to distinguish from other features in the room. I think the hands and fingers are the parts of the body which you want most to track well (except perhaps for the eyes, so you can extract gaze information), so giving the system some help with the tracking is probably desirable. Also, you might be able to cut down on the number of cameras and the volume of space taken up by the system.
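A minimal sketch of how a marked glove and multiple cameras could work together, under loudly stated assumptions: the marker's grey level (here 240 to 255) rarely occurs elsewhere in the room, and two cameras form a calibrated, rectified stereo pair. Neither post specifies a tracking method; the names `marker_centroid`, `StereoRig` and `triangulate` and all numbers are hypothetical. Locating the marker is a threshold plus a centroid, and its 3D position then follows from the standard rectified-stereo relation Z = f * B / d.

```python
from dataclasses import dataclass

# Hypothetical sketch: frames are lists of rows of 8-bit grey levels, and the
# two cameras are a calibrated, rectified stereo pair (parallel optical axes,
# same focal length), so the marker appears on the same row in both views.


def marker_centroid(frame, lo, hi):
    """Centroid (row, col) of pixels whose value lies in [lo, hi], or None.

    A marker chosen to fall in a band of values that is rare in the room
    keeps background pixels out of the sums, which is the signal-to-noise
    gain a marked glove would provide.
    """
    count = sum_r = sum_c = 0
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if lo <= value <= hi:
                count += 1
                sum_r += r
                sum_c += c
    if count == 0:
        return None
    return sum_r / count, sum_c / count


@dataclass
class StereoRig:
    focal_px: float    # focal length in pixels
    baseline_m: float  # distance between the camera centres, in metres
    cx: float          # principal point, column (pixels)
    cy: float          # principal point, row (pixels)


def triangulate(rig, left_rc, right_rc):
    """3D point (X, Y, Z) in metres, in the left camera's frame.

    Uses Z = f * B / d with disparity d = col_left - col_right; the right
    view's row is ignored because rectification puts it on the same row.
    """
    (row_l, col_l), (_, col_r) = left_rc, right_rc
    disparity = col_l - col_r
    if disparity <= 0:
        raise ValueError("marker must have positive disparity")
    z = rig.focal_px * rig.baseline_m / disparity
    x = (col_l - rig.cx) * z / rig.focal_px
    y = (row_l - rig.cy) * z / rig.focal_px
    return x, y, z


rig = StereoRig(focal_px=10.0, baseline_m=0.5, cx=4.0, cy=1.0)
left = [[0] * 8, [0, 0, 0, 0, 0, 250, 250, 0], [0] * 8]
right = [[0] * 8, [0, 250, 250, 0, 0, 0, 0, 0], [0] * 8]
p_left = marker_centroid(left, 240, 255)    # (1.0, 5.5)
p_right = marker_centroid(right, 240, 255)  # (1.0, 1.5)
print(triangulate(rig, p_left, p_right))    # → (0.1875, 0.0, 1.25)
```

Each additional camera is another full video stream to process every frame, so a marker that makes each view easier to search is one way to trade fewer cameras against the computational load discussed above.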
--
------------------------------------------------------------------------
Speaker-to-managers, aka Bruce Cohen, Computer Research Lab    email: brucec@tekcrl.labs.tek.com
Tektronix Laboratories, Tektronix, Inc.                        phone: (503)627-5241
M/S 50-662, P.O. Box 500, Beaverton, OR 97077
herbt@apollo.sarnoff.com (Herbert H Taylor III) (03/15/91)
Bruce Cohen writes:
** Where can I get more information on the design, and its
** applications? Have you published any papers?
Two papers of interest have been (or will soon be) published. The
basic one, "The Princeton Engine: A Real-time Video System Simulator",
appeared in IEEE Transactions on Consumer Electronics (sic) in June
1988, and in short form in the proceedings of the International
Conference on Consumer Electronics, June 1988. Why a paper on a video
supercomputer first appeared in the Transactions on Consumer Electronics
is a political-soap-opera type story which I will spare this group...
A second paper of interest will soon appear in: "Advances in Neural
Information Processing Systems", Morgan Kaufmann, 1991. The paper is
titled: "Applications of Neural Networks in Video Signal Processing"
by Pearson, et al. This paper describes the real-time implementation
on the PE of neural networks trained to detect characteristic AM
impulses (hair dryer noise).
We have a number of other reports which we will make available
through U.S. mail. Send email request to herbt@apollo.sarnoff.com. If
the number of requests is huge it may take some time, but we will
respond to everyone.
** The room problem would be most acute in homes, schools, and offices
** where the VR system is ancillary to the major business of the place,
** and has to share the volume with other, possibly higher priority uses.
Eventually one would like to remove the cables so you could walk
freely around the VR world. But must the VR world have a one-to-one
mapping to a fixed geometry, i.e., must a step in virtual space be a
step in some physical space? Is some kind of spherical human track-ball
or 2D treadmill required to remove restrictions on boundaries? Hmmm...
how does "Infinite Virtual Worlds" sound?
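The difference between a one-to-one mapping and a treadmill-style mapping can be sketched in a few lines. This is a hypothetical illustration, not a described system: the 4 m x 4 m room, the 0.5 m step and the function names are all assumptions.

```python
# Hypothetical sketch contrasting two ways of mapping walking onto the
# virtual world. Positions are (x, y) in metres.

ROOM_HALF_WIDTH_M = 2.0  # assumed 4 m x 4 m tracked area


def _clamp(v):
    """Limit a coordinate to the assumed room bounds."""
    return max(-ROOM_HALF_WIDTH_M, min(ROOM_HALF_WIDTH_M, v))


def one_to_one(pos, step):
    """A virtual step is a physical step, so travel stops at the room's walls."""
    return _clamp(pos[0] + step[0]), _clamp(pos[1] + step[1])


def treadmill(pos, belt):
    """2D-treadmill mapping: the belt carries the walker back toward the centre
    of the platform and its displacement is added to the virtual position
    instead, so virtual travel has no boundary."""
    return pos[0] + belt[0], pos[1] + belt[1]


room = virtual = (0.0, 0.0)
for _ in range(20):  # twenty 0.5 m steps along x
    room = one_to_one(room, (0.5, 0.0))
    virtual = treadmill(virtual, (0.5, 0.0))
print(room, virtual)  # → (2.0, 0.0) (10.0, 0.0)
```

With the direct mapping the walker stalls at the wall after 2 m, while the treadmill mapping keeps accumulating distance, which is the sense in which the virtual world could be "infinite".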
** Using a glove with distinctive markings could increase the effective
** signal-to-noise ratio of the tracking system by making the hand and finger
** position and orientation easier to distinguish from other features in the
** room. I think the hands and fingers are the parts of the body which you
** want most to track well
This is what I had in mind - possibly even allowing the glove to be
"wireless". There is also the possibility of going gloveless using IR.
We have already experimented with the use of an IR camera to segment
"hot" regions. The face and hands are easy to distinguish from the
rest of the world because they look "hot". The IR image can be
segmented to form a mask on the monochrome image. We can also produce
a contour map of the body (using the histogram), which enables
texture-like features to be discriminated.
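A minimal sketch of the IR segmentation described above, assuming registered IR and monochrome frames of the same size with 8-bit values. The threshold and the banding rule are assumptions: the post says only that the histogram is used for the contour map, so equally populated bands cut from the cumulative histogram are one plausible reading, not necessarily the method used. All function names are hypothetical.

```python
# Hypothetical sketch: frames are equal-sized lists of rows of 8-bit values
# (0-255), with the IR and monochrome cameras assumed to be registered.


def hot_mask(ir, threshold):
    """1 where the IR level is at least threshold (a "hot" region), else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in ir]


def apply_mask(mono, mask, background=0):
    """Keep monochrome pixels inside the mask; replace the rest with background."""
    return [[px if keep else background for px, keep in zip(m_row, k_row)]
            for m_row, k_row in zip(mono, mask)]


def histogram_bands(ir, n_bands):
    """Quantise IR levels into n_bands roughly equally populated bands, using
    cut levels read off the cumulative histogram; the band index of each
    pixel forms a contour-like map."""
    hist = [0] * 256
    for row in ir:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cuts, cumulative, band = [], 0, 1
    for level, count in enumerate(hist):
        cumulative += count
        # Record a cut each time the cumulative count reaches band/n_bands of the total.
        while band < n_bands and cumulative >= band * total / n_bands:
            cuts.append(level)
            band += 1
    return [[sum(v > cut for cut in cuts) for v in row] for row in ir]


ir = [[30, 40, 200, 210],
      [35, 220, 230, 45]]
mono = [[90, 91, 92, 93],
        [94, 95, 96, 97]]
mask = hot_mask(ir, 128)
print(apply_mask(mono, mask))  # → [[0, 0, 92, 93], [0, 95, 96, 0]]
print(histogram_bands(ir, 2))  # → [[0, 0, 1, 1], [0, 1, 1, 0]]
```

Cutting bands from the cumulative histogram rather than at fixed levels adapts the contours to whatever temperature range the body actually occupies in a given frame.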
Herb Taylor