[sci.virtual-worlds] Report from David Sarnoff Research Center.

hht@sarnoff.com (Herbert H. Taylor x2733) (03/07/91)

  We are interested in exploring NOW what VR might be like for the
general computing world in another ten years - much as Xerox PARC gave
us a vision in the '70s of what computing would be like in the '80s
and '90s. And although the system we describe here is very expensive,
we believe that in another ten years it will be a typical system. In
fact, at SIGGRAPH someone pointed out that the typical computer of the
year 2000 will have 1G of RAM, operate at 1 GOPS, have 1G of I/O,
etc. - our system exceeds that performance now.

  We have developed a Video Supercomputer (aka the Princeton Engine)
which can continuously process multiple simultaneous streams of video
input and output. When we originally conceived the machine in 1983, we
intended it to be used for simulating proposed digital television
receivers in continuous real time. However, it has since found a
happy home in a number of research fields, including algorithms for
HDTV, data compression, neural nets, image pyramids, scientific data
and volume visualization and, hopefully, VR. For example, we would
like to combine the processing power of the Princeton Engine with high
frame rate, high resolution displays - to create and manage a virtual
world built from "real" elements. To date, all applications on the
Princeton Engine exploit in some way either real-time input or output,
and usually both. The following ASCII-gram summarizes the
architecture.
  8-bit A/D's                                           9-bit DACs
  (48 bits input)   _____________________________      (64 bits output)
  Video In 1 ---->|    The Princeton Engine     |----> Video Out 1 (R)
                  |  2048 16-bit SIMD/DSP Procs |----> Video Out 2 (G)
  Video In 2 ---->|                             |----> Video Out 3 (B)
                  |  o Processor Architecture   |
  Video In 3 ---->| - Seven data paths          |----> Video Out 4 (R)
 (Optional D1/D2) | - Multiplier and ALU        |----> Video Out 5 (G)
                  | - NN & cut-through IP comm  |----> Video Out 6 (B)
  Video In 4 ---->| - 144-bit wide inst. word   | /|\
                  | - 64 3-port register file   |  |
  Video In 5 ---->| - 1 GByte video-rate RAM    | OUTPUT
                  | - Hardware LUT              | clocked at 28 or 56 MHz
  Video In 6 ---->|_____________________________|----> Video Out 7 (D1/D2)
             /|\                /|\               /|\
   INPUT      |                  |                 |
   Sampled at 14,28,56,81 MHz    |        D1/D2 clocked at 13.5/14 MHz
                       Instruction clock at 14 MHz

  The Princeton Engine is a SIMD architecture (a la the CM-2 and
MasPar) comprising up to 2048 16-bit DSP processors. It differs from
those machines in several respects, including the ability to
continuously perform video-rate I/O, flowing the video transparently
through the array of processors. The "front end" consists of six
analog-to-digital converters, while the "back end" consists of seven
D-to-A's. Alternatively, any of the analog inputs or outputs can be
replaced with a digital D1/D2 interface. All 13 video data streams
are independent of the instruction stream. With very little overhead,
any or all of the six video input streams can be directed to frame
buffers in processor-local memory. Video streams can then be "fused"
or individually processed.
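
 As a rough illustration of this data-parallel programming model -
a NumPy sketch with invented frame dimensions, not the Engine's
actual instruction set - each processor can be thought of as owning
one image column, so capturing streams into frame buffers and
"fusing" them are simple elementwise operations across the array:

  import numpy as np

  # Hypothetical frame size: the Engine assigns one 16-bit processor
  # per image column; rows arrive as the video streams through.
  ROWS, COLS = 480, 2048

  def capture(stream):
      # Stand-in for directing one video input stream into a
      # processor-local frame buffer.
      return np.asarray(stream, dtype=np.uint16)

  def fuse(a, b, alpha=0.5):
      # Elementwise blend of two frame buffers - the kind of
      # low-overhead per-pixel "fusion" the SIMD array performs.
      return (alpha * a + (1.0 - alpha) * b).astype(np.uint16)

  # Two synthetic input streams (say, a visible and an IR source).
  visible = capture(np.random.randint(0, 256, (ROWS, COLS)))
  ir      = capture(np.random.randint(0, 256, (ROWS, COLS)))
  fused   = fuse(visible, ir)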
   
 "Video" Data Glove
 ------------------
 By positioning cameras (including IR cameras) spatially around the
virtual participant, it will be possible to achieve a "whole body" to
virtual world interaction which is not possible with a physical data
glove. To our knowledge, this concept has never been tried in VR
because of the inordinate amount of video processing required - but it
can be done using the Princeton Engine's unique video processing
power. The Princeton Engine accepts up to six simultaneous real-time
video input streams. There is very little computational overhead in
processing multiple video streams, or in "fusing" them with artificial
world data to give the impression of a "real" hand or body within the
virtual world. We would like to hear opinions on how such a whole-body
interface would affect the design of the physical data glove. Is the
data glove still required? If so, how will it differ from present
designs? By having the "interface" in a sampled video format, image
processing algorithms such as filters and edge and motion detectors
can be applied, enhancing the transparency of the fusion of the "real"
into the virtual world.
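
 To make those operators concrete, here is a minimal sketch in
Python/NumPy (purely illustrative - the Engine itself is programmed
quite differently) of frame-level edge and motion detection:

  import numpy as np

  def edges(frame):
      # Crude gradient-magnitude edge detector (finite differences).
      f = frame.astype(np.float32)
      gx = np.abs(np.diff(f, axis=1, prepend=f[:, :1]))
      gy = np.abs(np.diff(f, axis=0, prepend=f[:1, :]))
      return gx + gy

  def motion(prev, curr, thresh=16):
      # Frame-difference motion detector: True wherever the scene
      # moved by more than `thresh` gray levels between frames.
      return np.abs(curr.astype(np.int32) - prev.astype(np.int32)) > thresh

  # A moving hand shows up as a connected region of motion pixels
  # whose outline (from the edge map) can be fused into the rendered
  # world.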

 The Princeton Engine provides a degree of interaction with scientific
data in the HDTV framework which is not possible with other computing
resources. In fact, it should be possible to "walk through" complex
data without any perception of the latency found in present systems.
This walk-through world will likely include a variety of high
resolution rendered objects in data views with which scientists,
mission planners and commanders can directly interact. It should be
possible to virtually "grab hold" of critical data - much as one uses
a marking pen to highlight text in a reference document - and perhaps
perform the "virtual" equivalent of cut and paste.
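
 A back-of-the-envelope budget suggests why video-rate processing
matters here (the numbers below are assumptions, not measurements):

  # At a 60 Hz field rate, a pipeline that finishes its work within
  # one field adds at most one field time of processing delay:
  field_rate_hz = 60.0
  budget_ms = 1000.0 / field_rate_hz
  print(f"per-field processing budget: {budget_ms:.1f} ms")  # ~16.7 ms
  # Keeping end-to-end lag near one field time is far below the
  # hundred-millisecond-class delays that make many present
  # head-tracked systems feel noticeably laggy.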

 Video Windows
 -------------
  One could further envision within the "virtual world" a 2D high
resolution display, or perhaps a window onto the "real" world:
cameras at strategic remote locations could direct live video back to
the Princeton Engine host, and this "live" video would then be
projected into the virtual world participant's window. The "live"
video window might be coupled to a lower frame rate networked video
communication channel. Alternatively, one could envision a
"television" within the virtual world which VR participants can
"switch" among a variety of channels. This ability to integrate video
into the virtual world will be valuable to a number of applications.
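
 A minimal sketch of such a video window (NumPy again, with invented
sizes and a hypothetical boolean mask marking where the window falls
in the rendered view):

  import numpy as np

  def composite(scene, live, window_mask):
      # Paste the live video frame into the rendered scene wherever
      # the virtual "window" was rasterized (window_mask is boolean).
      out = scene.copy()
      out[window_mask] = live[window_mask]
      return out

  ROWS, COLS = 480, 640                         # invented sizes
  scene = np.zeros((ROWS, COLS), np.uint8)      # rendered virtual view
  live  = np.full((ROWS, COLS), 128, np.uint8)  # incoming camera frame
  mask  = np.zeros((ROWS, COLS), bool)
  mask[100:300, 200:500] = True                 # window's screen area
  frame = composite(scene, live, mask)

  # "Channel switching" is then just selecting which live stream
  # feeds the window on each frame.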

 Status
 ------
 Two Princeton Engines have been in operation since 1988. This spring
three more will be added - one of which will be placed at NIST, under
DARPA sponsorship, for the High Definition Systems program. Although
the VR program at DSRC is just getting started at a minimal level (but
with several PEs to play with...), we still hope to demonstrate some
of the major ideas using the present video environment this year. We
have already demonstrated, for example, scenarios for multiple video
I/O channels, "fusing" an IR source with a monochrome source while
driving multiple high resolution displays.

P.S. Last Thursday was the 100th birthday of our founder, David Sarnoff.

brucec%phoebus.labs.tek.com@RELAY.CS.NET (Bruce Cohen;;50-662;LP=A;) (03/09/91)

The Princeton Engine sounds fascinating.  Where can I get more information
on the design, and its applications?  Have you published any papers?

>  By positioning cameras (including IR cameras) spatially around the
> virtual participant, it will be possible to achieve a "whole body" to
> virtual world interaction which is not possible with a physical data
> glove. To our knowledge, this concept has never been tried in VR
> because of the inordinate amount of video processing required - but it
> can be done using the Princeton Engine's unique video processing
> power.

There was some discussion of feasibility of tracking body motions with
multiple cameras in this newsgroup back in October.  The biggest objections
were the computational requirements, which will be solved by new and faster
hardware as you point out, and the amount of space required for the cameras
and the volume they watch, which is a problem for small rooms.  The room
problem would be most acute in homes, schools, and offices where the VR
system is ancillary to the major business of the place, and has to share
the volume with other, possibly higher priority uses.

> Is the data glove still required? If so, how will it
> differ from present designs? By having the "interface" in a sampled
> video format, image processing algorithms such as filters and edge
> and motion detectors can be applied.

Using a glove with distinctive markings could increase the effective
signal-to-noise ratio of the tracking system by making the hand and finger
position and orientation easier to distinguish from other features in the
room.  I think the hands and fingers are the parts of the body which you
want most to track well (except perhaps for the eyes, so you can extract
gaze information), so giving the system some help with the tracking is
probably desirable.  Also, you might be able to cut down on the number of
cameras and the volume of space taken up by the system.
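
A minimal sketch of the kind of help such markings give a tracker
(NumPy, with an invented brightness threshold; a real tracker would
label connected components, one per marker):

  import numpy as np

  def marker_position(frame, thresh=240):
      # Bright glove markers pop out of the background with a simple
      # threshold; the centroid of the bright pixels approximates the
      # marker position (a one-blob simplification).
      ys, xs = np.nonzero(frame > thresh)
      if xs.size == 0:
          return None                 # marker not visible this frame
      return float(xs.mean()), float(ys.mean())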
--
------------------------------------------------------------------------
Speaker-to-managers, aka
Bruce Cohen, Computer Research Lab        email: brucec@tekcrl.labs.tek.com
Tektronix Laboratories, Tektronix, Inc.                phone: (503)627-5241
M/S 50-662, P.O. Box 500, Beaverton, OR  97077

herbt@apollo.sarnoff.com (Herbert H Taylor III) (03/15/91)

   
Bruce Cohen writes:

** Where can I get more information on the design, and its
** applications?  Have you published any papers?

   Two papers of interest have been (or will soon be) published. The
basic one, "The Princeton Engine: A Real-time Video System Simulator,"
appeared in IEEE Transactions on Consumer Electronics (sic) in June
1988, and in short form in the proceedings of the International
Conference on Consumer Electronics, June 1988. Why a paper on a video
supercomputer first appears in a transactions on consumer electronics
is a political-soap-opera type story which I will spare this group...
A second paper of interest will soon appear in "Advances in Neural
Information Processing Systems," Morgan Kaufmann, 1991. The paper is
titled "Applications of Neural Networks in Video Signal Processing,"
by Pearson et al. It describes the real-time implementation on the PE
of neural networks trained to detect characteristic AM impulses (hair
dryer noise).

   We have a number of other reports which we will make available
through U.S. mail. Send an email request to herbt@apollo.sarnoff.com.
If the number of requests is huge it may take some time, but we will
respond to everyone.

**   The room problem would be most acute in homes, schools, and offices
** where the VR system is ancillary to the major business of the place,
** and has to share the volume with other, possibly higher priority uses.

 Eventually one would like to remove the cables so you could walk
freely around the VR world. But must the VR world have a one-to-one
mapping to a fixed geometry - i.e., is a step in virtual space a step
in some physical space? Is some kind of spherical human trackball or a
2D treadmill required to remove restrictions on boundaries? Hmmm, how
does "Infinite Virtual Worlds" sound?

**   Using a glove with distinctive markings could increase the effective
** signal-to-noise ratio of the tracking system by making the hand and finger
** position and orientation easier to distinguish from other features in the
** room.  I think the hands and fingers are the parts of the body which you
** want most to track well 

 This is what I had in mind - possibly even allowing the glove to be
"wireless". There is also the possibility of going gloveless using IR.
We have already experimented with the use of an IR camera to segment
"hot" regions. The face and hands are easy to distinguish from the
rest of the world because they look "hot". The IR image can be
segmented to form a mask on the monochrome image. We can also produce
a contour map of the body (using a histogram), which enables
texture-like features to be discriminated.
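
 As an illustration, that pipeline can be sketched as follows (NumPy,
with invented thresholds - the Engine implementation is, of course, a
real-time SIMD program):

  import numpy as np

  def hot_mask(ir_frame, thresh=200):
      # Segment "hot" regions (face, hands) from the IR frame.
      return ir_frame > thresh

  def contour_bands(mono_frame, mask, n_bands=8):
      # Quantize the masked monochrome intensities into histogram-
      # based bands, yielding a contour-map-like labeling of the body
      # region in which texture-like features stand out.
      vals = mono_frame[mask]
      edges = np.quantile(vals, np.linspace(0.0, 1.0, n_bands + 1))
      bands = np.zeros(mono_frame.shape, dtype=np.uint8)
      bands[mask] = np.clip(np.searchsorted(edges, vals), 1, n_bands)
      return bands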

  Herb Taylor