[sci.virtual-worlds] VR/Video

lilj@uunet.UU.NET (Joshua Neil Rubin) (04/13/91)

Chris Shaw writes:

>While video may be a useful adjunct to a virtual reality system,
>it lacks the fundamental property of arbitrary real-time view
>position and orientation control.

           *           *          *

> . . . the simulation component doesn't exist because the view cannot 
>be arbitrarily controlled . . .

           *           *          * 

>Chris Shaw     University of Alberta
>cdshaw@cs.UAlberta.ca           Now with new, minty Internet flavour!
>CatchPhrase: Bogus as HELL !


The NASA Mars fly-by film and several other projects demonstrate that you 
can derive an accurate voxel set from a single stereo pair.  Given
sufficient computer speed and power, we should ultimately be able to do 
this (i.e., be able to extrapolate alternative viewpoints from a single 
video stereopair) in realtime.  Of course, the hidden surfaces must be 
interpolated.  But hey.
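
To make the claim concrete, here is a minimal Python sketch (purely
illustrative: the grid size and depth values are invented, and the hard
part, recovering depth from the stereopair, is assumed already done) of
how a recovered depth map becomes a voxel set:

import numpy as np

def depth_to_voxels(depth, n=64, zmax=10.0):
    """Mark one voxel per pixel, at the depth the stereopair reports."""
    h, w = depth.shape
    vox = np.zeros((n, n, n), dtype=bool)
    for y in range(h):
        for x in range(w):
            i = int(x * n / w)                           # voxel column
            j = int(y * n / h)                           # voxel row
            k = min(int(depth[y, x] / zmax * n), n - 1)  # depth bin
            vox[i, j, k] = True
    return vox

depth = np.full((32, 32), 5.0)        # fake flat terrain, 5 units away
print(depth_to_voxels(depth).sum())   # 1024 occupied voxels

Note that any surface hidden from the cameras simply gets no voxels at
all, which is exactly the gap that would have to be interpolated.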


lilj@well

mark@cs.UAlberta.CA (Mark Green) (04/14/91)

In article <1991Apr13.180518.1243@milton.u.washington.edu>, decwrl!well.sf.ca.us!well!lilj@uunet.UU.NET (Joshua Neil Rubin) writes:
> 
> Chris Shaw writes:
> 
> >While video may be a useful adjunct to a virtual reality system,
> >it lacks the fundamental property of arbitrary real-time view
> >position and orientation control.
> 
>            *           *          *
> 
> > . . . the simulation component doesn't exist because the view cannot 
> >be arbitrarily controlled . . .
> 
>            *           *          * 
> 
> >Chris Shaw     University of Alberta
> >cdshaw@cs.UAlberta.ca           Now with new, minty Internet flavour!
> >CatchPhrase: Bogus as HELL !
> 
> 
> The NASA Mars fly-by film and several other projects demonstrate that you 
> can derive an accurate voxel set from a single stereo pair.  Given
> sufficient computer speed and power, we should ultimately be able to do 
> this (i.e., be able to extrapolate alternative viewpoints from a single 
> video stereopair) in realtime.  Of course, the hidden surfaces must be 
> interpolated.  But hey.
> 
> 
> lilj@well
> 

What??? The NASA Mars fly-by film and similar works are computer-
generated graphics produced from a collection of different data sources
(i.e., there are no stereo pairs involved).  The basic procedure is to
determine a height field for the planet.  This can be done by different
forms of range finding or by multiple views of the planet.  Lew Hitchner
of NASA Ames has mentioned this type of data for Mars in this news
group, and we have used the same data for an interactive fly-by of Mars.
Once the height field is available, a polygonal mesh can be constructed
that represents the planet's surface.  Now you have a basic geometrical
model of the planet.  To get the surface detail, pictures of the planet
(at the appropriate locations) are texture mapped onto the polygons
when they are displayed.  Unless you are using a high-end Silicon
Graphics machine, this is not a real-time operation.  Note that this
is a purely geometrical model; there is no video, stereo pairs,
voxels, or anything like that involved.  From a single stereo pair there
is no way of deducing the geometry of an arbitrary object so that it can
be displayed from any viewpoint.  This only requires a little thought;
after all, how can you see the back of the object from only one view??
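
To be concrete, the height-field-to-mesh step looks roughly like this
(a Python sketch only; the actual data formats, grid spacing, and the
texture-mapped rendering are much more involved, and the array sizes
here are invented):

import numpy as np

def heightfield_to_mesh(h):
    """Turn an (rows, cols) height field into vertices, triangles,
    and per-vertex texture coordinates for the planet pictures."""
    rows, cols = h.shape
    verts, uvs = [], []
    for r in range(rows):
        for c in range(cols):
            verts.append((c, r, h[r, c]))                 # x, y, elevation
            uvs.append((c / (cols - 1), r / (rows - 1)))  # where to sample the photo
    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c
            tris.append((i, i + 1, i + cols))             # split each grid cell
            tris.append((i + 1, i + cols + 1, i + cols))  # into two triangles
    return verts, tris, uvs

v, t, uv = heightfield_to_mesh(np.random.rand(4, 4))
print(len(v), len(t))   # 16 vertices, 18 triangles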

- Mark Green
  University of Alberta  mark@cs.ualberta.ca

cdshaw@cs.UAlberta.CA (Chris Shaw) (04/14/91)

In article <1991Apr13.180518.1243@milton.u.washington.edu> lilj writes:
>Chris Shaw writes:
>
>>While video may be a useful adjunct to a virtual reality system,
>>it lacks the fundamental property of arbitrary real-time view
>>position and orientation control.
>> . . . the simulation component doesn't exist because the view cannot 
>>be arbitrarily controlled . . .
>
>The NASA Mars fly-by film and several other projects demonstrate that you 
>can derive an accurate voxel set from a single stereo pair. 

Yes, and CAT scans indicate that you can do the same type of thing for
medicine. The task of generating a 3D model from sensor-based input is
called tomography. However, tomography isn't easy. I asked someone who
has studied tomography and got an estimate of at least 10 billion
pictures needed to digitize a 20 foot by 20 foot room, given 5-degree
intervals on view direction and one-inch resolution on position; and
that was without making a model.
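
For what it's worth, the estimate is easy to reproduce. Assuming the
camera position ranges over a full 3D grid of the room at one-inch
spacing (my assumption; I don't know exactly what my source assumed),
the count comes out in the tens of billions:

inches = 20 * 12                      # 240 positions per axis
positions = inches ** 3               # camera positions filling the room
directions = (360 // 5) * (180 // 5)  # 5-degree steps in azimuth and elevation
print(positions * directions)         # 35,831,808,000 -- tens of billions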

In any case, the stereo pair statement is a bit dubious. I seem to
remember reading a report on Mars sensing that said that the stereo
pair data was inaccurate, and that other sensor methods were being used
to sense altitude. The second thing to bear in mind is that there are
almost no hidden surfaces in the planetary sensing application, since
the surface is being sensed from above.

Thirdly, the process of turning video input into a movie is one in
which a polygonal mesh is created from the sensor data, and pictures 
of the surface are texture-mapped onto the mesh. "Mars: The Movie"
was made this way. Certainly, video is useful in this application,
but only at the front end. In any case, the bottom line is that
it ain't real time yet.

By the way, a slight clarification of the Aspen Movie Maps project:
There were only 4 views at each point: North, South, East & West (the
town grid, in any case).

>Given sufficient computer speed and power, we should ultimately be able to do 
>this (i.e., be able to extrapolate alternative viewpoints from a single 
>video stereopair) in realtime. 

I don't think so. The Mars movie was created from a subset of a database
of the ENTIRE PLANET. It's not like a space probe flew by one
afternoon and took a few pictures. That movie was culled from TERAbytes
of data. The flight simulator people have to do the same thing: grab
real data off a disk in time for the pilot to fly over it. But it's
not just raw video images they are grabbing; it's "video" that has
been processed in advance to extract altitude.
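
The kind of bookkeeping involved looks roughly like this (a Python
sketch only; the tile size, tile naming, and the stand-in for disk I/O
are all invented):

def tiles_needed(cam_x, cam_y, tile_size=1000.0, radius=1):
    """Which pre-processed terrain tiles must be resident to draw
    the area around the camera."""
    tx, ty = int(cam_x // tile_size), int(cam_y // tile_size)
    return [(tx + dx, ty + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)]

resident = {}                       # tile id -> loaded mesh + texture
for tile in tiles_needed(12345.0, 6789.0):
    if tile not in resident:
        resident[tile] = f"load tile {tile} from disk"  # stand-in for disk I/O
print(sorted(resident))             # the 3x3 block of tiles around the camera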

>lilj@well
-- 
Chris Shaw     University of Alberta
cdshaw@cs.UAlberta.ca           Now with new, minty Internet flavour!
CatchPhrase: Bogus as HELL !

lilj@uunet.UU.NET (Joshua Neil Rubin) (04/20/91)

Let's put aside for a minute the problem of hidden surfaces.  I 
readily concede that a single stereopair has insufficient information 
to allow you to synthesize alternate perspectives of such surfaces.

Taking solely the information from a single stereopair of an object 
with no hidden surfaces, you can synthesize *any* new view of the 
object from *any* perspective you might wish, using solely technology 
that is 100 years old.

Take the surface of Mars as an example:

Assume that 100 years ago you had a stereopair looking straight down 
onto a bumpy, craggy, mountainous part of Mars.  The only really 
unusual thing about this stretch of terrain is that every bit of 
surface was in direct line of sight with each of the two cameras 
taking the stereopair.  (This eliminates the hidden surface problem)

Using only these two photos, by using some basic principles of 
stereoscopic arithmetic which have been known since at least the time 
of Wheatstone in the 1800's (before even the invention of photography, 
actually), an accurate ruler, a calculator, and some clay, you could 
easily (albeit tediously) create a perfectly accurate three-
dimensional model of that terrain.  And you could look at it from any 
angle you chose.
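
The stereoscopic arithmetic itself is just similar triangles.  For two 
parallel cameras a baseline B apart, each with focal length f, a point 
whose images fall a distance d apart (the disparity) lies at depth 
Z = f * B / d.  A Python sketch, with invented numbers:

def depth_from_disparity(f_mm, baseline_m, disparity_mm):
    """Depth of a point from its measured disparity (parallel cameras)."""
    return f_mm * baseline_m / disparity_mm

# A crater rim photographed with f = 50 mm, cameras 10 m apart,
# whose two images measure 2.0 mm apart on the plates:
print(depth_from_disparity(50.0, 10.0, 2.0), "m")   # 250.0 m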

As I see it, the problem in quickly synthesizing a new computer-
generated virtual perspective of a scene from a single stereopair 
isn't that you need skillabytes of data.  The problem is that you need 
sophisticated object recognition programs to recognize what 
stereographers call the "homologous points" in the two images which 
make up the stereopair.  These are, as the name implies, the two 
points, one per image in a stereopair, which represent the same 
location in actual space.  (You derive depth information from a 
stereopair by comparing the distance between two points on one image 
of the stereopair with the distance between the two homologous points 
in the other image.)

Humans can pick out homologous points easily enough.  In fact we do it 
automatically whenever we use depth perception.  Computers currently 
have a harder time than we humans do parsing scenes into objects and 
recognizing analogies between imperfectly-matched patterns.  Once the 
homologous points have been identified, however, it is a simple matter 
to do the arithmetic required to reconstruct the relative depths of 
the various points in the scene.
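
For the record, the crudest thing a computer can do to find a 
homologous point is slide a small patch of one image along the 
corresponding row of the other and keep the least-different position.  
A Python sketch (real matchers are far more careful than this):

import numpy as np

def match_point(left, right, y, x, win=3, max_disp=16):
    """Disparity of the point at (y, x) by sum-of-squared-differences."""
    patch = left[y - win:y + win + 1, x - win:x + win + 1]
    best, best_d = None, 0
    for d in range(max_disp + 1):
        if x - d - win < 0:
            break
        cand = right[y - win:y + win + 1, x - d - win:x - d + win + 1]
        cost = np.sum((patch - cand) ** 2)
        if best is None or cost < best:
            best, best_d = cost, d
    return best_d        # homologous point is at (y, x - best_d)

left = np.random.rand(64, 64)
right = np.roll(left, -4, axis=1)          # fake a uniform 4-pixel disparity
print(match_point(left, right, 32, 40))    # 4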

I'll grant you that we're talking about immense amounts of computing 
speed and power and memory to do all of that object recognition so 
fast.

gourdol@imag.imag.fr (Gourdol Arnaud) (04/22/91)

In article <1991Apr20.165305.6080@milton.u.washington.edu> decwrl!well.sf.ca.us!well!lilj@uunet.UU.NET (Joshua Neil Rubin) writes:
>As I see it, the problem in quickly synthesizing a new computer-
>generated virtual perspective of a scene from a single stereopair 
>isn't that you need skillabytes of data.  The problem is that you need 
>sophisticated object recognition programs to recognize what 
>stereographers call the "homologous points" in the two images which 
>make up the stereopair.  These are, as the name implies, the two 
>points, one per image in a stereopair, which represent the same 
>location in actual space.  (You derive depth information from a 
>stereopair by comparing the distance between two points on one image 
>of the stereopair with the distance between the two homologous points 
>in the other image.)
>
>Humans can pick out homologous points easily enough.  In fact we do it 
>automatically whenever we use depth perception.  Computers currently 
>have a harder time than we humans do parsing scenes into objects and 
>recognizing analogies between imperfectly-matched patterns.  Once the 
>homologous points have been identified, however, it is a simple matter 
>to do the arithmetic required to reconstruct the relative depths of 
>the various points in the scene.
>
>I'll grant you that we're talking about immense amounts of computing 
>speed and power and memory to do all of that object recognition so 
>fast.

Well, it can be done.
A friend of mine is working on an interesting project. The idea
is to use two cameras to film the hands of a user. The two images
are then analyzed, a 3D model of the hands is built, and the
information about the position and posture of the hands is
transmitted as a set of angles and coordinates.
That's a nice replacement for datagloves! No physical devices
needed, no cables, you can pick up the phone while you're working,
and so on.
Well, anyway, this is supposed to work in real time with a Sun-4
and a specialized chip.
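
The geometric core of such a system is triangulation: the same hand
feature is located in both images, and the two viewing rays are
intersected. A Python sketch (the camera placement and the fingertip
are invented; I don't know the details of my friend's implementation):

import numpy as np

def triangulate(o1, d1, o2, d2):
    """Midpoint of closest approach of rays o1 + t*d1 and o2 + s*d2."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w = o1 - o2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    t = (b * e - c * d) / (a * c - b * b)
    s = (a * e - b * d) / (a * c - b * b)
    return ((o1 + t * d1) + (o2 + s * d2)) / 2

# Two cameras 30 cm apart, both sighting a fingertip at (0.1, 0.2, 1.0):
p = np.array([0.1, 0.2, 1.0])
o1, o2 = np.array([0.0, 0.0, 0.0]), np.array([0.3, 0.0, 0.0])
print(triangulate(o1, p - o1, o2, p - o2))   # ~[0.1, 0.2, 1.0]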

Of course, this is a special case. The contrast must be good,
there are few moving objects, and they are well known.
A more ambitious project deals with an autonomous robot.

If you want more information, you can contact Patrice de Marconnay
(he works on the hand recognition project) :
marconna@lifia.imag.fr

For more information on 3D reconstruction, contact James Crowley:
crowley@lifia.imag.fr

Arno.

-- 
   /======================//==========================================/
  / Arnaud Gourdol.      // On the Netland:         Gourdol@imag.fr  /
 /                      // Via AppleLink: Gourdol@imag.fr@INTERNET# /
/======================//==========================================/

kost@iias.spb.su (Popov Konstantin E.) (05/06/91)

Hi all!

I would like to clarify the problem that Joshua Neil Rubin spoke about:

        [something erased]
>        Taking solely the information from a single stereopair of an object
>        with no hidden surfaces, you can synthesize *any* new view of the
>        object from *any* perspective you might wish.  Solely with technology
>        that is 100 years old.
         .... and
>        The only really
>        unusual thing about this stretch of terrain is that every bit of
>        surface was in direct line of sight with each of the two cameras
>        taking the stereopair.
         .... and now
>        As I see it, the problem in quickly synthesizing a new computer-
>        generated virtual perspective of a scene from a single stereopair
>        isn't that you need skillabytes of data.
        [something erased]

... and after that, I think that this is not purely a problem for the
computer graphics disciplines. I suspect that not only *excellent*
graphics performance is necessary, but also AI tools. When we get such
a single stereopair, we must decide, at minimum:

(a)     How much can we change the point of view?
(b)     What distortions in the synthesized perspectives are
        acceptable? And, on the whole, how can we measure this
        "suitability"? (I suspect it depends upon *many* factors.)
(c)     How can we get another piece of data for composing
        perspectives if we want to make them outside our pre-defined
        ranges?

If we want such a system to work reliably, we must provide some
subsystems that will solve these problems.

Can somebody say something about it? I hope so.

(I must offer my sincere apologies for my English to anybody who has
read this.)
       -- kostja                | ... I sing i don't know about.
  e-mail: kost@gsp.iias.spb.su  |      (Boris Grebenshikov)
 ------------------------------ | ------------------------------


[MODERATOR'S NOTE:  A welcome to our first Russian participant!  Good
to have you among us (from a grandchild of Russian emigres). -- Bob
Jacobson]

lilj@uunet.UU.NET (Joshua Neil Rubin) (05/13/91)

kost@iias.spb.su (Konstantin E. Popov) writes:

>graphics performance is necessary, but also AI tools. When we get such
>a single stereopair, we must decide, at minimum:
>
>(a)     How much can we change the point of view?
>(b)     What distortions in the synthesized perspectives are
>        acceptable? And, on the whole, how can we measure this
>        "suitability"? (I suspect it depends upon *many* factors.)
>(c)     How can we get another piece of data for composing
>        perspectives if we want to make them outside our pre-defined
>        ranges?

I believe that the answers are as follows:

(a)      We can theoretically change the point of view to any perspective 
with no distortion as long as no surface was obstructed from either of the 
cameras which photographed the original stereopair (that is, there were no 
hidden surfaces).  If there are hidden surfaces, we cannot change the 
point of view at all without introducing distortions.  Surfaces that are 
hidden from the original perspective create gaps in our information about 
the scene.  For instance, if we were to take a single stereopair of 
somebody's head from directly in front, we would have information only 
about a mask-shaped surface, that is, the portion of the face exposed to 
both cameras.  We would lack information not only about the back of the 
head, but indeed about an entire wedge of space projecting backward behind 
the head.  If we wished to synthesize a virtual perspective from the 
right-hand side of the head, we would have enough information to construct 
a view of the mask-shaped surface of the face.  However, projecting 
backward from this mask (or to the left, if we are viewing the head from 
the right-hand side) would be a wedge in space about which we lack 
information.  In our synthesized view, we could represent this space as 
semi-translucent or as a wire-frame, denoting that we do not know whether 
it, or any particular part of it, contains an object.
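
The bookkeeping this implies is simple to sketch in Python (with 
invented numbers, and one viewing ray standing in for the whole image): 
space in front of the measured surface is known to be empty, the 
surface itself is known, and everything behind it falls in the unknown 
wedge, which is what the renderer would draw translucent:

import numpy as np

def classify(depth_px, n=10, zmax=10.0):
    """Label voxels along one ray: 0=known empty, 1=surface, 2=unknown."""
    k = min(int(depth_px / zmax * n), n - 1)
    ray = np.full(n, 2)      # default: unknown (hidden behind the surface)
    ray[:k] = 0              # we saw through this space, so it is empty
    ray[k] = 1               # the surface the stereopair measured
    return ray

print(classify(4.0))   # [0 0 0 0 1 2 2 2 2 2]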

(b)      Suitability would probably depend on the nature of the application 
and the nature of the data.  You can't *gain* information about an actual 
scene by synthesizing a new perspective of the scene from the information 
in a single stereopair.  However, the synthesized perspective might help 
you to better *understand* the information that you have.

(c)      Once you have already parsed the scene into objects in order to be 
able to synthesize a new perspective, it would probably be *relatively* 
easy to interpolate certain surfaces.  For instance, if our object-
recognition program thought it detected a sphere, it would be a simple 
matter to add the hidden surface of that sphere.  Or, of course, you could 
get information from a second photograph taken from another perspective.
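
The sphere case really is simple, because fitting a sphere to the 
visible front points is a linear least-squares problem, and once you 
have the center and radius the hidden back half comes for free.  A 
Python sketch with synthetic data (the points and sphere are invented):

import numpy as np

def fit_sphere(pts):
    """Solve |p|^2 = 2 c.p + (r^2 - |c|^2) for center c and radius r."""
    A = np.hstack([2 * pts, np.ones((len(pts), 1))])
    b = np.sum(pts ** 2, axis=1)
    (cx, cy, cz, k), *_ = np.linalg.lstsq(A, b, rcond=None)
    c = np.array([cx, cy, cz])
    return c, np.sqrt(k + c @ c)

# Visible front half of a radius-2 sphere centered at (1, 1, 1):
ang = np.random.rand(200, 2) * [np.pi, np.pi]          # front half only
pts = np.array([1, 1, 1]) + 2 * np.column_stack([
    np.sin(ang[:, 0]) * np.cos(ang[:, 1]),
    np.sin(ang[:, 0]) * np.sin(ang[:, 1]),
    np.cos(ang[:, 0])])
print(fit_sphere(pts))   # center ~(1, 1, 1), radius ~2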