lilj@uunet.UU.NET (Joshua Neil Rubin) (04/13/91)
Chris Shaw writes:

>While video may be a useful adjunct to a virtual reality system,
>it lacks the fundamental property of arbitrary real-time view
>position and orientation control.
     * * *
>. . . the simulation component doesn't exist because the view cannot
>be arbitrarily controlled . . .
     * * *
>Chris Shaw     University of Alberta
>cdshaw@cs.UAlberta.ca      Now with new, minty Internet flavour!
>CatchPhrase: Bogus as HELL !

The NASA Mars fly-by film and several other projects demonstrate that
you can derive an accurate voxel set from a single stereo pair. Given
sufficient computer speed and power, we should ultimately be able to do
this (i.e., extrapolate alternative viewpoints from a single video
stereopair) in real time. Of course, the hidden surfaces must be
interpolated. But hey.

lilj@well
mark@cs.UAlberta.CA (Mark Green) (04/14/91)
In article <1991Apr13.180518.1243@milton.u.washington.edu>,
decwrl!well.sf.ca.us!well!lilj@uunet.UU.NET (Joshua Neil Rubin) writes:

>The NASA Mars fly-by film and several other projects demonstrate that
>you can derive an accurate voxel set from a single stereo pair. Given
>sufficient computer speed and power, we should ultimately be able to do
>this (i.e., extrapolate alternative viewpoints from a single video
>stereopair) in real time. Of course, the hidden surfaces must be
>interpolated. But hey.

What??? The NASA Mars fly-by film and similar works are computer-generated
graphics produced from a collection of different data sources (i.e., there
are no stereo pairs involved). The basic procedure is to determine a height
field for the planet. This can be done by various forms of range finding or
from multiple views of the planet. Lew Hitchner of NASA Ames has mentioned
this type of data for Mars in this newsgroup, and we have used the same
data for an interactive fly-by of Mars.

Once the height field is available, a polygonal mesh can be constructed
that represents the planet's surface. Now you have a basic geometrical
model of the planet. To get the surface detail, pictures of the planet (at
the appropriate locations) are texture mapped onto the polygons when they
are displayed. Unless you are using a high-end Silicon Graphics machine,
this is not a real-time operation. Note that this is a purely geometrical
model; there is no video, and no stereo pairs, voxels, or anything like
them are involved.

From a single stereo pair there is no way of deducing the geometry of an
arbitrary object so that it can be displayed from any viewpoint. This only
requires a little thought: after all, how can you see the back of an
object from only one view?

- Mark Green
  University of Alberta
  mark@cs.ualberta.ca
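The height-field-to-mesh step Mark describes can be made concrete with a
small sketch. This is a minimal illustration in Python, assuming a regular
grid of elevations; the function name and grid layout are invented for
illustration, not taken from any actual NASA or SGI pipeline:

    # Minimal sketch: turn a regular grid of elevations into a
    # triangle mesh, as in the fly-by procedure described above.

    def height_field_to_mesh(heights, cell_size=1.0):
        """Turn a 2D grid of elevations into vertices and triangles.

        heights[i][j] is the elevation at grid position (i, j).
        Returns (vertices, triangles), where each triangle is a triple
        of vertex indices. Texture coordinates would simply be the
        normalized (i, j) grid positions, so a photo of the surface
        can be mapped straight onto the triangles.
        """
        rows, cols = len(heights), len(heights[0])
        vertices = [(j * cell_size, i * cell_size, heights[i][j])
                    for i in range(rows) for j in range(cols)]
        triangles = []
        for i in range(rows - 1):
            for j in range(cols - 1):
                a = i * cols + j      # upper-left corner of the cell
                b = a + 1             # upper-right
                c = a + cols          # lower-left
                d = c + 1             # lower-right
                triangles.append((a, c, b))  # split each grid cell
                triangles.append((b, c, d))  # into two triangles
        return vertices, triangles

    # Example: a 3x3 height field gives 9 vertices and 8 triangles.
    verts, tris = height_field_to_mesh([[0, 1, 0], [1, 2, 1], [0, 1, 0]])
    print(len(verts), len(tris))   # 9 8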
cdshaw@cs.UAlberta.CA (Chris Shaw) (04/14/91)
In article <1991Apr13.180518.1243@milton.u.washington.edu> lilj writes:

>Chris Shaw writes:
>
>>While video may be a useful adjunct to a virtual reality system,
>>it lacks the fundamental property of arbitrary real-time view
>>position and orientation control.
>>. . . the simulation component doesn't exist because the view cannot
>>be arbitrarily controlled . . .
>
>The NASA Mars fly-by film and several other projects demonstrate that
>you can derive an accurate voxel set from a single stereo pair.

Yes, and CAT scans indicate that you can do the same type of thing for
medicine. The task of generating a 3D model from sensor-based input is
called tomography. However, tomography isn't easy. I asked someone who has
studied tomography, and got an estimate of at least 10 billion pictures
needed to digitize a 20-foot by 20-foot room, given 5-degree intervals on
view direction and one-inch resolution on position, and this was without
making a model.

In any case, the stereo pair statement is a bit dubious. I seem to
remember reading a report on Mars sensing that said that the stereo pair
data was inaccurate, and that other sensing methods were being used to
measure altitude. The second thing to bear in mind is that there are
almost no hidden surfaces in the planetary sensing application, since the
surface is being sensed from above. Thirdly, the process of turning video
input into a movie is one in which a polygonal mesh is created from the
sensor data, and pictures of the surface are texture-mapped onto the mesh.
"Mars: The Movie" was made this way. Certainly, video is useful in this
application, but only at the front end. In any case, the bottom line is
that it ain't real time yet.

By the way, a slight clarification of the Aspen Movie Map project: there
were only 4 views at each point: North, South, East & West (following the
town grid, in any case).

>Given sufficient computer speed and power, we should ultimately be able
>to do this (i.e., extrapolate alternative viewpoints from a single video
>stereopair) in real time.

I don't think so. The Mars movie was created from a subset of a database
of the ENTIRE PLANET. It's not as if a space probe flew by one afternoon
and took a few pictures. That movie was culled from TERAbytes of data. The
flight simulator people have to do the same thing: grab real data off a
disk in time for the pilot to fly over it. But it's not just raw video
images they are grabbing; it's "video" that has been processed in advance
to extract altitude.

>lilj@well

--
Chris Shaw     University of Alberta
cdshaw@cs.UAlberta.ca      Now with new, minty Internet flavour!
CatchPhrase: Bogus as HELL !
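For what it's worth, the 10-billion figure is roughly what a brute-force
count gives. Here is a back-of-the-envelope check in Python, assuming a
20 ft x 20 ft x 8 ft room sampled at one-inch camera positions and 5-degree
steps in both azimuth and elevation; the ceiling height and the exact
angular sampling are assumptions, since the estimate Chris quotes doesn't
spell them out:

    # Brute-force count of distinct pictures for the room-digitizing
    # estimate. The 8 ft ceiling and the azimuth/elevation sampling
    # are assumptions made for illustration.

    positions = (20 * 12) * (20 * 12) * (8 * 12)  # 1-inch grid: 240*240*96
    directions = (360 // 5) * (180 // 5)          # 72 azimuths * 36 elevations
    pictures = positions * directions

    print(f"{positions:,} positions x {directions:,} directions "
          f"= {pictures:,} pictures")
    # 5,529,600 positions x 2,592 directions = 14,332,723,200 pictures,
    # i.e. on the order of 10^10, consistent with "at least 10 billion".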
lilj@uunet.UU.NET (Joshua Neil Rubin) (04/20/91)
Let's put aside for a minute the problem of hidden surfaces. I readily
concede that a single stereopair has insufficient information to allow you
to synthesize alternate perspectives of such surfaces.

Taking solely the information from a single stereopair of an object with
no hidden surfaces, you can synthesize *any* new view of the object from
*any* perspective you might wish. Solely with technology that is 100 years
old.

Take the surface of Mars as an example. Assume that 100 years ago you had
a stereopair looking straight down onto a bumpy, craggy, mountainous part
of Mars. The only really unusual thing about this stretch of terrain is
that every bit of surface was in direct line of sight of each of the two
cameras taking the stereopair. (This eliminates the hidden surface
problem.) Using only these two photos, some basic principles of
stereoscopic arithmetic known since at least the time of Wheatstone in the
1800s (before even the invention of photography, actually), an accurate
ruler, a calculator, and some clay, you could easily (albeit tediously)
create a perfectly accurate three-dimensional model of that terrain. And
you could look at it from any angle you chose.

As I see it, the problem in quickly synthesizing a new computer-generated
virtual perspective of a scene from a single stereopair isn't that you
need skillabytes of data. The problem is that you need sophisticated
object-recognition programs to recognize what stereographers call the
"homologous points" in the two images which make up the stereopair. These
are, as the name implies, the two points, one per image in a stereopair,
which represent the same location in actual space. (You derive depth
information from a stereopair by comparing the distance between two points
on one image of the stereopair with the distance between the two
homologous points in the other image.)

Humans can pick out homologous points easily enough. In fact, we do it
automatically whenever we use depth perception. Computers currently have a
harder time than we humans do parsing scenes into objects and recognizing
analogies between imperfectly matched patterns. Once the homologous points
have been identified, however, it is a simple matter to do the arithmetic
required to reconstruct the relative depths of the various points in the
scene.

I'll grant you that we're talking about immense amounts of computing
speed, power, and memory to do all of that object recognition so fast.
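The "simple arithmetic" step Joshua refers to is the classical
depth-from-disparity relation. A minimal sketch in Python, assuming an
idealized rectified stereopair (two parallel cameras with known focal
length and baseline); the numbers in the example are invented:

    # Depth from a pair of homologous points in a rectified stereopair:
    # depth = focal_length * baseline / disparity.

    def depth_from_disparity(x_left, x_right, focal_length, baseline):
        """Depth of a point from its homologous image coordinates.

        x_left and x_right are the horizontal positions (same units as
        focal_length, e.g. millimetres on the film plane) of the same
        real-world point in the left and right images. The nearer the
        point, the larger the disparity between the two.
        """
        disparity = x_left - x_right
        if disparity <= 0:
            raise ValueError("point at or beyond infinity; no depth")
        return focal_length * baseline / disparity

    # A point imaged at 2.0 mm in the left frame and 1.5 mm in the
    # right, with a 50 mm lens and cameras 100 mm apart, lies 10 m away:
    print(depth_from_disparity(2.0, 1.5, focal_length=50.0,
                               baseline=100.0))   # -> 10000.0 (mm)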
gourdol@imag.imag.fr (Gourdol Arnaud) (04/22/91)
In article <1991Apr20.165305.6080@milton.u.washington.edu>
decwrl!well.sf.ca.us!well!lilj@uunet.UU.NET (Joshua Neil Rubin) writes:

>As I see it, the problem in quickly synthesizing a new computer-generated
>virtual perspective of a scene from a single stereopair isn't that you
>need skillabytes of data. The problem is that you need sophisticated
>object-recognition programs to recognize what stereographers call the
>"homologous points" in the two images which make up the stereopair.
[...]
>I'll grant you that we're talking about immense amounts of computing
>speed, power, and memory to do all of that object recognition so fast.

Well, it can be done. A friend of mine is working on an interesting
project. The idea is to use two cameras to film the hands of a user. The
two images are then analyzed, a 3D model of the hands is built, and the
information about the position and posture of the hands is transmitted as
a set of angles and coordinates. That's a nice replacement for datagloves!
No physical devices needed, no cables; you can pick up the phone while
you're working, and so on. Anyway, this is supposed to work in real time
with a Sun-4 and a specialized chip. Of course, this is a special case:
the contrast must be good, there are few moving objects, and they are well
known. A more ambitious project deals with an autonomous robot.

If you want more information, you can contact Patrice de Marconnay (he
works on the hand recognition project): marconna@lifia.imag.fr

For more information on 3D reconstruction, contact James Crowley:
crowley@lifia.imag.fr

Arno.
--
/======================//===========================================/
/ Arnaud Gourdol.      // On the Netland: Gourdol@imag.fr           /
/                      // Via AppleLink:  Gourdol@imag.fr@INTERNET# /
/======================//===========================================/
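As a rough illustration of what such a system might transmit in place of
dataglove readings, here is a guess at the record format in Python. The
field names, units, and joint count are purely invented; the LIFIA
project's actual format is not described in the post:

    # Hypothetical hand-posture record: a wrist position, a hand
    # orientation, and one flex angle per finger joint. All fields
    # are assumptions made for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class HandPosture:
        x: float                  # wrist position (metres)
        y: float
        z: float
        roll: float               # hand orientation (degrees)
        pitch: float
        yaw: float
        # 5 fingers x 3 joints, flex angles in degrees
        joint_angles: list = field(default_factory=lambda: [0.0] * 15)

    # One such update per video frame, e.g. 25 times a second:
    msg = HandPosture(0.1, 0.0, 0.4, 0.0, 90.0, 0.0)
    msg.joint_angles[0] = 45.0    # bend the thumb's first joint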
kost@iias.spb.su (Popov Konstantin E.) (05/06/91)
Hi all!

I would like to clarify the problem Joshua Neil Rubin spoke about:

[something erased]
> Taking solely the information from a single stereopair of an object
> with no hidden surfaces, you can synthesize *any* new view of the
> object from *any* perspective you might wish. Solely with technology
> that is 100 years old.
.... and
> The only really unusual thing about this stretch of terrain is that
> every bit of surface was in direct line of sight of each of the two
> cameras taking the stereopair.
.... and now
> As I see it, the problem in quickly synthesizing a new computer-
> generated virtual perspective of a scene from a single stereopair
> isn't that you need skillabytes of data.
[something erased]

... and after that, I think this is not purely a problem for the computer
graphics disciplines. I suspect that not only *excellent* graphics
performance is necessary, but also AI tools. When we get such a single
stereopair, we must decide, at minimum:

(a) How much can we change the point of view?
(b) What distortions in the synthesized perspectives are suitable?
    And, on the whole, how can we measure this "suitability"?
    (I suspect it depends on ****many factors.)
(c) How can we get additional data for composing perspectives if we
    want to go outside our pre-defined ranges?

If we want the system to work reliably, we must provide subsystems that
will solve such problems. Can somebody say something about this? I hope
so.

(I must apologize for my English to anybody who has read this.)

U
--
kostja                        | ... I sing I don't know about.
e-mail: kost@gsp.iias.spb.su  |        (Boris Grebenshikov)
------------------------------|------------------------------

[MODERATOR'S NOTE: A welcome to our first Russian participant! Good to
have you among us (from a grandchild of Russian emigres). -- Bob Jacobson]
lilj@uunet.UU.NET (Joshua Neil Rubin) (05/13/91)
kost@iias.spb.su (Konstantin E. Popov) writes:

>I suspect that not only *excellent* graphics performance is necessary,
>but also AI tools. When we get such a single stereopair, we must
>decide, at minimum:
>
>(a) How much can we change the point of view?
>(b) What distortions in the synthesized perspectives are suitable?
>    And, on the whole, how can we measure this "suitability"?
>    (I suspect it depends on ****many factors.)
>(c) How can we get additional data for composing perspectives if we
>    want to go outside our pre-defined ranges?

I believe that the answers are as follows:

(a) We can theoretically change the point of view to any perspective with
no distortion, as long as no surface was obstructed from either of the
cameras that photographed the original stereopair (that is, there were no
hidden surfaces). If there were hidden surfaces, we cannot change the
point of view at all without introducing distortions. Surfaces that are
hidden from the original perspective create gaps in our information about
the scene.

For instance, if we were to take a single stereopair of somebody's head
from directly in front, we would only have information about a mask-shaped
surface: the portion of the face exposed to both cameras. We would lack
information not only about the back of the head, but indeed about an
entire wedge of space projecting backward behind the head. If we wished to
synthesize a virtual perspective from the right-hand side of the head, we
would have enough information to construct a view of the mask-shaped
surface of the face. However, projecting backward from this mask (or to
the left, if we are viewing the head from the right-hand side) would be a
wedge of space about which we lack information. In our synthesized view,
we could represent this space as semi-translucent or as a wireframe,
denoting that we do not know whether it, or any particular part of it,
contains an object.

(b) Suitability would probably depend on the nature of the application and
the nature of the data. You can't *gain* information about an actual scene
by synthesizing a new perspective of it from the information in a single
stereopair. However, the synthesized perspective might help you better
*understand* the information that you already have.

(c) Once you have parsed the scene into objects in order to synthesize a
new perspective, it would probably be *relatively* easy to interpolate
certain surfaces. For instance, if our object-recognition program thought
it detected a sphere, it would be a simple matter to add the hidden
surface of that sphere. Or, of course, you could get information from
another photograph taken from another perspective.
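Joshua's semi-translucent "wedge of unknown space" in (a) can be
illustrated with a toy voxel classifier in Python. It assumes the
stereopair looked straight down the z axis and yielded one depth value per
(x, y) ray; the grid size and the sample depth map are invented:

    # Label every voxel along each viewing ray of a front-facing
    # stereopair as seen-through, surface, or unknown (the "wedge").

    EMPTY, SURFACE, UNKNOWN = 0, 1, 2

    def classify_voxels(depth_map, z_levels):
        """Classify voxels given one measured depth per (x, y) ray.

        Voxels nearer than the measured depth were seen through
        (EMPTY), the voxel at the measured depth is the mask-shaped
        SURFACE, and everything behind it is the wedge we know
        nothing about (UNKNOWN). A renderer could draw UNKNOWN voxels
        semi-translucent or as wireframe, exactly as suggested above.
        """
        grid = {}
        for (x, y), depth in depth_map.items():
            for z in range(z_levels):
                if z < depth:
                    grid[(x, y, z)] = EMPTY
                elif z == depth:
                    grid[(x, y, z)] = SURFACE
                else:
                    grid[(x, y, z)] = UNKNOWN
        return grid

    # A 3x1 "face" whose middle ray hits nearer geometry (the nose):
    grid = classify_voxels({(0, 0): 2, (1, 0): 1, (2, 0): 2}, z_levels=4)
    print(sum(1 for v in grid.values() if v == UNKNOWN))  # 4 unknown voxels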