pausch@capa.cs.Virginia.EDU (Randy Pausch) (05/16/91)
For those of you who aren't members of ACM SIGCHI, here's the text of
the paper.   Contacting me for "additional info" will not be productive:
most everything I know is in the paper.
As appeared in: Proceedings of the ACM SIGCHI Human Factors in Computer
Systems Conference, April, 1991, New Orleans
                   Virtual Reality on Five Dollars a Day
                                Randy Pausch
                        Computer Science Department
                           University of Virginia
                               Thornton Hall
                         Charlottesville, VA 22903
                            Pausch@Virginia.edu
ABSTRACT
Virtual reality systems using head-mounted displays and glove input are
gaining popularity but their cost prohibits widespread use. We have
developed a system using an 80386 IBM-PCTM, a Polhemus 3Space IsotrakTM, two
Reflection Technology Private EyeTM displays, and a Mattel Power GloveTM.
For less than $5,000, we have created an effective vehicle for developing
interaction techniques in virtual reality. Our system displays monochrome
wire frames of objects with a spatial resolution of 720 by 280, the highest
resolution head-mounted system published to date. We have confirmed findings
by other researchers that low-latency interaction is significantly more
important than high-quality graphics or stereoscopy. We have also found it
useful to display reference objects to our user, specifically a ground plane
for reference and a vehicle containing the user.
KEYWORDS: Virtual reality, head-mounted display, glove input, computer
graphics, teleoperation, speech recognition, hand gesturing,
three-dimensional interaction.
INTRODUCTION
Virtual reality systems are currently gaining popularity but the cost of the
underlying hardware has limited research in the field. With any new
technology, there is an early period where informal observations are made
and large breakthroughs are possible. We believe that the best way to speed
up this process with head-mounted display/glove input systems is to provide
low cost versions of the technology so larger numbers of researchers may use
it. We have developed a complete virtual reality system for less than
$5,000, or less than five dollars per day if amortized over a three-year
period. We built the system because we had an immediate need and also to
show that virtual reality research can be done without expensive hardware.
Our immediate interest in virtual reality interaction comes from the Tailor
project[18], whose goal is to allow severely disabled children to control
devices via gesture input. The Tailor system adjusts to each child's
possible range of motion and converts motion in that range into analog
control signals that drive software applications. To specify motion
mappings, therapists with no technical background must specify one
dimensional curves and two dimensional surfaces in three dimensional space.
Using our low cost system, we will allow therapists to interactively
manipulate a wire frame mesh by using the glove to grasp control points on
the mesh.
Our system provides 720 by 280 spatial resolution and weighs 6 ounces,
making it higher resolution and lower weight than head-mounted displays
previously reported in the literature. In this paper, we present several
design observations made after experience with our system. Our first
observation is that increasing spatial resolution does not greatly improve
the quality of the system. We typically decrease our resolution to increase
our rendering speed. We also observe that stereoscopy is not critical, and
that reference objects such as a ground plane and a virtual vehicle are
extremely helpful to the user.
SYSTEM DESCRIPTION
The main processor for our system is a 2.5 MIP, 20 Mhz 386-based IBM-PCTM
compatible with 640K of RAM, a 80387 floating point co-processor, and
MS-DOSTM. Our head-mounted display uses a combination of two Private Eye
displays manufactured by Reflection Technology, Inc. [1]. Figure 1 shows a
Private Eye, a 1.2 by 1.3 by 3.5 inch device weighing 2.5 ounces. The 1 inch
square monochrome display surface has a resolution of 720 horizontal by 280
vertical red pixels against a black background. Optics between the user's
eye and the display surface make the image appear to be one to three feet
wide, "floating" several feet away.
The Private Eye is implemented with a vertical column of 280 red LEDs,
manufactured as a unit to pack them as densely as possible. To fill the
entire visual display area, the LEDs are switched on and off rapidly as a
vibrating mirror rotates through the 720 different vertical columns of the
display, as shown in Figure 2. The Private Eye can "shadow" a standard CGA
display with resolution of either 640 by 200 or 320 by 200 pixels, or it can
be accessed a library which supports a spatial resolution of 720 by 280
resolution. The library allows the painting of text and bitmaps, but does
not support graphics primitives such as lines; therefore, we use the device
by shadowing a CGA display.
Reflection Technologies is marketing the Private Eye primarily as a
"hands-busy" display; Figure 3 shows how the company expects most users to
wear the device. The user can look down into the display without obstructing
normal vision. Figure 4 shows how we mount two Private Eyes underneath a
baseball cap. We have also used sunglasses with leather sides to shield the
user from peripheral distractions. Our head-mounted display can either be
stereoscopic or bi-ocular (each eye receives the same picture).
We use a Polhemus 3Space Isotrak[20] to track the position and orientation
of the user's head. The Isotrak senses changes in a magnetic field and
reports three spatial (x, y, z) and three angular (yaw, pitch, roll)
coordinates 60 times each second. Our system uses the Mattel Power Glove as
an input device for position and gesture information. The glove is
manufactured by Mattel, Inc., under licence from Abrams-Gentile
Entertainment, Inc. (AGE). The Power Glove is provided to retail stores at a
wholesale cost of 62 dollars and is sold at a retail cost ranging between 70
and 100 dollars. Although Mattel does not release unit sales figures, they
report that in 1989 the Power Glove generated over 40 million dollars in
revenue, implying that over half a million gloves were sold that year.
Early glove research was conducted at VPL Research, Inc., the manufacturers
of the DataGloveTM[23,27]. The DataGlove uses fiber optics to determine
finger bend and a Polhemus tracker to determine hand position. Neither of
these technologies could be mass produced easily, so the Power Glove uses
variable resistance material for finger bend, and ultrasonics for hand
position.
The Power Glove is marketed as a peripheral for the Nintendo Entertainment
SystemTM. To thwart rival toy manufacturers, the data stream between the
Power Glove and the main Nintendo unit is encrypted. When the Power Glove
was originally introduced, it was rumored that dozens of research groups
across the country began working on decrypting this data stream, and that
several groups actually broke the code. An article appeared in Byte magazine
describing how to attach the glove as a serial device, but only allowed the
glove to emulate a joystick-type input device[6]. Rather than engaging in
cryptography, we phoned Chris Gentile at AGE and described our research
goals. He allowed us to sign a non-disclosure agreement and within days sent
us a decrypting device that allows us to use the glove as a serial device
communicating over an RS232 line. AGE and VPL Research have recently
announced the VPL/AGE Power Glove Education Support Program[26] and plan to
provide a low cost glove with 5 degrees of freedom for between 150 and 200
dollars.
The Power Glove uses two ultrasonic transmitters on the back of the user's
hand and three wall-mounted receivers configured in an L-shape. The glove
communicates successfully within ten to fifteen feet of the receivers when
it is oriented towards them. As the glove turns away from the receivers, the
signals degrades. Although some signal is received up to a 90 degree angle,
Mattel claims the glove is only usable at up to roughly 45 degrees. When the
glove is within five to six feet of the receivers, its (x, y, z) coordinate
information is accurate to within 0.25 inches [15]. In addition to position
information, the Power Glove provides roll information, where roll is the
angle made by pivoting the hand around the axis of the forearm. Roll is
reported in one of twelve possible positions.
Finger bend is determined from the varying resistance through materials
running the length of the finger. The user's thumb, index, middle, and ring
finger bend are each reported as a two-bit integer. This four-position
granularity is significantly less than the resolution provided by the VPL
DataGlove, but most of the gestures used in previously published virtual
reality systems can be supported with only two bits per finger [2,8,11,25].
The only hardware we plan to add to our system is for voice input. Several
small vocabulary, speaker-dependent input devices exist for the PC, all
costing several hundred dollars. Once this is added, many of the commands
currently given by hand gesture will be replaced by voice input.
All software for our system is locally developed in ANSI-standard C [12]. We
have a simple version of PHIGS [10] and are using a locally developed user
interface toolkit [17]. Our low-level graphics and input handling packages
have been widely ported, and allow our students to develop applications on
SunsTM, MacintoshesTM, or PCs before running them on the machine equipped
with the head-mounted display. We are currently developing a
three-dimensional glove-based object editor.
Although fast enough to be used, the limiting factor of our system's
performance is the speed of line scan conversion. We draw monochrome wire
frame objects, but are limited by the hardware's ability to draw lines. The
hardware can render 500 vectors per second (of random orientation and
length) but our CPU can execute the floating point viewing transformations
for 3,500 vectors per second. In practice, we tend to use scenes with
roughly 50 lines and we sustain a rate of 7 frames per second.
High-performance scan-conversion boards currently exist which would
substantially improve our rendering capabilities, and we expect their price
to drop substantially in the coming year.
The major limitation of our system's usability is the lag of the Polhemus
Isotrak. Other researchers using the Isotrak have also reported this
problem; no one has precisely documented its duration, but it is within 150
and 250 milliseconds[9]. Ascension Technology, Inc. recently announced the
BirdTM, a $5,000 competitor to the Polhemus Isotrak with a lag of only 24
milliseconds[21].
The existing system, when augmented with voice, will still cost less than
$5,000 in hardware ($750 for each eye, $3,000 for the head tracker, $80 for
the Power Glove, and ~$400 for the voice input). For less than the cost of a
high resolution color monitor, we have added the I/O devices to support a
complete virtual reality system.
RESEARCH OBSERVATIONS
Fred Brooks [5] has commented that:
"A major issue perplexes and bedevils the computer-human interface community
-- the tension between narrow truths proved convincingly by statistically
sound experiments, and broad `truths,' generally applicable, but supported
only by possibly unrepresentative observations."
Brooks distinguishes between findings, observations, and rules-of-thumb, and
states that we should provide results in all three categories, as
appropriate. Most research presented to date in virtual reality are either
what Brooks calls observations or rules-of-thumb, and we continue in this
vein, stating our experience:
The quality of the graphics is not as important as the interaction latency
If we had to choose between them, we would prefer to decrease our tracking
lag than increase our graphics capabilities. Although we have much greater
spatial resolution than other head-mounted displays, this does not seem to
significantly improve the quality of our system. Our experience confirms
what has been discovered at VPL Research and NASA AMES research center: if
the display is driven by user head motion, users can tolerate low display
resolution, but notice lag in the 200 millisecond range.
Stereoscopy is not essential
Users of bi-ocular and monocular (one eye covered with a patch) versions of
our system could maneuver and interact with objects in the environment.
Since a straightforward implementation of stereo viewing slows down graphics
by a factor of two or doubles the hardware cost, it is not always an
appropriate use of resources.
A ground plane is extremely useful
Non-head-mounted virtual worlds sometimes introduce a ground plane to
provide orientation [3,22]. In expensive head-mounted systems, the floor is
usually implicitly included as a shaded polygon. We found the need in our
system to include an artificial ground plane for reference, drawn as a
rectangular grid of either lines or dots.
Display the limits of the "vehicle" to the user
In virtual reality, a user's movement is always constrained by the physical
world. In most systems this manifests with the user straining an umbilical
cord. Even in systems with no umbilical and infinite range trackers, this
problem will still exist. Unless the user is in the middle of a large, open
space the real world will limit the user's motions. In the VIEW system [7,8]
a waist-level hexagon displays the range of the tracker, but is part of the
world scene and does not move as the user flies. We treat the user as always
residing in a "vehicle" [24]. The vehicle for a Polhemus is roughly a ten
foot hemisphere. If the user wishes to view an object within the range of
the vehicle, he may walk over to it, thereby changing his own location
within the vehicle. If, however, the user wishes to grab an object not
currently in the vehicle, he must first fly the vehicle until the desired
object is within the vehicle, as shown in Figure 5. Note that the user may
be simultaneously moving within the vehicle and changing the vehicle's
position in the virtual world, although in practice our users do not combine
these operations. For small vehicles it is probably appropriate to always
display their bounds but for larger vehicles it may be better to show their
bounds only when users are near the edges.
FUTURE WORK
Adding voice input will allow us to experiment with a model we have
developed to support object selection via simultaneous voice and gesture
input. We have already built a prototype of this selection model using a
display screen in combination with voice and gesture input and will attempt
to repeat those results using a head-mounted display[19].
We also will be addressing the registration problem, or the correct matching
of real and synthetic objects. Until force-feedback technology improves from
its current state[14,16], glove-based systems will have to use real-world
objects as tactile and force feedback to the user for some tasks. For
example, one could perform a virtual version of the popular magic trick
"cups and balls" by moving real cups on a real table, but having arbitrary
virtual objects appear under the cups. The graphics for the cups, which can
be grasped and moved, must closely correspond to the real world cups. By
attaching trackers to real world objects, we will study how closely the
visual image must match reality to avoid user dissatisfaction. A second
approach to this problem is to use the Private Eye as a heads up display,
wearing it over only one eye and allowing the user to correlate the real
world and synthetic graphics.
We are currently pursuing support to create a laboratory with between ten
and twenty low cost virtual reality stations. By providing reasonable access
to an entire graduate or undergraduate class, we suspect we may quickly
develop a large number of new interaction techniques. Jaron Lanier has
commented that in virtual reality, "creativity is the only thing of value"
[13]. A good way to spark creative breakthroughs is to increase the number
of people actively using the technology. We are also exploring the
possibility of creating a self-contained, portable system based on a laptop
machine.
CONCLUSIONS
The field of virtual reality research is in its infancy, and will benefit
greatly from putting the technology into as many researchers' hands as
possible. The virtual reality systems previously described in the literature
cost more than most researchers can afford. We have shown that for less than
$5,000, or five dollars per day over three years, researchers can use a
head-mounted display with glove and voice input. Our system has a higher
spatial resolution than any previous system, and is significantly lighter
than previous systems [4,7]. For glove input, the Power Glove has provided
excellent spatial accuracy and usable finger bend data. Based on experience
with our system, we have found that interaction latency is significantly
more important than display resolution or stereoscopy, and that the user can
greatly benefit from the display of reference objects, such as a ground
plane and a virtual vehicle.
ACKNOWLEDGMENTS
This work could not have proceeded without the help we received from Chris
Gentile of AGE. Novak of Mattel, Inc. also provided assistance with an early
draft of the paper. We would also like to thank Ronald Williams, Pramod
Dwivedi, Larry Ferber, Rich Gossweiler, and Chris Long at the University of
Virginia for their help.
REFERENCES
1. Becker, A.,Design Case Study: Private Eye, Information Display, March,
1990.
2. Blanchard, C., Burgess, S., Harvill, Y., Lanier, J, and Lasko, A.,
Reality Built for Two: A Virtual Reality Tool," ACM SIGGRAPH 1990 Symposium
on Interactive 3D Graphics, March, 1990.
3. Brett, C.,Pieper, S., and Zeltzer, D., Putting It All Together: An
Integrated Package for Viewing and Editing 3D Microworlds, Proceedings of
the 4th Usenix Computer Graphics Workshop, October, 1987.
4. Brooks, F., Walkthrough - A Dynamic Graphics System for Simulating
Virtual Buildings, Proceedings of the 1986 ACM Workshop on Interactive
Graphics, October, 1986, 9-21.
5. Brooks, F., Grasping Reality Through Illusion: Interactive Graphics
Serving Science, Proceedings of the ACM SIGCHI Human Factors in Computer
Systems Conference, Washington, D.C., May 17, 1988, 1-11.
6. Eglowstein, H.,Reach Out and Touch Your Data, Byte, July 1990, 283-290.
7. Fisher, S.,McGreevy, M.,Humphries, J., and Robinett, M., Virtual
Environment Display System, Proceedings of the 1986 ACM Workshop on
Interactive Graphics, October, 1986, 77-87.
8. Fisher, S., The AMES Virtual Environment Workstation (VIEW), SIGGRAPH `89
Course #29 Notes, August, 1989. (included a videotape).
9. Fisher, S., Personal Communication (electronic mail), Crystal River,
Inc., September 28, 1990.
10. Foley, J., van Dam, A., Feiner, S., and Hughes, J., Computer Graphics,
Principles and Practices, Addison-Wesley Publishing Co., 1990.
11.  Kaufman, A., Yagel, R. and Bakalash, R., Direct Interaction with a 3D
Volumetric Environment, ACM SIGGRAPH 1990 Symposium on Interactive 3D
Graphics, March, 1990.
12. Kelley, A. and Pohl, I., A Book on C, second Edition, Benjamin/Cummings
Publishing Company, Inc., 1990.
13. Lanier, J., Plenary Address on Virtual Reality, Proceedings of UIST: the
Annual ACM SIGGRAPH Symposium on User Interface Software and Technology,
November, 1989.
14. Ming, O., Pique, M., Hughes, J., and Brooks, F., Force Display Performs
Better than Visual Display in a Simple 6-D Docking Task, IEEE Robotics and
Automation Conference, May, 1989.
15. Novak, Personal Communication (telephone call), January 3, 1991.
16. Ouh-young, M., Pique, M., Hughes, J., Srinivasan, N., and Brooks, F.,
Using a Manipulator For Force Display in Molecular Docking, IEEE Robotics
and Automation Conference 3 (April, 1988), 1824-1829.
17. Pausch, R., A Tutorial for SUIT, the Simple User Interface Toolkit,
Technical Report Tech. Rep.-90-29, University of Virginia Computer Science
Department, September 1, 1990.
18. Pausch, R., and Williams, R., Tailor: Creating Custom User Interfaces
Based on Gesture, Proceedings of UIST: the Annual ACM SIGGRAPH Symposium on
User Interface Software and Technology, October, 1990, 123-134.
19. Pausch, R., and Gossweiler, R., "UserVerse: Application-Independent
Object Selection Using Inaccurate Multi-Modal Input," in Multimedia and
Multimodal User Interface Design, edited M. Blattner and R. Dannenberg,
Addison-Wesley, 1991 (to appear).
20. Rabb, F., Blood, E., Steiner, R., and. Jones, H., Magnetic Position and
Orientation Tracking System, IEEE Transaction on Aerospace and Electronic
Systems, 15, 5 (September, 1979), 709-718.
21. Scully, J., Personal Communication (letter), Ascension Technology, Inc.,
PO Box 527, Burlington, VT 05402 (802) 655-7879, June 27, 1990.
22. Sturman, D., Pieper, S., and Zeltzer, D., Hands-on Interaction With
Virtual Environments, Proceedings of UIST: the Annual ACM SIGGRAPH Symposium
on User Interface Software and Technology, November, 1989.
23. VPL-Research, DataGlove Model 2 Users Manual, Inc., 1987.
24. Ware, C., and Osborne, S., Exploration and Virtual Camera Control in
Virtual Three Dimensional Environments, ACM SIGGRAPH 1990 Symposium on
Interactive 3D Graphics, March, 1990.
25. Weimer, D., and Ganapathy, S., A Synthetic Visual Environment with Hand
Gesturing and Voice Input, Proceedings of the ACM SIGCHI Human Factors in
Computer Systems Conference, April, 1989, 235-240.
26. Zachary, G., and Gentile, C., Personal Communication (letter), VPL
Research, Inc., July 18, 1990. VPL/AGE Power Glove Support Program, VPL
Research, Inc., 656 Bair Island Road, Suite 304, Redwood City, CA 94063,
(415) 361-1710.
27. Zimmerman, T., Lanier, J., Blanchard, C., Bryson, S., and Harvill, Y., A
Hand Gesture Interface Device, Graphics Interface `87, May, 1987, 189-192.
--
--------------------------------------------------------------------
Randy Pausch (Pausch@Virginia.Edu) 804-982-2211  FAX: (804) 982-2214
Assistant Professor, Computer Science Department, Thornton Hall,
University of Virginia, Charlottesville, VA 22903-2442 
--------------------------------------------------------------------