bwk@mbunix.mitre.org (Kort) (11/18/89)
MIT Industrial Liaison Program -- Networks and Learning
On Wednesday and Thursday (November 15-16) I attended the MIT
Industrial Liaison Program symposium entitled "Networks and
Learning".
Here is my report...
Professor Tomaso Poggio of the MIT Department of Brain and
Cognitive Sciences opened the symposium by reviewing the
history of advances in the field. About every 20 years there
is an "epidemic" of activity lasting about 12 years, followed
by about 8 years of inactivity. Sixty years ago there began
the Gestalt school in Europe. Forty years ago Cybernetics
emerged in the US. Twenty years ago Perceptrons generated a
flurry of research. Today, Neural Networks represent the
latest breakthrough in this series. [Neural Networks are
highly interconnected structures of relatively simple units,
with algebraic connection weights.]
Professor Leon Cooper, Co-Director of the Center for Neural
Science at Brown University, spoke on "Neural Networks in
Real-World Applications." Neural Nets learn from examples.
Give them lots of examples of Input/Output pairs, and they
build a smooth mapping from the input space to the output
space. Neural Nets work best when the rules are vague or
unknown. The classical 3-stage neural net makes a good
classifier. It can divide up the input space into
arbitrarily shaped regions. At first the network just
divides the space in halves and quarters, using straight line
boundaries ("hyperplanes" for the mathematically minded).
Eventually (and with considerable training) the network can
form arbitrarily curved boundaries to achieve arbitrarily
general classification. Given enough of the critical
features upon which to reach a decision, networks have been
able to recognize and categorize diseased hearts from
heartbeat patterns. With a sufficiently rich supply of
clues, the accuracy of such classifiers can approach 100%.
Accuracy depends on the sample length of the heartbeat
pattern--a hurried decision is an error-prone decision.
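To make the hyperplane picture concrete, here is a toy sketch
in Python (my own illustration, not Cooper's system) of a
single threshold unit splitting a two-dimensional input space
with a straight-line boundary; a hidden layer of such units is
what lets the full network carve out curved regions.

    # Hypothetical illustration: one threshold unit cuts the plane with
    # the straight-line ("hyperplane") boundary w1*x1 + w2*x2 + b = 0.
    def threshold_unit(x1, x2, w1=1.0, w2=-1.0, b=0.0):
        s = w1 * x1 + w2 * x2 + b        # linear adder
        return 1 if s > 0 else 0         # fires if the sum exceeds threshold

    # Points on either side of the line x1 = x2 fall into different classes.
    print(threshold_unit(2.0, 1.0))      # -> 1
    print(threshold_unit(1.0, 2.0))      # -> 0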
Professor Ron Rivest, Associate Director of MIT's Laboratory
for Computer Science, surveyed "The Theoretical Aspects of
Learning and Networks." He addresses the question, "How do
we discover good methods of solution for the problems we wish
to solve?" In studying Neural Networks, he notes their
strengths and characteristics: learning from example,
expressiveness, computational complexity, sample space
complexity, learning a mapping. The fundamental unit of a
neural network is a linear adder followed by a threshold
trigger. If the algebraic sum of the input signals exceeds the
threshold, the unit fires. Neural nets need not be
constrained to boolean signals (zero/one), but can handle
continuous analog signal levels. And the threshold trigger
can be relaxed to an S-shaped response. Rivest tells us that
any continuous function mapping the interval [-1, 1]
into itself can be approximated arbitrarily well with a 3-
stage neural network. (The theorem extends to the Cartesian
product: the mapping can be from an m-fold unit hypercube
into an n-fold unit hypercube.) Training the neural net
amounts to finding the coefficients which minimize the error
between the examples and the neural network's approximation.
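Here is a rough Python sketch of the architecture Rivest
describes: a linear adder relaxed to an S-shaped (sigmoid)
response, with a hidden layer of such units feeding one output
unit, mapping [-1, 1] into itself. The weights below are
random placeholders rather than trained values.

    import math, random

    def sigmoid(s):
        # S-shaped relaxation of the hard threshold, rescaled to (-1, 1)
        return 2.0 / (1.0 + math.exp(-s)) - 1.0

    def three_stage_net(x, hidden_w, hidden_b, out_w, out_b):
        # input -> hidden layer of sigmoid units -> single output unit
        hidden = [sigmoid(w * x + b) for w, b in zip(hidden_w, hidden_b)]
        return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)

    # With enough hidden units and suitable weights, such a net can
    # approximate any continuous map of [-1, 1] into itself.
    random.seed(0)
    hw = [random.uniform(-1, 1) for _ in range(5)]
    hb = [random.uniform(-1, 1) for _ in range(5)]
    ow = [random.uniform(-1, 1) for _ in range(5)]
    print(three_stage_net(0.3, hw, hb, ow, 0.0))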
The so-called Error Backpropagation algorithm is
mathematically equivalent to least squares curve fitting
using steepest descent. While this method works, it can be
very slow. In fact, training a 3-stage neural network is an
NP-complete problem--the work is believed to grow exponentially
with the size of the network. The classical solution to this
dilemma is to decompose the problem into smaller
subproblems, each solvable by a smaller system. Open issues
in neural network technology include the incorporation of
prior domain knowledge, and the inapplicability of powerful
learning methods such as Socratic-style guided discovery and
experimentation. There is a need to merge the statistical
paradigm of neural networks with the more traditional
knowledge representation techniques of analytical and
symbolic approaches.
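To illustrate the equivalence Rivest draws between
backpropagation and least squares curve fitting by steepest
descent, here is a minimal Python sketch that fits a single
sigmoid unit to a toy set of input/output pairs. The data,
learning rate, and number of steps are invented for
illustration; a full 3-stage net would apply the same chain
rule through its hidden layer.

    import math

    def sigmoid(s):
        return 1.0 / (1.0 + math.exp(-s))

    # Toy training set of input/output pairs (assumed, not from the talk).
    samples = [(0.0, 0.1), (0.5, 0.6), (1.0, 0.9)]

    w, b, rate = 0.0, 0.0, 0.5
    for step in range(2000):
        for x, target in samples:
            y = sigmoid(w * x + b)
            err = y - target               # residual of the curve fit
            grad = err * y * (1.0 - y)     # chain rule through the sigmoid
            w -= rate * grad * x           # steepest descent on squared error
            b -= rate * grad

    print(round(w, 2), round(b, 2),
          [round(sigmoid(w * x + b), 2) for x, _ in samples])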
Professor Terry Sejnowski, Director of the Computational
Neurobiology Laboratory at the Salk Institute for Biological
Studies, gave a captivating lecture on "Learning Algorithms
in the Brain." Terry, who studies biological neural
networks, has witnessed the successful "reverse engineering"
of several complete systems. The Vestibulo-Ocular Reflex
is the feedforward circuit from the semicircular canals of
the inner ear to the eye muscles that allows us to fixate on
a target even as we move and bob our heads. If you shake
your head as you read this sentence, your eyes can remain
fixed on the text. This very old circuit has been around for
hundreds of millions of years, going back to our reptilian
ancestors. It is found in the brain stem, and operates with
only a 7-ms delay. (Tracking a moving target is more
complex, requiring a feedback circuit that taps into the
higher cognitive centers.) The Vestibulo-Ocular Reflex
appears to be overdesigned, generating opposing signals which
at first appear to serve no function. Only last week, a
veteran researcher finally explained how the dynamic tension
between opposing signals allows the long-term adaptation to
growth of the body and other factors (such as new eyeglasses)
which could otherwise defeat the performance of the reflex.
Terry also described the operation of one of the simplest
neurons, found in the hippocampus, which mediates long-term
memory. The Hebb Synapse is one that undergoes a
physiological change when the neuron happens to fire during
simultaneous occurrence of stimuli representing the
input/output pair of a training sample. After the
physiological change, the neuron becomes permanently
sensitized to the input stimulus. The Hebb Synapse would
seem to be the foundation for superstitious learning.
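A cartoon of the Hebb rule in Python (my own gloss with
invented numbers, not Terry's model): the synaptic weight
grows only when input and output activity coincide.

    # Hebbian update: strengthen the synapse when pre- and post-synaptic
    # activity coincide.  The learning rate and activity values are assumed.
    def hebb_update(weight, pre, post, rate=0.1):
        return weight + rate * pre * post

    w = 0.0
    for pre, post in [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]:
        w = hebb_update(w, pre, post)
    print(round(w, 2))   # only the coincident (1, 1) events raised the weight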
After a refreshing lunch of cold roast beef and warm
conversation, Professor Tomaso Poggio returned to the podium
to speak on "Networks for Learning: A Vision Application."
He began by reviewing the theoretical result that equates the
operation of a 2-layer neural network to linear regression.
To achieve polynomial regression, one needs a 3-layer neural
network. Such a neural net can reconstruct a (smooth)
hypersurface from sparse data. (An example of a non-smooth
map would be a telephone directory which maps names into
numbers. No smooth interpolation will enable you to estimate
the telephone number of someone whose name is not in the
directory.) Professor Poggio explored the deep connection
between classical curve fitting and 3-stage neural networks.
The architecture of the neural net corresponds to the so-
called HyperBasis Functions which are fitted to the training
data. A particularly simple and convenient basis function is
a gaussian centered around each sample x-value. The
interpolated y-value is then just the average of all the
sample y-values weighted by their gaussian multipliers. In
other words, the nearest neighbors to x are averaged to
estimate the output, y(x). For smooth maps, such a scheme
works well.
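The gaussian-weighted averaging can be sketched in a few lines
of Python; the sample points and the width of the gaussian are
assumed here purely for illustration.

    import math

    def gaussian_interpolate(x, samples, width=0.5):
        # Weight each sample by a gaussian centered on its x-value, then
        # return the weighted average of the sample y-values.
        weights = [math.exp(-(x - xi) ** 2 / (2 * width ** 2))
                   for xi, _ in samples]
        return sum(w * yi for w, (_, yi) in zip(weights, samples)) / sum(weights)

    # Sparse samples of a smooth map (values invented for illustration).
    samples = [(-1.0, -0.8), (0.0, 0.1), (1.0, 0.9)]
    print(round(gaussian_interpolate(0.5, samples), 3))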
Dr. Richard Lippmann of the MIT Lincoln Laboratory spoke on
"Neural Network Pattern Classifiers for Speech Recognition."
Historically, classification has progressed through four
stages--Probabilistic Classifiers using linear discriminant
functions, Hyperplane Separation using piecewise linear
boundaries, Receptive Field Classification using radial basis
functions, and the new Exemplar Method using multilayer
Perceptrons and feature maps. Surveying alternate
architectures and algorithms for speech recognition, Dr.
Lippmann reviewed the diversity of techniques, comparing
accuracy, speed, and the computational resources required.
From the best to the
worst, they can differ by orders of magnitude in cost and
performance.
Professor Michael Jordan of MIT's Department of Brain and
Cognitive Sciences spoke on "Adaptive Networks for Motor
Control and Robotics." There has been much progress in this
field over the last five years, but neural nets do not
represent a revolutionary breakthrough. The "Inverse
Problem" in control theory is classical: find the control
sequence which will drive the system from the current state
to the goal state. It is well known from Cybernetics that
the controller must compute (directly or recursively) an
inverse model of the forward system. This is equivalent to
the problem of diagnosing cause from effect. The classical
solution is to build a model of the forward system and let
the controller learn the inverse through unsupervised
learning (playing with the model). The learning proceeds
incrementally, corresponding to backpropagation or gradient
descent based on the transposed Jacobian (first derivative).
This is essentially how humans learn to fly and drive using
simulators.
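As a toy illustration of the transposed-Jacobian idea (not
Jordan's networks), here is a Python sketch in which a
controller drives an assumed forward model, a two-joint planar
arm, toward a goal position by gradient descent through the
Jacobian transpose.

    import numpy as np

    # Assumed forward system: a two-joint planar arm with unit-length links,
    # mapping joint angles to the hand position.
    def forward(theta):
        return np.array([np.cos(theta[0]) + np.cos(theta[0] + theta[1]),
                         np.sin(theta[0]) + np.sin(theta[0] + theta[1])])

    def jacobian(theta):
        s1, s12 = np.sin(theta[0]), np.sin(theta[0] + theta[1])
        c1, c12 = np.cos(theta[0]), np.cos(theta[0] + theta[1])
        return np.array([[-s1 - s12, -s12],
                         [ c1 + c12,  c12]])

    goal = np.array([1.0, 1.0])
    theta = np.array([0.3, 0.3])
    for _ in range(200):
        error = forward(theta) - goal
        theta -= 0.1 * jacobian(theta).T @ error   # transposed-Jacobian descent

    print(theta, forward(theta))   # joint angles that bring the hand to the goal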
Danny Hillis, Founding Scientist of Thinking Machines
Corporation, captured the audience with a spellbinding talk
on "Intelligence as an Emergent Phenomenon." Danny began
with a survey of computational problems well-suited to
massively parallel architectures--matrix algebra and parallel
search. He uses the biological metaphor of evolution as his
model for massively parallel computation and search. Since
the evolution of intelligence is not studied as much as the
engineering approach (divide and conquer) or the biological
approach (reverse engineer nature's best ideas), Danny chose
to apply his Connection Machine to the exploration of
evolutionary processes. He invented a mathematical organism
(called a "ramp") which seeks to evolve and perfect itself.
A population cloud of these ramps inhabits his Connection
Machine, mutating, evolving, and competing for survival of
the fittest. Danny's color videos show the evolution of the
species under different circumstances. He found that the
steady state did not generally lead to a 100 percent
population of perfect ramps. Rather, two or more immiscible
populations of suboptimal ramps formed pockets with seething
boundaries. He then introduced a species of parasites which
attacked ramps at their weakest points, so that stable
populations would eventually succumb to a destructive
epidemic. The parasites did not clear the way for the
emergence of perfect and immune ramps. Rather, the
populations cycled through a roiling rise and fall of
suboptimal ramps, still sequestered into camps of Gog and
Magog. The eerie resemblance to modern geopolitics and
classical mythology was palpable and profound.
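Hillis's simulations are far richer than anything reproduced
here, but the flavor of mutation and survival of the fittest
can be suggested with a generic toy in Python; the bit-string
organisms and all the parameters are invented, not his ramps.

    import random

    # Toy evolution: bit strings drift toward all-ones under mutation
    # and selection of the fitter half of the population.
    random.seed(1)

    def fitness(genome):
        return sum(genome)

    population = [[random.randint(0, 1) for _ in range(16)] for _ in range(50)]
    for generation in range(100):
        population.sort(key=fitness, reverse=True)
        survivors = population[:25]                   # fitter half survives
        children = [[bit ^ (random.random() < 0.02)   # occasional mutation
                     for bit in random.choice(survivors)]
                    for _ in range(25)]
        population = survivors + children

    print(max(fitness(g) for g in population))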
Professor John Wyatt of the MIT Department of Electrical
Engineering and Computer Science closed the first day's
program with a talk on "Analog VLSI Hardware for Early
Vision: Parallel Distributed Computation without Learning."
Professor Wyatt's students are building analog devices that
can be stimulated by focusing a scene image onto the surface
of a chip. His image-processing devices use low-precision
(about 8 bits) analog computation based on the inherent bulk
properties of silicon. His goal is to produce
chips costing $4.95. One such chip can find the fixed point
when the scene is zoomed. (Say you are approaching the back
of a slow moving truck. As the back of the truck looms
larger in your field of view, the fixed point in the scene
corresponds to the point of impact if you fail to slow down.)
The coordinates of the fixed point and the estimated time to
impact are the outputs of this chip.
Charge-coupled devices and other technologies are being
adapted to such image-processing tasks as stereo depth
estimation, image smoothing and segmentation, and motion
vision.
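The time-to-impact estimate rests on the standard looming
relation: time to contact is roughly the image size divided by
its rate of growth. Here is a toy Python sketch with invented
numbers, not the chip's actual circuitry.

    # Looming: as the truck's image grows, estimate time to contact as
    # tau ~ s / (ds/dt), using the size s and its rate of growth.
    def time_to_impact(size_now, size_prev, dt):
        growth_rate = (size_now - size_prev) / dt
        return size_now / growth_rate

    # The truck's image grows from 100 to 105 pixels in 0.1 s (assumed).
    print(time_to_impact(105.0, 100.0, 0.1))   # about 2.1 seconds to impact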
The second day of the symposium focused on the Japanese,
European, and American perspectives for the development and
application of neural nets.
Professor Shun-ichi Amari of the Department of Mathematical
Engineering and Information Physics at the University of
Tokyo explored the mathematical theory of neural nets.
Whereas conventional computers operate on symbols using
programmed sequential logic, neural nets correspond more to
intuitive styles of information processing--pattern
recognition, dynamic parallel processing, and learning.
Professor Amari explored neural network operation in terms of
mathematical mapping theory and fixed points. Here, the
fixed points represent the set of weights corresponding to
the stable state after extensive training.
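The fixed-point view can be suggested with a toy Python
sketch: iterate a weight-update map until the weights stop
changing. The particular map below is an invented contraction,
not anything from Amari's talk.

    # Iterate w <- T(w) until w stops changing; the trained, stable state
    # is a fixed point of the update map.
    def T(w):
        return 0.5 * w + 1.0       # toy contraction with fixed point w* = 2.0

    w = 0.0
    while abs(T(w) - w) > 1e-9:
        w = T(w)
    print(w)                       # converges to (essentially) 2.0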
Dr. Wolfram Buttner of Siemens Corporate Research and
Development discussed several initiatives in Europe to
develop early commercial applications of neural net
technology. Workpiece recognition in the robotic factory and
classification of stimuli into categories are recurring
themes here. There is also interest in unsupervised learning
(playing with models or exploring complex environments),
decision support systems (modeling, prediction, diagnosis,
scenario analysis, optimal decision making with imperfect
information) and computer languages for neural network
architectures. Dr. Buttner described NeuroPascal, an
extension to Pascal for parallel neurocomputing
architectures.
Dr. Scott Kirkpatrick, Manager of Workstation Design at IBM's
Thomas J. Watson Research Center, explored numerous potential
applications of neural nets as information processing
elements. They can be viewed as filters, transformers,
classifiers, and predictors. Commercial applications include
routine processing of high-volume data streams such as
credit-checking and programmed arbitrage trading. They are
also well-suited to adaptive equalization, echo cancellation,
and other signal processing tasks. SAIC is using them in its
automated luggage inspection system to recognize the telltale
signs of suspect contents of checked luggage. Neurogammon
1.0, which took two years to build, plays a mean game of
backgammon, beating all other machines and giving world class
humans a run for their money. Hard problems for neural nets
include 3D object recognition in complex scenes, natural
language understanding, and "database mining" (theory
construction). Today's commercially viable applications of
neural nets could only support about 200 people. It will be
many years before neurocomputing becomes a profitable
industry.
Marvin Minsky, MIT's Donner Professor of Science, gave an
entertaining talk on "Future Models". The human brain has
over 400 specialized architectures, and is equivalent in
capacity to about 200 Connection Machines (Model CM-2).
There are about 2000 data buses interconnecting the various
departments of the brain. As one moves up the hierarchy of
information processing, one begins at Sensory-Motor and
advances through Concrete Thinking, Operational Thinking,
"Other Stages", and arrives at Formal Thinking as the highest
cognitive stage. A human subject matter expert who is a
world class master in his field has about 20-50 thousand
discrete "chunks" of knowledge. Among the computational
paradigms found in the brain, there are Space Frames (for
visual information), Script Frames (for stories), Trans-
Frames (for mapping between frames), K-Lines (explanation
elided), Semantic Networks (for vocabulary and ideas), Trees
(for hierarchical and taxonomical knowledge), and Rule-Based
Systems (for bureaucrats). Minsky's theory is summarized in
his latest book, The Society of Mind. Results with neural
networks solving "interesting" problems such as playing
backgammon or doing freshman calculus reveal that we don't
always know which problems are hard. It appears that a
problem is hard until somebody shows an easy way to solve it.
After that, it's deemed trivial. As to intelligence, Minsky
says that humans are good at what humans do. He says, "A
frog is very good at catching flies. And you're not."
The afternoon panel discussion, led by Patrick Winston,
provided the speakers and audience another chance to visit
and revisit topics of interest. That commercial neural
networks are not solving profoundly deep and important
problems was a source of dismay to some, who felt that we
already had enough programmed trading and credit checking
going on, and did not need more robots turning down our loans
and sending the stock markets into instability.
The deeper significance of the symposium is that research in
neural networks is stimulating the field of brain and
cognitive science and giving us new insights into who we are,
how we came to be that way, and where we can go, if we use
our higher cognitive functions to best advantage.
--Barry Kort