[comp.ai.neural-nets] MIT Industrial Liaison Program -- Networks and Learning

bwk@mbunix.mitre.org (Kort) (11/18/89)

MIT Industrial Liaison Program -- Networks and Learning

On Wednesday and Thursday (November 15-16) I attended the MIT 
Industrial Liaison Program symposium entitled "Networks and 
Learning".  
Here is my report...

Professor Tomaso Poggio of the MIT Department of Brain and 
Cognitive Sciences opened the symposium by reviewing the 
history of advances in the field.  About every 20 years there 
is an "epidemic" of activity lasting about 12 years, followed 
by about 8 years of inactivity.  Sixty years ago there began 
the Gestalt school in Europe.  Forty years ago Cybernetics 
emerged in the US.  Twenty years ago Perceptrons generated a 
flurry of research.  Today, Neural Networks represent the 
latest breakthrough in this series.  [Neural Networks are 
highly interconnected structures of relatively simple units, 
with algebraic connection weights.]

Professor Leon Cooper, Co-Director of the Center for Neural 
Science at Brown University, spoke on "Neural Networks in 
Real-World Applications."  Neural Nets learn from examples.  
Give them lots of examples of Input/Output pairs, and they 
build a smooth mapping from the input space to the output 
space.  Neural Nets work best when the rules are vague or 
unknown.  The classical 3-stage neural net makes a good 
classifier.  It can divide up the input space into 
arbitrarily shaped regions.  At first the network just 
divides the space in halves and quarters, using straight line 
boundaries ("hyperplanes" for the mathematically minded).  
Eventually (and with considerable training) the network can 
form arbitrarily curved boundaries to achieve arbitrarily 
general classification.  Given enough of the critical 
features upon which to reach a decision, networks have been 
able to recognize and categorize diseased hearts from 
heartbeat patterns.  With a sufficiently rich supply of 
clues, the accuracy of such classifiers can approach 100%.  
Accuracy depends on the sample length of the heartbeat 
pattern--a hurried decision is an error-prone decision.
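
[Aside:  for the curious, here is a tiny Python sketch--my 
own illustration, not Professor Cooper's--of the classical 
3-stage idea:  one hidden layer of units, each contributing 
a hyperplane, trained by gradient descent until the combined 
boundary curves around a toy "inside the circle" region.  
The data, network size, and learning rate are all made up.]

import numpy as np

rng = np.random.default_rng(0)

# Toy training set:  label 1 inside a circle of radius 1, else 0.
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = (np.sum(X**2, axis=1) < 1.0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights for the hidden and output stages (8 hidden units).
W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

lr = 2.0
for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)        # hidden-layer activations
    p = sigmoid(h @ W2 + b2)        # predicted class probability
    # Gradient of the squared error, propagated backward.
    dp = (p - y) * p * (1.0 - p)
    dW2 = h.T @ dp / len(X);  db2 = dp.mean(0)
    dh = (dp @ W2.T) * h * (1.0 - h)
    dW1 = X.T @ dh / len(X);  db1 = dh.mean(0)
    W1 -= lr * dW1;  b1 -= lr * db1
    W2 -= lr * dW2;  b2 -= lr * db2

print("training accuracy:", np.mean((p > 0.5) == (y > 0.5)))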

Professor Ron Rivest, Associate Director of MIT's Laboratory 
for Computer Science, surveyed "The Theoretical Aspects of 
Learning and Networks."  He addresses the question, "How do 
we discover good methods of solution for the problems we wish 
to solve?"  In studying Neural Networks, he notes their 
strengths and characteristics:  learning from example, 
expressiveness, computational complexity, sample space 
complexity, learning a mapping.  The fundamental unit of a 
neural network is a linear adder followed by a threshold 
trigger.  If the weighted sum of the input signals exceeds 
the threshold, the output signal fires.  Neural nets need not be 
constrained to boolean signals (zero/one), but can handle 
continuous analog signal levels.  And the threshold trigger 
can be relaxed to an S-shaped response.  Rivest tells us that 
any continuous function mapping the interval [-1, 1] 
into itself can be approximated arbitrarily well with a 3-
stage neural network.  (The theorem extends to the Cartesian 
product:  the mapping can be from an m-fold unit hypercube 
into an n-fold unit hypercube.)  Training the neural net 
amounts to finding the coefficients which minimize the error 
between the examples and the neural network's approximation.  
The so-called Error Backpropagation algorithm is 
mathematically equivalent to least squares curve fitting 
using steepest descent.  While this method works, it can be 
very slow.  In fact, training a 3-stage neural network is an 
NP-complete problem--the work is believed to grow 
exponentially with the size of the network.  The classical 
solution to this dilemma is to decompose the problem into 
smaller 
subproblems, each solvable by a smaller system.  Open issues 
in neural network technology include the incorporation of 
prior domain knowledge, and the inapplicability of powerful 
learning methods such as Socratic-style guided discovery and 
experimentation.  There is a need to merge the statistical 
paradigm of neural networks with the more traditional 
knowledge representation techniques of analytical and 
symbolic approaches.
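
[Aside:  a minimal Python sketch, my own and not Rivest's, 
of the fundamental unit he described--a linear adder feeding 
a threshold trigger--along with its relaxed S-shaped 
(sigmoid) version and a single steepest-descent step on the 
squared error, which is the essence of error backpropagation 
for one unit.  The numbers are purely illustrative.]

import math

def threshold_unit(weights, bias, inputs):
    # Fire (output 1) if the weighted sum exceeds the threshold.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 if s > 0.0 else 0.0

def sigmoid_unit(weights, bias, inputs):
    # The relaxed, S-shaped version of the threshold trigger.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def gradient_step(weights, bias, inputs, target, lr=0.1):
    # One steepest-descent step on the squared error for one unit.
    y = sigmoid_unit(weights, bias, inputs)
    delta = (y - target) * y * (1.0 - y)   # d(error)/d(weighted sum)
    new_w = [w - lr * delta * x for w, x in zip(weights, inputs)]
    return new_w, bias - lr * delta

# Nudge a unit toward outputting 1 for one particular input.
w, b = [0.1, -0.3], 0.0
for _ in range(100):
    w, b = gradient_step(w, b, [0.5, -0.2], target=1.0)
print(sigmoid_unit(w, b, [0.5, -0.2]))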

Professor Terry Sejnowski, Director of the Computational 
Neurobiology Laboratory at the Salk Institute for Biological 
Studies, gave a captivating lecture on "Learning Algorithms 
in the Brain."  Terry, who studies biological neural 
networks, has witnessed the successful "reverse engineering" 
of several complete systems.  The Vestibulo-Ocular Reflex 
is the feedforward circuit from the semicircular canals of 
the inner ear to the eye muscles that allows us to fixate on 
a target even as we move and bob our heads.  If you shake 
your head as you read this sentence, your eyes can remain 
fixed on the text.  This very old circuit has been around for 
hundreds of millions of years, going back to our reptilian 
ancestors.  It is found in the brain stem, and operates with 
only a 7-ms delay.  (Tracking a moving target is more 
complex, requiring a feedback circuit that taps into the 
higher cognitive centers.)  The Vestibulo-Ocular Reflex 
appears to be overdesigned, generating opposing signals which 
at first appear to serve no function.  Only last week, a 
veteran researcher finally explained how the dynamic tension 
between opposing signals allows the long-term adaptation to 
growth of the body and other factors (such as new eyeglasses) 
which could otherwise defeat the performance of the reflex.  
Terry also described the operation of one of the simplest 
neurons, found in the hippocampus, which mediates long-term 
memory.  The Hebb Synapse is one that undergoes a 
physiological change when the neuron happens to fire during 
simultaneous occurrence of stimuli representing the 
input/output pair of a training sample.  After the 
physiological change, the neuron becomes permanently 
sensitized to the input stimulus.  The Hebb Synapse would 
seem to be the foundation for superstitious learning.
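
[Aside:  a toy Python illustration, mine rather than Terry's, 
of the Hebb rule:  a connection weight grows whenever 
presynaptic activity and postsynaptic firing coincide, 
leaving the unit more sensitive to that stimulus thereafter.  
The numbers are arbitrary.]

def hebbian_update(weights, pre, post, rate=0.1):
    # Strengthen each weight in proportion to pre * post activity.
    return [w + rate * x * post for w, x in zip(weights, pre)]

weights = [0.0, 0.0, 0.0]
stimulus = [1.0, 0.0, 1.0]          # a recurring input pattern
for _ in range(10):                 # repeated pairings
    firing = 1.0                    # the neuron happens to fire each time
    weights = hebbian_update(weights, stimulus, firing)
print(weights)                      # the active connections have grown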

After a refreshing lunch of cold roast beef and warm 
conversation, Professor Tomaso Poggio returned to the podium 
to speak on "Networks for Learning:  A Vision Application."  
He began by reviewing the theoretical result that equates the 
operation of a 2-layer neural network to linear regression.  
To achieve polynomial regression, one needs a 3-layer neural 
network.  Such a neural net can reconstruct a (smooth) 
hypersurface from sparse data.  (An example of a non-smooth 
map would be a telephone directory which maps names into 
numbers.  No smooth interpolation will enable you to estimate 
the telephone number of someone whose name is not in the 
directory.)  Professor Poggio explored the deep connection 
between classical curve fitting and 3-stage neural networks.  
The architecture of the neural net corresponds to the so-
called HyperBasis Functions which are fitted to the training 
data.  A particularly simple but convenient basis function is 
a Gaussian centered around each sample x-value.  The 
interpolated y-value is then just the average of all the 
sample y-values weighted by their Gaussian multipliers.  In 
other words, the nearest neighbors to x are averaged to 
estimate the output, y(x).  For smooth maps, such a scheme 
works well.
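
[Aside:  a small Python sketch of the Gaussian-basis scheme 
described above--my own paraphrase, with an assumed width 
sigma and made-up data.  The estimate at x is simply the 
sample y-values averaged with Gaussian weights, so the 
nearest neighbors dominate.]

import math

def gaussian_interpolate(x, samples, sigma=0.5):
    # samples is a list of (x_i, y_i) pairs; the estimate is the
    # average of the y_i weighted by a Gaussian in (x - x_i).
    weights = [math.exp(-(x - xi) ** 2 / (2.0 * sigma ** 2))
               for xi, _ in samples]
    return sum(w * yi for w, (_, yi) in zip(weights, samples)) / sum(weights)

# Sparse samples of a smooth map (here y = x**2), estimated between them.
data = [(-2.0, 4.0), (-1.0, 1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
print(gaussian_interpolate(0.5, data))   # falls between the neighbors' y-values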

Dr. Richard Lippmann of the MIT Lincoln Laboratory spoke on 
"Neural Network Pattern Classifiers for Speech Recognition."  
Historically, classification has progressed through four 
stages--Probabilistic Classifiers using linear discriminant 
functions, Hyperplane Separation using piecewise linear 
boundaries, Receptive Field Classification using radial basis 
functions, and the new Exemplar Method using multilayer 
Perceptrons and feature maps.  Surveying alternate 
architectures and algorithms for speech recognition, Dr. 
Lippmann reviewed the diversity of techniques, comparing 
their accuracy, speed, and the computational resources 
required.  From the best to the 
worst, they can differ by orders of magnitude in cost and 
performance.

Professor Michael Jordan of MIT's Department of Brain and 
Cognitive Science spoke on "Adaptive Networks for Motor 
Control and Robotics."  There has been much progress in this 
field over the last five years, but neural nets do not 
represent a revolutionary breakthrough.  The "Inverse 
Problem" in control theory is classical:  find the control 
sequence which will drive the system from the current state 
to the goal state.  It is well known from Cybernetics that 
the controller must compute (directly or recursively) an 
inverse model of the forward system.  This is equivalent to 
the problem of diagnosing cause from effect.  The classical 
solution is to build a model of the forward system and let 
the controller learn the inverse through unsupervised 
learning (playing with the model).  The learning proceeds 
incrementally, corresponding to backpropagation or gradient 
descent based on the transposed Jacobian (first derivative).  
This is essentially how humans learn to fly and drive using 
simulators.
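
[Aside:  a simplified Python sketch--my own toy, not 
Professor Jordan's model--of learning a control by playing 
with a forward model:  adjust the control by gradient steps 
through the transposed Jacobian until the predicted state 
reaches the goal.  The forward map here is arbitrary.]

import numpy as np

def forward(u):
    # Toy forward model:  a 2-D control maps to a 2-D state.
    return np.array([u[0] + 0.5 * u[1], np.sin(u[0]) + u[1]])

def jacobian(u):
    # First derivative (Jacobian matrix) of the forward model.
    return np.array([[1.0,           0.5],
                     [np.cos(u[0]),  1.0]])

goal = np.array([1.0, 0.5])
u = np.zeros(2)                        # initial control guess
for _ in range(200):
    error = forward(u) - goal          # mismatch with the goal state
    u -= 0.1 * jacobian(u).T @ error   # step along the transposed Jacobian
print(u, forward(u))                   # the final state sits near the goal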

Danny Hillis, Founding Scientist of Thinking Machines 
Corporation, captured the audience with a spellbinding talk 
on "Intelligence as an Emergent Phenomenon."  Danny began 
with a survey of computational problems well-suited to 
massively parallel architectures--matrix algebra and parallel 
search.  He uses the biological metaphor of evolution as his 
model for massively parallel computation and search.  Since 
the evolution of intelligence is not studied as much as the 
engineering approach (divide and conquer) or the biological 
approach (reverse engineer nature's best ideas), Danny chose 
to apply his Connection Machine to the exploration of 
evolutionary processes.  He invented a mathematical organism 
(called a "ramp") which seeks to evolve and perfect itself.  
A population cloud of these ramps inhabits his Connection 
Machine, mutating, evolving, and competing for survival of 
the fittest.  Danny's color videos show the evolution of the 
species under different circumstances.  He found that the 
steady state did not generally lead to a 100 percent 
population of perfect ramps.  Rather, two or more immiscible 
populations of suboptimal ramps formed pockets with seething 
boundaries.  He then introduced a species of parasites which 
attacked ramps at their weakest points, so that stable 
populations would eventually succumb to a destructive 
epidemic.  The parasites did not clear the way for the 
emergence of perfect and immune ramps.  Rather, the 
populations cycled through a roiling rise and fall of 
suboptimal ramps, still sequestered into camps of Gog and 
Magog.  The eerie resemblance to modern geopolitics and 
classical mythology was palpable and profound.

Professor John Wyatt of the MIT Department of Electrical 
Engineering and Computer Science closed the first day's 
program with a talk on "Analog VLSI Hardware for Early 
Vision:  Parallel Distributed Computation without Learning."  
Professor Wyatt's students are building analog devices that 
can be stimulated by focusing a scene image onto the surface 
of a chip.  His devices for image processing use low 
precision (about 8 bits) analog processing based on the 
inherent bulk properties of silicon.  His goal is to produce 
chips costing $4.95.  One such chip can find the fixed point 
when the scene is zoomed.  (Say you are approaching the back 
of a slow moving truck.  As the back of the truck looms 
larger in your field of view, the fixed point in the scene 
corresponds to the point of impact if you fail to slow down.)  
Identification of the coordinates of the fixed point and the 
estimated time to impact are the output of this chip.  
Charge-coupled devices and other technologies are being 
transformed into image-processing devices for stereo depth 
estimation, image smoothing and segmentation, and motion 
vision.

The second day of the symposium focused on the Japanese, 
European, and American perspectives for the development and 
application of neural nets.

Professor Shun-ichi Amari of the Department of Mathematical 
Engineering and Information Physics at the University of 
Tokyo explored the mathematical theory of neural nets.  
Whereas conventional computers operate on symbols using 
programmed sequential logic, neural nets correspond more to 
intuitive styles of information processing--pattern 
recognition, dynamic parallel processing, and learning.  
Professor Amari explored neural network operation in terms of 
mathematical mapping theory and fixed points.  Here, the 
fixed points represent the set of weights corresponding to 
the stable state after extensive training.

Dr. Wolfram Buttner of Siemens Corporate Research and 
Development discussed several initiatives in Europe to 
develop early commercial applications of neural net 
technology.  Workpiece recognition in the robotic factory and 
classification of stimuli into categories are recurring 
themes here.  There is also interest in unsupervised learning 
(playing with models or exploring complex environments), 
decision support systems (modeling, prediction, diagnosis, 
scenario analysis, optimal decision making with imperfect 
information) and computer languages for neural network 
architectures.  Dr. Buttner described NeuroPascal, an 
extension to Pascal for parallel neurocomputing 
architectures.

Dr. Scott Kirkpatrick, Manager of Workstation Design at IBM's 
Thomas J. Watson Research Center, explored numerous potential 
applications of neural nets as information processing 
elements.  They can be viewed as filters, transformers, 
classifiers, and predictors.  Commercial applications include 
routine processing of high-volume data streams such as 
credit-checking and programmed arbitrage trading.  They are 
also well-suited to adaptive equalization, echo cancellation, 
and other signal processing tasks.  SAIC is using them in its 
automated luggage inspection system to recognize the telltale 
signs of suspect contents of checked luggage.  Neurogammon 
1.0, which took two years to build, plays a mean game of 
backgammon, beating all other machines and giving world class 
humans a run for their money.  Hard problems for neural nets 
include 3D object recognition in complex scenes, natural 
language understanding, and "database mining" (theory 
construction).  Today's commercially viable applications of 
neural nets could only support about 200 people.  It will be 
many years before neurocomputing becomes a profitable 
industry.

Marvin Minsky, MIT's Donner Professor of Science, gave an 
entertaining talk on "Future Models".  The human brain has 
over 400 specialized architectures, and is equivalent in 
capacity to about 200 Connection Machines (Model CM-2).  
There are about 2000 data buses interconnecting the various 
departments of the brain.  As one moves up the hierarchy of 
information processing, one begins at Sensory-Motor and 
advances through Concrete Thinking, Operational Thinking, 
"Other Stages", and arrives at Formal Thinking as the highest 
cognitive stage.  A human subject matter expert who is a 
world class master in his field has about 20-50 thousand 
discrete "chunks" of knowledge.  Among the computational 
paradigms found in the brain, there are Space Frames (for 
visual information), Script Frames (for stories), Trans-
Frames (for mapping between frames), K-Lines (explanation 
elided), Semantic Networks (for vocabulary and ideas), Trees 
(for hierarchical and taxonomical knowledge), and Rule-Based 
Systems (for bureaucrats).  Minsky's theory is summarized in 
his latest book, Society of Mind.  Results with neural 
networks solving "interesting" problems such as playing 
backgammon or doing freshman calculus reveal that we don't 
always know which problems are hard.  It appears that a 
problem is hard until somebody shows an easy way to solve it.  
After that, it's deemed trivial.  As to intelligence, Minsky 
says that humans are good at what humans do.  He says, "A 
frog is very good at catching flies.  And you're not."

The afternoon panel discussion, led by Patrick Winston, 
provided the speakers and audience another chance to visit 
and revisit topics of interest.  That commercial neural 
networks are not solving profoundly deep and important 
problems was a source of dismay to some, who felt that we 
already have enough programmed trading and credit checking 
going on, and do not need more robots turning down our loans 
and sending the stock markets into instability.  

The deeper significance of the symposium is that research in 
neural networks is stimulating the field of brain and 
cognitive science and giving us new insights into who we are, 
how we came to be that way, and where we can go, if we use 
our higher cognitive functions to best advantage.

--Barry Kort