[comp.ai.digest] Neural Networks & Unaligned Fields

Laws@KL.SRI.COM (Ken Laws) (09/03/87)

The current networks will generally fail to recognize shifted patterns.
All of the recognition networks I have seen (including the optical
implementations) correlate the image with a set of templates and then
use a winner-take-all subnetwork or a feedback enhancement to select
the best-matching template.  Vision researchers were doing this kind
of matching (for character recognition, with the character known to
be centered in the visual field) back in the 50s and early 60s.  Position
independence was then added by convolving the image and template, essentially
performing the match at every possible shift.  This was rather expensive,
so Fourier, Hough, and hierarchical matching techniques were introduced.
Then came edge detection, shape description, and many other paradigms.
We don't have all the answers yet, but we've come a long way from the
type of matching currently implemented in neural networks.
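[Ed.: the convolve-at-every-shift matching described above can be sketched in a few lines.  This is an illustrative toy in modern Python, not anyone's actual system; the 1-D "image" and template are made up.]

```python
# Brute-force template matching: correlate the template with the image
# at every possible shift, then pick the best-matching alignment.
# This is the position-independence-by-exhaustive-shifting idea.

def match_score(image, template, shift):
    """Correlation of the template with the image at one shift."""
    return sum(t * image[shift + i] for i, t in enumerate(template))

def best_shift(image, template):
    """Try every alignment; return (best shift, its score)."""
    n = len(image) - len(template)
    scores = [match_score(image, template, s) for s in range(n + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

image    = [0, 0, 1, 1, 0, 1, 0, 0]
template = [1, 1, 0, 1]
shift, score = best_shift(image, template)
print(shift, score)   # the template aligns best at position 2
```

The expense Ken mentions is visible here: the work grows with the number of shifts times the template size, which is what motivated the Fourier, Hough, and hierarchical shortcuts.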

The advantage of the networks, particularly those implemented in analog
hardware, is speed.  IF you have a problem for which alignment is known,
or IF you have time or hardware to try all possible alignments, or IF
your network is complex enough to store all templates at a sufficient
number of shifts, neural networks may be able to give you an off-the-shelf 
recognizer that bypasses the need to research all of the pattern recognition
literature of the last decade.

I suspect that the above conditions will actually hold in a fair number
of engineering situations.  Indeed, many of these applications have already
been identified by the signal processing community.  Neural networks offer
a trainable alternative to DSP or acoustic convolution chips.  Where rules
and explanations are appropriate, designers will use expert systems; otherwise
they will use neural networks and similar systems.  Only the most difficult
and important applications will require development of customized reasoning
systems such as numerical or object-oriented simulations.

					-- Ken
-------

mikek@boulder.UUCP (Mike Kranzdorf) (09/04/87)

	The second reference above is correct, but fails to mention work
by Fukushima and Mozer.  These multi-layer networks are able to form
an internal distributed representation of a pattern on an input retina.
They demonstrate very good shift and scale invariance.  The new and
improved neocognitron (Fukushima) can even recognize multiple patterns
on the retina.

--mike					mikek@boulder.colorado.edu

maiden@SDCSVAX.UCSD.EDU (VLSI Layout Project) (09/07/87)

In article <12331701930.42.LAWS@KL.SRI.Com> AIList-Request@SRI.COM writes:
>The current networks will generally fail to recognize shifted patterns.
>All of the recognition networks I have seen (including the optical
>implementations) correlate the image with a set of templates and then
>use a winner-take-all subnetwork or a feedback enhancement to select
>the best-matching template.  
[some lines deleted]
>					-- Ken
>-------

There are a number of networks that will recognize shifts in position.
Among them are optical implementations (see SPIE by Psaltis at CalTech)
and the Neocognitron (Biol. Cybern. by Fukushima).  The first neocognitron
article dates to 1978, the latest article is 1987.  There have been a
number of improvements, including shifts in attention.

 Edward K. Y. Jung
 ------------------------------------------------------------------------
 1. If the answer to life, the universe and everything is "42"...
 2. And if the question is "what is six times nine"...
 3. Then God must have 13 fingers.
 ------------------------------------------------------------------------
 UUCP: {seismo|decwrl}!sdcsvax!maiden     ARPA: maiden@sdcsvax.ucsd.edu

sandon@dartmouth.EDU (Peter Sandon) (09/12/87)

I did not read the Byte article either. However, assuming that
the network under discussion had no way to represent the similarity
relationship among different nodes that represent translated
versions of the same feature, it is not surprising that it would
have a difficult time generalizing from a given pattern to
an 'unaligned' version of that pattern.

Rumelhart pointed out to Banks that what is needed are many sets
of units having similar weight patterns, that is, weights that
are sensitive to translated versions of a given pattern. In addition,
the relationship between these similar units must be represented.
Rumelhart suggests adding units as needed but does not mention how
to relate these additional units to the trained unit. Fukushima did
something similar in his Neocognitron, by broadcasting a learned 
weight set to an entire layer of units which were then all connected 
to an OR unit. This OR unit then represented the fact that all the
units represented the same feature, modulo translation. Of course,
broadcasting weights requires more global control than many would
like, and the OR is not quite the relation we want for patterns of
any complexity.
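[Ed.: the broadcast-weights-plus-OR-unit arrangement described above can be sketched as follows.  This is an illustrative toy, not Fukushima's implementation; the weight vector, threshold, and 1-D inputs are invented for the example.]

```python
# One learned weight set is broadcast to every position (weight
# sharing), and an OR unit pools the per-position responses, so the
# feature is detected regardless of where it appears.

def shared_feature_responses(image, weights, threshold):
    """Apply the same weight set at every shift (broadcast weights)."""
    n = len(image) - len(weights) + 1
    return [
        sum(w * image[s + i] for i, w in enumerate(weights)) >= threshold
        for s in range(n)
    ]

def or_unit(responses):
    """Fires if the feature was detected at any position."""
    return any(responses)

weights = [1, 1, -1, 1]           # detector for the sub-pattern 1,1,0,1
present = [0, 0, 1, 1, 0, 1, 0]   # pattern appears, shifted to position 2
absent  = [1, 0, 1, 0, 1, 0, 1]   # pattern does not appear

print(or_unit(shared_feature_responses(present, weights, 3)))  # True
print(or_unit(shared_feature_responses(absent, weights, 3)))   # False
```

Note the limitation Sandon points out: the OR unit reports only that the feature occurred somewhere, discarding where, which is too coarse a relation for complex patterns.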

In 1981, Hinton suggested a means of separately representing shape and
translation in a network, such that 'unaligned' patterns could be
recognized. In my thesis, I implemented a modified version of that
network scheme, in order to demonstrate that a network can generalize
object recognition across translation. The network that I implemented
is five layers deep, which proved too much for standard backpropagation
(the generalized delta rule) and for my extensions to the GDR.
However, generalization across translation can be demonstrated in
a subnetwork of this network. I am working on further improvements
to backpropagation that will allow the entire network to be trained.
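[Ed.: for readers unfamiliar with the generalized delta rule mentioned above, here is a minimal single-unit sketch.  It is illustrative only, and in particular is not Sandon's five-layer network; the inputs, target, and learning rate are invented.]

```python
# Generalized delta rule for one sigmoid unit: the weight change is
# proportional to (target - output) times the sigmoid's derivative
# times the input.  Deep stacks of such layers are what made training
# difficult, since the delta shrinks as it is propagated back.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gdr_step(weights, inputs, target, rate=0.5):
    """One weight update: w += rate * delta * input."""
    out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (target - out) * out * (1.0 - out)   # error * sigmoid'
    return [w + rate * delta * x for w, x in zip(weights, inputs)]

w = [0.0, 0.0]
for _ in range(1000):
    w = gdr_step(w, [1.0, 1.0], 1.0)   # learn to output 1 on this input
out = sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0, 1.0])))
# out climbs toward 1.0 as training proceeds
```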

It is important to recognize that there are many useless 
generalizations that might be made, and a few useful ones. The
Hamming distance between two 'T's that are offset from one another
is much greater than that between a 'T' and a 'C' that is offset such
that it overlaps much of the 'T'. What is the 'correct' generalization
to be made when trying to classify these patterns? In order to get
the desired generalization, the network must be biased toward
developing representations in which the Hamming distances (of the
intermediate representations) between within-class patterns are
small compared to those between other patterns.  Generalization based
on similarity will then be appropriate. Without such biases, 'good'
generalization would be quite surprising.
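[Ed.: Sandon's 'T'-versus-'C' point is easy to verify numerically.  The bitmaps below are made up for illustration, not taken from the thesis, but they exhibit exactly the effect he describes.]

```python
# Raw pixel-level Hamming distance favors the "wrong" grouping: the
# shifted 'T' is farther from the original 'T' than an overlapping
# 'C' is, so similarity in the input space alone cannot drive the
# desired generalization.

def hamming(a, b):
    """Count differing pixels between two equal-size bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

T = ["11100",
     "01000",
     "01000",
     "01000"]

T_shifted = ["00111",   # the same 'T', offset two columns
             "00010",
             "00010",
             "00010"]

C = ["11100",           # a 'C' overlapping much of the 'T'
     "10000",
     "10000",
     "11100"]

print(hamming(T, T_shifted))  # 10: same class, large distance
print(hamming(T, C))          #  6: different class, small distance
```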

--Pete Sandon