[comp.ai.neural-nets] Which learning algorithm is best for scale/rotation invariant input?

kilmer@hq.af.mil (09/19/90)

I was wondering if anyone had done a study of network paradigms that were
particularly well suited for scale and rotation invariance. My problem 
involves identifying similar input patterns of different scale, such as
a letter 'A' in two different font sizes (e.g. 14 or 16 point Courier),
or different rotations (i.e. landscape vs. portrait, or a 23 deg angle...).
I created a backprop net that would accept an 8x14 binary input matrix
and created a font within this matrix that the net was to associate with
an 8-bit output (representing the character as a number, i.e. ASCII).
I had 94 characters in the font.  The network learned the 94 character
associations fine, and I tested the network with 3-8% noise levels, with
very good success at recovering the correct binary output.  For larger
fonts, I was simply going to scale down the character to fit into the 8x14
matrix.  This should work...I haven't tried it yet.  As for the rotation
problem, I wasn't sure how to approach this (short of rotating the input
object until I had a positive match). 
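
For concreteness, here is a stripped-down sketch (Python/numpy) of roughly
what the net does; the hidden-layer size, learning rate, epoch count, and the
random "glyphs" below are just placeholders, not my actual font or setup:

import numpy as np

# Minimal backprop sketch: 8x14 binary glyph -> 8-bit character code.
# Hidden size, learning rate, and epoch count are illustrative guesses.
rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 8 * 14, 40, 8

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = rng.normal(0, 0.1, (N_HID, N_IN));  b1 = np.zeros(N_HID)
W2 = rng.normal(0, 0.1, (N_OUT, N_HID)); b2 = np.zeros(N_OUT)

def forward(x):
    h = sigmoid(W1 @ x + b1)
    return h, sigmoid(W2 @ h + b2)

def train_step(x, t, lr=0.5):
    """One backprop update for a single (glyph, 8-bit target) pair."""
    global W1, b1, W2, b2
    h, y = forward(x)
    dy = (y - t) * y * (1 - y)          # output-layer delta (sigmoid + MSE)
    dh = (W2.T @ dy) * h * (1 - h)      # hidden-layer delta
    W2 -= lr * np.outer(dy, h); b2 -= lr * dy
    W1 -= lr * np.outer(dh, x); b1 -= lr * dh

# Fake "font": random binary glyphs standing in for the 94 characters,
# with the printable ASCII codes 33..126 as 8-bit targets.
glyphs  = (rng.random((94, N_IN)) > 0.5).astype(float)
targets = np.array([[(c >> i) & 1 for i in range(8)] for c in range(33, 127)],
                   dtype=float)

for epoch in range(1000):
    for x, t in zip(glyphs, targets):
        train_step(x, t)

# Test with ~5% pixel-flip noise, as in the experiment described above.
flips = (rng.random(glyphs.shape) < 0.05).astype(float)
noisy = np.abs(glyphs - flips)
pred  = np.array([forward(x)[1] for x in noisy]) > 0.5
print("bit accuracy on noisy glyphs:", (pred == targets).mean())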

Well, while I was working on this I decided to try to approach the problem
from a different angle.  I was first going to teach the network different
fonts until it had learned as many as I had access to, but wondered whether
this was a dead-end issue.  Wouldn't a net that was able to extract the
various features of an object, and output which features it has identified
within the object, be better than teaching it all fonts?  Specifically,
extract features regardless of size or rotation.  I have heard of something
known as a neocognitron that was able to correctly identify disproportioned
input, or something like that, but I haven't been able to find any info on
it.  Does anyone out there have any, or is anyone doing research in this
area?

I would appreciate any reply. 

Thanks,

Richard 
-- 
.-------------------------------------------------------------------------.
|  Richard Kilmer                           Kilmer@Opsnet-Pentagon.af.mil |
|  VAX Systems Analyst                            (AKA Kilmer@26.24.0.26) |
|    .--->Look to the future --.          "But when hope has gone away    |
|    |                         |           In a night or in a day         |
|    `--- Through the past <---'           In a vision or in none         |
|                                          Is it therefore the less gone?"|
`-------------------------------------------------------------------------'

schraudo@beowulf.ucsd.edu (Nici Schraudolph) (10/01/90)

One way to solve (or circumvent) the scale/translation/rotation invariance
problem in visual recognition is through appropriate preprocessing of the
inputs.  I've seen an example of this approach at IJCNN'90 (San Diego):

David Casasent and Etienne Barnard, "Adaptive Clustering Neural Net for
Piecewise Nonlinear Discriminant Surfaces", Proc. IJCNN'90, p. I-423
(also, a paper by the same authors in Proc. IJCNN'89 (Washington), p. I-111)

They first perform a 2-D Fourier transform on the image (which gives them
translation invariance), then use input neurons with ring- and wedge-shaped
receptive fields on the transformed image.  The "ring neurons" are scale
sensitive but rotation invariant whereas the "wedge neurons" are rotation
sensitive but scale invariant.  The right mix of these may provide a good
feature space for this kind of recognition task.
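
I haven't seen their code, but my reading of the preprocessing step looks
roughly like the following sketch (Python/numpy); the bin counts and the toy
test image are arbitrary choices of mine:

import numpy as np

def ring_wedge_features(img, n_rings=8, n_wedges=8):
    """Ring/wedge energies of the 2-D Fourier magnitude spectrum.

    |FFT| is translation invariant; summing it over concentric rings is
    additionally rotation invariant (but scale sensitive), summing over
    angular wedges is scale invariant (but rotation sensitive), up to the
    usual discretization caveats.
    """
    F = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = F.shape
    y, x = np.indices((h, w))
    cy, cx = h / 2.0, w / 2.0
    r = np.hypot(y - cy, x - cx)
    theta = np.arctan2(y - cy, x - cx) % np.pi   # spectrum is symmetric

    r_bins = np.linspace(0, r.max() + 1e-9, n_rings + 1)
    t_bins = np.linspace(0, np.pi, n_wedges + 1)

    rings  = [F[(r >= r_bins[i]) & (r < r_bins[i + 1])].sum()
              for i in range(n_rings)]
    wedges = [F[(theta >= t_bins[i]) & (theta < t_bins[i + 1])].sum()
              for i in range(n_wedges)]
    return np.array(rings + wedges)

# The ring part of the feature vector barely changes under rotation:
img = np.zeros((32, 32)); img[8:24, 14:18] = 1.0   # a crude vertical bar
print(ring_wedge_features(img)[:8])
print(ring_wedge_features(np.rot90(img))[:8])      # 90-degree rotation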

-- 
Nicol N. Schraudolph, C-014                      "Big Science, hallelujah.
University of California, San Diego               Big Science, yodellayheehoo."
La Jolla, CA 92093-0114                                     - Laurie Anderson.
                          nici%cs@ucsd.{edu,bitnet,uucp}

reiner@isy.liu.se (Reiner Lenz) (10/01/90)

We studied the problem of invariance in pattern recognition in both a
top-down and a bottom-up fashion.

A)

In the top-down approach you know that you want to recognize patterns
independent of some group of transformations.  Using group-theoretical
arguments you can show that the transformation group itself determines the
desired feature extraction process.  For example: 2-D rotation invariance
leads to the Fourier transform in polar coordinates, 3-D rotation invariance
leads to surface harmonics, scale invariance leads to the Mellin transform,
etc.
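
As a toy numerical illustration of the 2-D rotation case (my own sketch, not
code from the references below): resample the image on a polar grid and take
Fourier magnitudes along the angular variable; rotation only shifts that
variable, so the magnitudes stay approximately unchanged.

import numpy as np
from scipy.ndimage import map_coordinates, rotate

def angular_fourier_features(img, n_radii=8, n_angles=64, n_harmonics=6):
    """Polar resampling followed by an FFT along the angle.

    Rotating the image about its center only shifts the angular variable,
    so the magnitudes of the angular Fourier coefficients are approximately
    rotation invariant.  Grid sizes are arbitrary.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radii  = np.linspace(1, min(cy, cx) - 1, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    ys = cy + radii[:, None] * np.sin(angles)[None, :]
    xs = cx + radii[:, None] * np.cos(angles)[None, :]
    polar = map_coordinates(img, [ys, xs], order=1)   # (n_radii, n_angles)
    return np.abs(np.fft.fft(polar, axis=1))[:, :n_harmonics].ravel()

img = np.zeros((33, 33)); img[10:23, 14:19] = 1.0
f0 = angular_fourier_features(img)
f1 = angular_fourier_features(rotate(img, 30, reshape=False, order=1))
print(np.abs(f0 - f1).max() / f0.max())   # small => approximately invariant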

Ref.:

@article{Len_jos:89,
	author ="Reiner Lenz",
	title ="A Group Theoretical Model of Feature Extraction", 
	journal ="Journal of the Optical Society of America A",
	volume="6",
	number="6",
	pages="827-834",
	year = "1989"
}

@book{Len:90ln,
	author= "Reiner Lenz",
	title =  "Group Theoretical Methods in Image Processing",
	publisher = "Springer Verlag",
	series = "Lecture Notes in Computer Science (Vol. 413)",
	address = "Heidelberg, Berlin, New York",
	year = "1990"
}

@article{Len:90,
	author= "Reiner Lenz",
	title =  "Group-Invariant Pattern Recognition",
	journal = "Pattern Recognition",
	volume="23",
	number="1/2",
	pages = "199-218",
	year = "1990"
}

A generalization is the following: 

Not all patterns in a group are equally important.  This is the case for
scale invariance, since patterns scaled by very small or very large factors
are not very similar to the original pattern.  How the theory must be
modified in this case is described in one of our internal reports.

Ref.:

@techreport{Len_prob:91,
	author ="Reiner Lenz",
	title="On probabilistic Invariance",
        institution={Link\"oping University, ISY, S-58183 Link\"oping},
	note="Internal Report",
	year="1991"
}

We also investigated the problem in a bottom-up fashion.  We design a
learning filter system that consists of a fixed number of filter
functions.  Then we train this system with examples of the pattern class
that we want to recognize.  The learning rule is designed in such a way
that the resulting system produces filter functions with a minimum loss
of information and a maximum concentration of the feature components.
Examples show that this system learns the Fourier transformation from
examples of rotated patterns.
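
The learning rule itself is described in the paper below, but the "learns
the Fourier transform" result is easy to get a feel for: the covariance
matrix of all rotated (here: cyclically shifted) copies of a pattern is
circulant, so its eigenvectors are sinusoids.  A plain-PCA illustration of
that effect (not our actual learning rule):

import numpy as np

# Illustration only: plain PCA on all cyclic shifts of a 1-D pattern,
# standing in for rotated versions of an image.
rng = np.random.default_rng(1)
N = 64
pattern = rng.random(N)

X = np.stack([np.roll(pattern, k) for k in range(N)])   # all "rotations"
X -= X.mean(axis=0)

# The covariance matrix of this training set is circulant ...
cov = X.T @ X / N
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, -1]                       # leading "learned filter"

# ... so its eigenvectors are sinusoids, i.e. the Fourier basis: the
# leading filter's spectrum is concentrated on a single frequency pair.
spectrum = np.abs(np.fft.fft(top))
bins = np.argsort(spectrum)[-2:]
print("dominant frequency bins:", bins)
print("fraction of spectrum in them:", spectrum[bins].sum() / spectrum.sum())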

Ref:

@inproceedings{Len:90ijcnn,
	author= {Reiner Lenz and Mats \"Osterberg},
	title =  "Learning Filter Systems",
	booktitle = "Proc. Int. Joint Conference on Neural Networks, San Diego",
	year = "1990"
}
-- 
"Kleinphi macht auch Mist"
Reiner Lenz | Dept. EE.                 |
            | Linkoeping University	| email:	reiner@isy.liu.se
            | S-58183 Linkoeping/Sweden |

manning@nntp-server.caltech.edu (Evan Marshall Manning) (10/01/90)

I have an interesting-looking, >1 year old preprint here entitled
"Simultaneous Position, Scale, and Rotation Invariant Pattern Classification
using Third-Order Neural Networks".  It's by Max B. Reid, Lilly Spirkovska,
and Ellen Ochoa at the Intelligent Systems Technology Branch, NASA Ames
Research Center, Moffett Field, CA 94035.  The preprint claims it was to be
published in The International Journal of Neural Networks - Research and
Applications.  I gather it should have been printed by now.

I also have a similar article on pages I-689-692 of the proceedings of
IJCNN, June 1989.
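
If I read the approach right, the key is a third-order unit whose weight for
each triple of input pixels depends only on the shape (the interior angles)
of the triangle the three pixels form, which translation, scaling, and
in-plane rotation cannot change.  A rough sketch of that idea in Python (the
angle quantization, the histogram formulation, and the tiny example are my
own simplifications, not the paper's):

import numpy as np
from itertools import combinations

N_BINS = 12   # angle bins per vertex (arbitrary)

def triangle_key(p, q, r):
    """Quantized, sorted interior angles of triangle (p, q, r)."""
    def angle(a, b, c):                     # interior angle at vertex a
        u, v = b - a, c - a
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cosang, -1.0, 1.0))
    angs = sorted([angle(p, q, r), angle(q, p, r), angle(r, p, q)])
    return tuple(min(int(a / np.pi * N_BINS), N_BINS - 1) for a in angs)

def triangle_histogram(img):
    """Histogram of triangle-shape keys over all triples of 'on' pixels.

    A third-order unit then computes f(sum_k w[k] * hist[k]); because the
    keys are invariant, so is the unit's response.
    """
    pts = np.argwhere(img > 0).astype(float)
    hist = {}
    for p, q, r in combinations(pts, 3):
        u, v = q - p, r - p
        if abs(u[0] * v[1] - u[1] * v[0]) < 1e-9:   # skip collinear triples
            continue
        k = triangle_key(p, q, r)
        hist[k] = hist.get(k, 0) + 1
    return hist

img = np.zeros((10, 10)); img[2, 2] = img[2, 7] = img[7, 4] = img[5, 5] = 1
big = np.zeros((20, 20))
for y, x in np.argwhere(img > 0):           # same pattern, doubled in size
    big[2 * y, 2 * x] = 1
print(triangle_histogram(img) == triangle_histogram(big))   # True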

Hope this helps.

-- Evan
***************************************************************************
Your eyes are weary from staring at the CRT for so | Evan M. Manning
long.  You feel sleepy.  Notice how restful it is  |      is
to watch the cursor blink.  Close your eyes.  The  |manning@gap.cco.caltech.edu
opinions stated above are yours.  You cannot       | manning@mars.jpl.nasa.gov
imagine why you ever felt otherwise.               | gleeper@tybalt.caltech.edu

minsky@media-lab.MEDIA.MIT.EDU (Marvin Minsky) (10/02/90)

And be sure to read the original classic -- Pitts and McCulloch 1947,
reprinted in W. S. McCulloch's "Embodiments of Mind," an MIT Press book.
Although it is from the pre-computer age, it has a nice, clear
explanation of the relevant invariant Haar measure theory.  I don't
recall any good analysis therein of how to prune off the irrelevant
parts of the group to make things converge.
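
For readers who haven't seen it: the basic Pitts-McCulloch construction is
to average some functional of the image over the whole transformation group
with respect to its Haar measure, which gives an invariant by construction.
A toy numerical version for the rotation group might look like this sketch
(angles sampled rather than integrated; the template functional is an
arbitrary choice of mine):

import numpy as np
from scipy.ndimage import rotate

def group_average(img, functional, n_samples=36):
    """Average a functional of the image over sampled rotations.

    Averaging over the rotation group with its uniform (Haar) measure makes
    the result rotation invariant by construction; here the integral is
    approximated by a finite sum over n_samples angles.
    """
    return float(np.mean([functional(rotate(img, a, reshape=False, order=1))
                          for a in np.linspace(0, 360, n_samples,
                                               endpoint=False)]))

rng = np.random.default_rng(2)
template = rng.random((16, 16))
feat = lambda im: float((im * template).sum())   # any functional will do

img = np.zeros((16, 16)); img[4:12, 6:10] = 1.0
print(group_average(img, feat))
print(group_average(rotate(img, 40, reshape=False, order=1), feat))
# the two values agree up to interpolation error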