[mod.ai] Seminar - Learning when Irrelevant Variables Abound

HADDAD@SUSHI.STANFORD.EDU.UUCP (01/28/87)

Next BATS will be at IBM Almaden Research Center on Friday, February 13.

Following is a preliminary schedule:

 9:45 - 10:00  Coffee

10:00 - 11:00  "Algebraic Methods in the Theory of Lower Bounds
               for Boolean Circuit Complexity"
               Roman Smolensky, U.C. Berkeley.

11:00 - 12:00  " The Decision Problem for the Probabilities
                 of Higher-Order Properties".
                 Phokion Kolaitis, IBM Almaden.

 1:00 -  2:00  "Learning When Irrelevant Features Abound"
               Nick Littlestone, U.C. Santa Cruz.

 2:00 -  3:00  "Fast Parallel Algorithms for Chordal Graphs"
               Alejandro A. Schaffer, Stanford University.

===================================================================

              "Learning When Irrelevant Features Abound"

                         Nick Littlestone
                         U.C. Santa Cruz

Valiant and others have studied the problem of learning various classes
of Boolean functions from examples.  Here we discuss on-line learning of
these functions.  In on-line learning, the learner responds to each
example according to a current hypothesis.  Then the learner updates the
hypothesis, if necessary, based on the correct classification of the
example.  This is the form taken by Perceptron learning algorithms, in
which the weights are updated after each mistake.  One natural
measure of the quality of learning in the on-line setting is the number
of mistakes the learner makes.  For suitable classes of functions,
on-line learning algorithms are available which make a bounded number of
mistakes, with the bound independent of the number of examples seen by
the learner.  We present one such algorithm, which learns disjunctive
Boolean functions.  The algorithm can be expressed as a linear-threshold
algorithm.  If the examples include a large number of irrelevant
variables, the algorithm does very well: the number of mistakes
depends only logarithmically on the number of irrelevant variables.
More specifically, if the function being learned is of the form
$f(x_1, \ldots, x_n) = x_{i_1} \lor \cdots \lor x_{i_k}$, then the
mistake bound is $O(k \log n)$.  If $k = O(\log n)$, this bound is
significantly better than the bound given by the Perceptron
convergence theorem.
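
For a concrete picture of the kind of algorithm described above, the
following is a minimal sketch of a multiplicative-weight
linear-threshold learner for monotone disjunctions, in the style now
known as Winnow.  The structure (promote the weights of active
variables after a missed positive example, demote them after a false
positive) matches the abstract's description, but the particular
threshold, update factor, and names used below are illustrative
assumptions rather than details taken from the talk.

# Sketch only: a Winnow-style on-line learner for monotone disjunctions.
# The threshold (n) and update factor (alpha = 2) are common choices,
# not necessarily those used in the talk.

def winnow_learner(examples, n, threshold=None, alpha=2.0):
    """On-line pass over (x, label) pairs, x a 0/1 tuple of length n.

    Predicts 1 when the weighted sum of active variables reaches the
    threshold; weights change multiplicatively, and only on mistakes.
    Returns the final weights and the number of mistakes made.
    """
    if threshold is None:
        threshold = n              # threshold equal to the number of variables
    weights = [1.0] * n            # every variable starts with weight 1
    mistakes = 0
    for x, label in examples:
        total = sum(w * xi for w, xi in zip(weights, x))
        prediction = 1 if total >= threshold else 0
        if prediction != label:
            mistakes += 1
            if label == 1:         # missed positive: promote active variables
                weights = [w * alpha if xi else w for w, xi in zip(weights, x)]
            else:                  # false positive: demote active variables
                weights = [w / alpha if xi else w for w, xi in zip(weights, x)]
    return weights, mistakes


# Toy usage: the hidden target is x_1 OR x_4 among n = 6 variables
# (indices 0 and 3); the other four variables are irrelevant.
if __name__ == "__main__":
    import random
    random.seed(0)
    n = 6
    stream = []
    for _ in range(200):
        x = tuple(random.randint(0, 1) for _ in range(n))
        stream.append((x, 1 if (x[0] or x[3]) else 0))
    w, m = winnow_learner(stream, n)
    print("mistakes made:", m)

Roughly speaking, a relevant variable's weight can be promoted only
about log n times before it alone reaches the threshold, and the
demotions can be charged against the promotions; this is the source of
the logarithmic dependence on the number of irrelevant variables
described in the abstract.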