rick@cs.arizona.edu (Rick Schlichting) (12/29/90)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years
under the auspices of the Office of Naval Research-Asia (ONR/Asia).
The following is the professional opinion of David Kahaner and in no
way has the blessing of the US Government or any agency of it. All
information is dated and of limited life time. This disclaimer should
be noted on ANY attribution.]
[Copies of previous reports written by Kahaner can be obtained from
host cs.arizona.edu using anonymous FTP.]
To: Distribution
From: David K. Kahaner, ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
David Haussler, Univ of Calif, Santa Cruz [haussler@saturn.ucsc.edu]
Re: First International Workshop on Algorithmic Learning Theory, Tokyo,
8-10 October 1990.
26 December 1990
ABSTRACT. The First International Workshop on Algorithmic Learning
Theory (ALT '90) took place in Tokyo October 8-10, 1990, attended by 120
researchers. This report summarizes the major papers discussed there.
INTRODUCTION.
Algorithmic Learning Theory is the theoretical portion of Artificial
Intelligence that is concerned with machine learning.
Professor David Haussler
Department of Computer Science
University of California
Santa Cruz, CA 95064
Email: haussler@saturn.ucsc.edu
was invited to participate in the International Workshop held recently in
Tokyo. His detailed summary is given below. Interestingly, the paper
that Haussler felt was the best in the Workshop, by S. Amari, is similar
to another of Amari's that I was very impressed with earlier; see report
[fuzzy, 24 August 1990]. Amari was also the organizing chair of the New
Information Processing Technologies Workshop held in December 1990,
[nipt.90, 26 Dec 1990].
The First International Workshop on Algorithmic Learning Theory (ALT '90)
took place in Tokyo October 8-10, 1990. The workshop was sponsored by the
Japanese Society for Artificial Intelligence, with cooperation from ACM's
SIGACT and SIGART groups, the IEEE Computer Society, the Information
Processing Society of Japan, the Institute for New Generation Computer
Technology (ICOT), and many other organizations, and with financial
support from twelve Japanese corporations and foundations, including
Fujitsu, Hitachi, IBM Japan, Mitsubishi, NEC, NTT and Toshiba. It is
anticipated that the meeting will be held every other year in the future.
This year approximately 120 researchers attended the meeting from all
over the world. A complete listing of the papers presented and authors'
addresses is given in the appendix to this report.
The term "algorithmic learning theory" is similar to the term
"computational learning theory" used for the annual workshops on this
subject held in the United States (the COLT workshops). So far there
have been three COLT workshops (MIT '88, UC Santa Cruz '89, and Rochester
'90), sponsored by ACM SIGACT and SIGART groups, and originally funded in
part by grants from ONR and NSF. ONR may also help fund a fourth COLT
workshop, to be held Aug. 5-7, 1991, again at UC Santa Cruz. The field
of algorithmic/computational learning theory has its roots primarily in
artificial intelligence work in machine learning, including work in
statistical pattern recognition and classification in the early 60's and
the recent work on learning in neural networks, and in E.M. Gold's work
on inductive inference of formal languages. Computational learning
theory, as practiced by the researchers that attend the ALT and COLT
meetings, represents a major theoretical component of AI learning work.
Within computational learning theory there are three major established
lines of research, and a number of newer alternative approaches and
ideas. These approaches were all represented at the ALT meeting. The
oldest line of investigation is often called "inductive inference" or
"learning in the limit". The papers of M. Fulk and S. Jain, K. Jantke,
J. Case et al., S. Lange and R. Wiehagen, P. Garcia et al., T. Shinohara,
and A. Togashi and S. Noguchi at ALT present new results in this line of
research.
The inductive inference approach grew out of recursive function and
computability theory, and is still dominated by methods from these
fields. To illustrate the type of problem considered in inductive
inference work, consider the problem of designing an inference machine to
learn a language in the manner that a child learns, i.e. not by being
told what the grammar of the language is but rather by hearing
grammatical sentences. In an abstract sense we can view grammatical
sentences in the natural language as finite strings of symbols over a
finite formal alphabet (here the formal alphabet is actually the set of
words in the natural language). The (usually infinite) set of strings
over this formal alphabet that correspond to possible sentences in the
natural language constitutes a formal language.
The inference machine must learn which of the infinitely many possible
strings of symbols over the formal alphabet are in the formal language,
and which strings are not. The only information available is a sequence
of examples. In the simplest case, each example is simply a string in the
formal language. In the more general case, there are two types of
examples, strings that are in the language (positive examples) and
strings that are not (negative examples). In either case, it is assumed
that the sequence of examples given to the inference machine includes all
possible examples. The inference machine is successful in learning the
language if no matter what order these examples are presented in,
sometime during the presentation the machine makes the correct
hypothesis, and stays with this hypothesis from that point on. Here the
hypothesis is some correct representation, perhaps a grammar, for the
formal language that is being learned. It is said that the machine is
able to learn an entire class of formal languages if it is always
successful on each language in this class. The same setup can be used to
investigate inference machines that are able to learn classes of subsets
of the integers, or indeed classes of functions on arbitrary countable
domains. (A formal language can be viewed as a function on the set of
all strings that takes the value 1 on strings in the language and 0 on
strings not in the language.)
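To make the flavor of this model concrete, a small illustrative sketch
follows (in Python; it is not drawn from any of the workshop papers, and the
candidate class and example stream are invented for the illustration). It
shows Gold-style identification by enumeration: the learner runs through a
fixed enumeration of candidate languages and, after each example, announces
the first candidate consistent with everything seen so far. If the target
language appears in the enumeration, the announced hypothesis eventually
stabilizes on a correct one, although no bound is placed on when.

    # Illustrative sketch only: Gold-style identification by enumeration.
    # Each candidate language is represented by a membership predicate; the
    # example stream supplies (string, label) pairs, where the label says
    # whether the string belongs to the target language.

    def learn_in_the_limit(candidates, example_stream):
        """candidates: list of (name, predicate) pairs enumerating a class;
        example_stream: iterable of (string, in_language) pairs."""
        seen = []
        for string, label in example_stream:
            seen.append((string, label))
            for name, member in candidates:
                if all(member(s) == lbl for s, lbl in seen):
                    yield name      # current hypothesis after this example
                    break
            else:
                yield None          # no candidate fits the data so far

    # Hypothetical class of two languages over the alphabet {a, b}.
    candidates = [
        ("only-a-strings", lambda s: set(s) <= {"a"}),
        ("all-strings",    lambda s: True),
    ]
    stream = [("aa", True), ("ab", True), ("ba", True)]
    print(list(learn_in_the_limit(candidates, stream)))
    # -> ['only-a-strings', 'all-strings', 'all-strings']

On this particular stream the learner changes its mind once, after the
second example, and never again; that eventual stabilization is all the
model demands.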
This approach has been called "learning in the limit" because there is no
constraint placed on the number of examples needed before the correct
hypothesis is discovered. It only has to happen sometime "in the limit".
Neither is any constraint placed on the computational complexity (e.g.
time complexity) of the procedures used by the inference machine; they
need only be computable. This has made results from this model of limited
use in practice. The two other established lines of research in
computational learning theory attempt to remedy this situation. One of
these is called "exact identification in polynomial time". Here the
inference machine (or "learning algorithm" as it is usually called) is
allowed to make queries about specific examples and hypotheses of its own
choosing. Further, the learning algorithm is required to terminate after
making only polynomially many such queries, and to use only polynomially
bounded computation time. These polynomials are defined in terms of the
complexity of the function that is being learned (called the target
function). This represents a more "active" approach to learning, as well
as being more practical when it comes to actual implementation. Many
papers at ALT '90 presented new results in this line of research,
including the papers of N. Abe, S. Lange and R. Wiehagen, Y. Takada, and Y.
Takada et al.
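As a purely illustrative example of this query-based style (again in Python,
not taken from any workshop paper; the class of monotone conjunctions and
the oracle below are chosen only for simplicity), a hidden conjunction over
n Boolean variables can be identified exactly with n membership queries:
starting from the all-ones assignment, turning off one variable at a time
reveals whether that variable appears in the target.

    # Illustrative sketch only: exact identification with membership queries
    # for monotone conjunctions.  membership_query(x) answers whether the
    # n-bit assignment x satisfies the hidden conjunction.

    def learn_monotone_conjunction(n, membership_query):
        relevant = []
        for i in range(n):
            assignment = [1] * n
            assignment[i] = 0                  # drop variable i only
            if not membership_query(tuple(assignment)):
                relevant.append(i)             # i must appear in the target
        return relevant

    # Hypothetical hidden target: x0 AND x2, over 4 variables.
    target = lambda x: x[0] == 1 and x[2] == 1
    print(learn_monotone_conjunction(4, target))   # -> [0, 2]

The point of the model is the resource bound: here the number of queries and
the running time are linear in n, and in general both must be polynomial in
the complexity of the target.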
The third established approach in computational learning theory is the
Probably Approximately Correct (PAC) model. Here examples are presented
randomly to the learning algorithm, and the algorithm is not required to
learn the target function exactly, but only to find a hypothesis that is
a good approximation of it with high probability. A good approximation
means a hypothesis that gives the same value as the target function on
most randomly drawn examples. Of course, these definitions depend on the
probability distribution that is used to generate examples. Usually, it
is assumed that when the target function is f, examples are generated by
drawing instances from the domain of f independently at random according
to some arbitrary distribution, and then labeling each instance x with
the value f(x). Thus the learning algorithm receives a training sample
(x1,f(x1)), ..., (xm,f(xm)), where the xi's are drawn independently at
random, and must produce a hypothesis that is a good approximation of f
(with high probability). The learning algorithm is required to work for
any distribution that may be used to generate the instances x1, ..., xm
in the training sample. Moreover, it is required that the number m of
examples and the computation time be polynomial in the complexity of the
target function, as in the model of exact identification in polynomial
time.
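The following toy sketch (again in Python, invented for illustration rather
than taken from any paper) shows the ingredients of the PAC setting for one
of the simplest possible classes, thresholds on the unit interval: the
learner draws m labeled examples independently from a distribution, outputs
a hypothesis, and is judged by the probability that its hypothesis disagrees
with the target on a fresh instance from the same distribution. For this
class, roughly (1/epsilon)*ln(1/delta) examples suffice to guarantee error
at most epsilon with probability at least 1 - delta.

    # Illustrative sketch only: PAC learning of a threshold on [0, 1].
    # Hidden target: f(x) = 1 iff x <= t.  Instances are drawn independently
    # from a fixed distribution (uniform here); the hypothesis is the largest
    # positive example seen in the training sample.

    import math, random

    def pac_learn_threshold(t, epsilon, delta, draw=random.random):
        m = math.ceil((1 / epsilon) * math.log(1 / delta))   # sample size
        sample = [(x, x <= t) for x in (draw() for _ in range(m))]
        positives = [x for x, label in sample if label]
        return max(positives, default=0.0)                   # hypothesis

    def error_rate(t, t_hat, trials=100000, draw=random.random):
        wrong = sum((x <= t) != (x <= t_hat)
                    for x in (draw() for _ in range(trials)))
        return wrong / trials

    t_hat = pac_learn_threshold(t=0.3, epsilon=0.05, delta=0.05)
    print(t_hat, error_rate(0.3, t_hat))  # error < 0.05 with prob >= 0.95

The same epsilon/delta guarantee is required to hold whatever distribution
generates the instances, which is what distinguishes the PAC guarantee from
an average-case analysis.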
The PAC model is often applied to the problem of designing and analysing
algorithms that learn Boolean functions, although it can also be used
when more general classes of functions are needed. The model
incorporates some of the perspective of statistical decision theory into
the framework of inductive inference, as well as adding computational
complexity considerations. Of the three established lines of research in
computational learning theory, the PAC model has attracted the most
attention from practitioners of machine learning. However, many
practitioners have been unsatisfied with the model. One reason is that
not enough of statistical decision theory was incorporated in the model.
In particular, the model was originally defined only for learning
functions that take two values, and it was assumed that there was no
noise in the training examples. Since then various noise models have been
added (see the paper of Y. Sakakibara), but only recently was the model
generalized to cover the important problems of regression and density
estimation, as well as function approximation. These generalizations
were described in my paper at ALT, and in part in N. Cesa-Bianchi's
paper. A decision theory approach to learning was also described in the
excellent paper of S. Amari, entitled "Mathematical Theory of Neural
Learning." This paper analyses the dynamics of learning in neural
networks, providing a beautiful and rich perspective. It is, in my
opinion, the most important paper presented at ALT '90.
In addition to other work on neural networks (in the papers of N.
Namatame and R. Oka), and extensions or relatives of PAC learning (papers
of J. Kivinen, and A. Shinohara and S. Miyano), a number of other
approaches to machine learning, beyond the three established lines of
work discussed above, were discussed at ALT. These include learning and
reasoning by analogy (papers of B. Shekar et al., T. Unemi, M. Harao, J.
Arima, B. Indurkhya, and R. Orihara et al.), important new work on
learning methods in logic programming (papers of S. Muggleton, S.
Muggleton and C. Feng, M. Hagiya, and S. Liu and M. Hagiya), and formal
work on explanation-based generalization (papers of P. Laird and E.
Gamble, and T. Tanaka). This great variety of work kept the meeting
lively and the discussions animated.
Overall, I would say that the meeting was very successful. I believe that
the field of computational learning theory is still in the process of
searching for the right foundation, and that none of the frameworks
proposed so far is anywhere near adequate. I see the great variety of new
formal models of learning that are being proposed as a healthy sign. The
worst thing that could happen now would be to have the field ossify into
one of the three established and inadequate lines of research discussed
above. As most of my work has been in the PAC model, it is perhaps no
surprise that I feel that of the three models, this model provides the
best starting point from which to consider modified theories that will
enrich our ability to formally capture the learning process. I believe
also that much of the impetus for these revisions in computational
learning theory will come from the important applied learning work that
is currently being done in the field of neural networks. This fabulously
rich, interdisciplinary field may very quickly change our fundamental
notions about the learning process. It will pay us to monitor it
carefully.
--------------------------END OF REPORT---------------------------------