[comp.research.japan] Kahaner Report: Workshop on Alg. Learning Theory, Tokyo

rick@cs.arizona.edu (Rick Schlichting) (12/29/90)

  [Dr. David Kahaner is a numerical analyst visiting Japan for two years
   under the auspices of the Office of Naval Research-Asia (ONR/Asia).  
   The following is the professional opinion of David Kahaner and in no 
   way has the blessing of the US Government or any agency of it.  All 
   information is dated and of limited life time.  This disclaimer should 
   be noted on ANY attribution.]

  [Copies of previous reports written by Kahaner can be obtained from
   host cs.arizona.edu using anonymous FTP.]

To: Distribution
From: David K. Kahaner, ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
      David Haussler, Univ of Calif, Santa Cruz  [haussler@saturn.ucsc.edu]
Re: First International Workshop on Algorithmic Learning Theory, Tokyo,
             8-10 October 1990.
26 December 1990


ABSTRACT. The First International Workshop on Algorithmic Learning
Theory (ALT '90) took place in Tokyo October 8-10, 1990, attended by 120
researchers. This report summarizes the major papers discussed there.

INTRODUCTION. 
Algorithmic Learning Theory is the theoretical portion of Artificial
Intelligence that is concerned with machine learning. 
         Professor David Haussler
         Department of Computer Science
         University of California
         Santa Cruz, CA 95064
           Email: haussler@saturn.ucsc.edu
was invited to participate in the International Workshop held recently in 
Tokyo. His detailed summary is given below.  Interestingly, the paper 
that Haussler felt was the best in the Workshop, by S. Amari, is similar 
to another of Amari's that I was very impressed with earlier, see report 
[fuzzy, 24 August 1990]. Amari was also the organizing chair of the New 
Information Processing Technologies Workshop held in December 1990, 
[nipt.90, 26 Dec 1990].  


The First International Workshop on Algorithmic Learning Theory (ALT '90) 
took place in Tokyo October 8-10, 1990. The workshop was sponsored by the 
Japanese Society for Artificial Intelligence, with cooperation from ACM's 
SIGACT and SIGART groups, the IEEE Computer Society, the Information 
Processing Society of Japan, the Institute for New Generation Computer 
Technology (ICOT), and many other organizations, and with financial 
support from twelve Japanese corporations and foundations, including 
Fujitsu, Hitachi, IBM Japan, Mitsubishi, NEC, NTT and Toshiba.  It is 
anticipated that the meeting will be held every other year in the future.  
This year approximately 120 researchers attended the meeting from all 
over the world.  A complete listing of the papers presented and authors'
addresses is given in the appendix to this report.  

The term "algorithmic learning theory" is similar to the term 
"computational learning theory" used for the annual workshops on this 
subject held in the United States (the COLT workshops).  So far there 
have been three COLT workshops (MIT '88, UC Santa Cruz '89, and Rochester 
'90), sponsored by ACM SIGACT and SIGART groups, and originally funded in 
part by grants from ONR and NSF. ONR may also help fund a fourth COLT 
workshop, to be held Aug. 5-7, 1991, again at UC Santa Cruz.  The field 
of algorithmic/computational learning theory has its roots primarily in 
artificial intelligence work in machine learning, including work in 
statistical pattern recognition and classification in the early 60's and 
the recent work on learning in neural networks, and in E.M. Gold's work 
on inductive inference of formal languages.  Computational learning 
theory, as practiced by the researchers that attend the ALT and COLT 
meetings, represents a major theoretical component of AI learning work.  

Within computational learning theory there are three major established 
lines of research, and a number of newer alternative approaches and 
ideas.  These approaches were all represented at the ALT meeting.  The 
oldest line of investigation is often called "inductive inference" or 
"learning in the limit".  The papers of M. Fulk and S. Jain, K. Jantke, 
J. Case et al., S. Lange and R. Wiehagen, P. Garcia et al., T. Shinohara, 
and A. Togashi and S. Noguchi at ALT present new results in this line of 
research.  

The inductive inference approach grew out of recursive function and 
computability theory, and is still dominated by methods from these 
fields. To illustrate the type of problem considered in inductive 
inference work, consider the problem of designing an inference machine to 
learn a language in the manner that a child learns, i.e. not by being 
told what the grammar of the language is but rather by hearing 
grammatical sentences. In an abstract sense we can view grammatical 
sentences in the natural language as finite strings of symbols over a 
finite formal alphabet (here the formal alphabet is actually the set of 
words in the natural language).  The (usually infinite) set of strings 
over this formal alphabet that correspond to possible sentences in the 
natural language constitutes a formal language.  

The inference machine must learn which of the infinitely many possible 
strings of symbols over the formal alphabet are in the formal language, 
and which strings are not. The only information available is a sequence 
of examples. In the simplest case, each example is simply a string in the 
formal language. In the more general case, there are two types of 
examples, strings that are in the language (positive examples) and 
strings that are not (negative examples). In either case, it is assumed 
that the sequence of examples given to the inference machine includes all 
possible examples.  The inference machine is successful in learning the 
language if no matter what order these examples are presented in, 
sometime during the presentation the machine makes the correct 
hypothesis, and stays with this hypothesis from that point on.  Here the 
hypothesis is some correct representation, perhaps a grammar, for the 
formal language that is being learned. It is said that the machine is 
able to learn an entire class of formal languages if it is always 
successful on each language in this class.  The same setup can be used to 
investigate inference machines that are able to learn classes of subsets 
of the integers, or indeed classes of functions on arbitrary countable 
domains.  (A formal language can be viewed as a function on the set of 
all strings that takes the value 1 on strings in the language and 0 on 
strings not in the language.) 
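
To make this setup concrete, the following toy sketch (my own illustration,
not taken from any ALT paper) shows Gold-style "identification by
enumeration" in Python: the learner is handed a fixed enumeration of
candidate languages and, after each labeled example, announces the first
candidate consistent with everything seen so far.  If the target language
appears in the enumeration and every string is eventually presented, the
guesses stabilize on a correct hypothesis.

    # Identification by enumeration (toy sketch).
    def learner(hypotheses, examples):
        """hypotheses: list of membership predicates (string -> bool).
        examples: iterable of (string, label) pairs, label True iff the
        string is in the target language.  Yields the learner's current
        guess (an index into hypotheses) after each example."""
        seen = []
        for s, label in examples:
            seen.append((s, label))
            # first hypothesis consistent with all examples seen so far
            guess = next(i for i, h in enumerate(hypotheses)
                         if all(h(x) == y for x, y in seen))
            yield guess

    # Toy class: L_n = all strings over {a,b} of length at most n.
    hypotheses = [lambda s, n=n: len(s) <= n for n in range(10)]

    # Target is L_3; positive and negative examples in arbitrary order.
    examples = [("a", True), ("abab", False), ("aba", True),
                ("bb", True), ("aabba", False), ("", True)]

    for guess in learner(hypotheses, examples):
        print("current hypothesis: L_%d" % guess)
    # The guesses settle on L_3 once enough examples have been seen.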

This approach has been called "learning in the limit" because there is no 
constraint placed on the number of examples needed before the correct 
hypothesis is discovered. It only has to happen sometime "in the limit".  
Nor is any constraint placed on the computational complexity (e.g. 
time complexity) of the procedures used by the inference machine; they 
need only be computable. This has made results from this model of limited 
use in practice. The two other established lines of research in 
computational learning theory attempt to remedy this situation. One of 
these is called "exact identification in polynomial time". Here the 
inference machine (or "learning algorithm" as it is usually called) is 
allowed to make queries about specific examples and hypotheses of its own 
choosing. Further, the learning algorithm is required to terminate after 
making only polynomially many such queries, and to use only polynomially 
bounded computation time. These polynomials are defined in terms of the 
complexity of the function that is being learned (called the target 
function).  This represents a more "active" approach to learning, as well 
as being more practical when it comes to actual implementation. Many 
papers at ALT '90 presented new results in this line of research, 
including the papers of N. Abe, S. Lange and R. Wiehagen, Y. Takada, and
Y. Takada et al.  
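
As a toy illustration of this query model (again my own example, not drawn
from the workshop papers), consider exactly identifying a monotone
conjunction over n Boolean variables using membership queries, i.e. queries
that ask for the target's value on an assignment of the learner's choosing.
A sketch in Python, using only n queries:

    # Exact identification of a monotone conjunction with membership queries.
    def learn_monotone_conjunction(n, membership_query):
        """membership_query(assignment) returns the hidden target's value
        (0 or 1) on a tuple of n bits.  Returns the set of variable indices
        appearing in the target conjunction."""
        relevant = set()
        for i in range(n):
            # Set only variable i to 0; if the target drops to 0, then
            # variable i must occur in the conjunction.
            assignment = tuple(0 if j == i else 1 for j in range(n))
            if membership_query(assignment) == 0:
                relevant.add(i)
        return relevant

    # Hidden target: x0 AND x3, over 5 variables.
    target_vars = {0, 3}
    oracle = lambda a: int(all(a[i] == 1 for i in target_vars))
    print(learn_monotone_conjunction(5, oracle))   # -> {0, 3}

Both the number of queries and the computation time here are polynomial (in
fact linear) in n, as the model requires.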

The third established approach in computational learning theory is the 
Probably Approximately Correct (PAC) model. Here examples are presented 
randomly to the learning algorithm, and the algorithm is not required to 
learn the target function exactly, but only to find a hypothesis that is 
a good approximation of it with high probability. A good approximation 
means a hypothesis that gives the same value as the target function on 
most randomly drawn examples. Of course, these definitions depend on the 
probability distribution that is used to generate examples. Usually, it 
is assumed that when the target function is f, examples are generated by 
drawing instances from the domain of f independently at random according 
to some arbitrary distribution, and then labeling each instance x with 
the value f(x).  Thus the learning algorithm receives a training sample 
(x1,f(x1)), ..., (xm,f(xm)), where the xi's are drawn independently at 
random, and must produce a hypothesis that is a good approximation of f 
(with high probability).  The learning algorithm is required to work for 
any distribution that may be used to generate the instances x1, ..., xm 
in the training sample.  Moreover, it is required that the number m of 
examples and the computation time be polynomial in the complexity of the 
target function, as in the model of exact identification in polynomial 
time.  
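
As a small illustration of the PAC criterion (a toy example of my own, not
from the report), consider learning a threshold function f(x) = 1 iff
x <= t on [0,1].  The learner draws m random labeled examples and outputs
the largest positive example it has seen; a standard argument shows that
m >= (1/epsilon)*ln(1/delta) examples suffice for this hypothesis to have
error at most epsilon with probability at least 1 - delta, for any fixed
distribution on the instances.

    # PAC learning of a threshold on [0,1] (toy sketch).
    import math, random

    def pac_learn_threshold(target_t, eps, delta, draw=random.random):
        m = math.ceil((1.0 / eps) * math.log(1.0 / delta))   # sample size
        sample = [(x, x <= target_t) for x in (draw() for _ in range(m))]
        positives = [x for x, label in sample if label]
        return max(positives) if positives else 0.0   # hypothesis threshold

    t_hat = pac_learn_threshold(target_t=0.37, eps=0.01, delta=0.05)
    print("estimated threshold:", t_hat)   # near 0.37 with high probability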

The PAC model is often applied to the problem of designing and analysing 
algorithms that learn Boolean functions, although it can also be used 
when more general classes of functions are needed.  The model 
incorporates some of the perspective of statistical decision theory into the 
framework of inductive inference, as well as adding computational 
complexity considerations. Of the three established lines of research in 
computational learning theory, the PAC model has gained the most 
attention from practitioners of machine learning.  However, many 
practitioners have been unsatisfied with the model. One reason is that 
not enough of statistical decision theory was incorporated in the model.  
In particular, the model was originally defined only for learning 
functions that take two values, and it was assumed that there was no 
noise in the training examples. Since then various noise models have been 
added (see the paper of Y. Sakakibara), but only recently was the model 
generalized to cover the important problems of regression and density 
estimation, as well as function approximation.  These generalizations 
were described in my paper at ALT, and in part in N. Cesa-Bianchi's 
paper.  A decision theory approach to learning was also described in the 
excellent paper of S. Amari, entitled "Mathematical Theory of Neural 
Learning." This paper analyses the dynamics of learning in neural 
networks, providing a beautiful and rich perspective. It is, in my 
opinion, the most important paper presented at ALT '90.  

In addition to other work on neural networks (in the papers of N. 
Namatame and R. Oka), and extensions or relatives of PAC learning (papers 
of J. Kivinen, and A. Shinohara and S. Miyano), a number of other 
approaches to machine learning, beyond the three established lines of 
work discussed above, were discussed at ALT.  These include learning and 
reasoning by analogy (papers of B. Shekar et al., T. Unemi, M. Harao, J. 
Arima, B. Indurkhya, and R. Orihara et al.), important new work on 
learning methods in logic programming (papers of S. Muggleton, S. 
Muggleton and C. Feng, M. Hagiya, and S. Liu and M. Hagiya), and formal 
work on explanation-based generalization (papers of P. Laird and E. 
Gamble, and T. Tanaka).  This great variety of work kept the meeting 
lively and the discussions animated.  

Overall, I would say that the meeting was very successful. I believe that 
the field of computational learning theory is still in the process of 
searching for the right foundation, and that none of the frameworks 
proposed so far are nearly adequate.  I see the great variety of new 
formal models of learning that are being proposed as a healthy sign. The 
worst thing that could happen now would be to have the field ossify into 
one of the three established and inadequate lines of research discussed 
above. As most of my work has been in the PAC model, it is perhaps no 
surprise that I feel that of the three models, this model provides the 
best starting point from which to consider modified theories that will 
enrich our ability to formally capture the learning process. I believe 
also that much of the impetus for these revisions in computational 
learning theory will come from the important applied learning work that 
is currently being done in the field of neural networks. This fabulously 
rich, interdisciplinary field may very quickly change our fundamental 
notions about the learning process. It will pay us to monitor it 
carefully.  

--------------------------END OF REPORT---------------------------------