rick@cs.arizona.edu (Rick Schlichting) (12/29/90)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years under the auspices of the Office of Naval Research-Asia (ONR/Asia). The following is the professional opinion of David Kahaner and in no way has the blessing of the US Government or any agency of it. All information is dated and of limited lifetime. This disclaimer should be noted on ANY attribution.]

[Copies of previous reports written by Kahaner can be obtained from host cs.arizona.edu using anonymous FTP.]

To: Distribution
From: David K. Kahaner, ONR Asia [kahaner@xroads.cc.u-tokyo.ac.jp]
      David Haussler, Univ of Calif, Santa Cruz [haussler@saturn.ucsc.edu]
Re: First International Workshop on Algorithmic Learning Theory, Tokyo, 8-10 October 1990.
26 December 1990

ABSTRACT. The First International Workshop on Algorithmic Learning Theory (ALT '90) took place in Tokyo October 8-10, 1990, attended by 120 researchers. This report summarizes the major papers discussed there.

INTRODUCTION.
Algorithmic Learning Theory is the theoretical portion of Artificial Intelligence that is concerned with machine learning.
        Professor David Haussler
        Department of Computer Science
        University of California
        Santa Cruz, CA 95064
        Email: haussler@saturn.ucsc.edu
was invited to participate in the International Workshop held recently in Tokyo. His detailed summary is given below. Interestingly, the paper that Haussler felt was the best in the Workshop, by S. Amari, is similar to another of Amari's that I was very impressed with earlier; see report [fuzzy, 24 August 1990]. Amari was also the organizing chair of the New Information Processing Technologies Workshop held in December 1990 [nipt.90, 26 Dec 1990].

The First International Workshop on Algorithmic Learning Theory (ALT '90) took place in Tokyo October 8-10, 1990. The workshop was sponsored by the Japanese Society for Artificial Intelligence, with cooperation from ACM's SIGACT and SIGART groups, the IEEE Computer Society, the Information Processing Society of Japan, the Institute for New Generation Computer Technology (ICOT), and many other organizations, and with financial support from twelve Japanese corporations and foundations, including Fujitsu, Hitachi, IBM Japan, Mitsubishi, NEC, NTT and Toshiba. It is anticipated that the meeting will be held every other year in the future. This year approximately 120 researchers attended the meeting from all over the world. A complete listing of the papers presented and the authors' addresses is given in the appendix to this report.

The term "algorithmic learning theory" is similar to the term "computational learning theory" used for the annual workshops on this subject held in the United States (the COLT workshops). So far there have been three COLT workshops (MIT '88, UC Santa Cruz '89, and Rochester '90), sponsored by ACM SIGACT and SIGART groups, and originally funded in part by grants from ONR and NSF. ONR may also help fund a fourth COLT workshop, to be held Aug. 5-7, 1991, again at UC Santa Cruz.

The field of algorithmic/computational learning theory has its roots primarily in artificial intelligence work in machine learning, including work in statistical pattern recognition and classification in the early 60's and the recent work on learning in neural networks, and in E.M. Gold's work on inductive inference of formal languages. Computational learning theory, as practiced by the researchers who attend the ALT and COLT meetings, represents a major theoretical component of AI learning work.
Within computational learning theory there are three major established lines of research, and a number of newer alternative approaches and ideas. These approaches were all represented at the ALT meeting.

The oldest line of investigation is often called "inductive inference" or "learning in the limit". The papers of M. Fulk and S. Jain, K. Jantke, J. Case et al., S. Lange and R. Wiehagen, P. Garcia et al., T. Shinohara, and A. Togashi and S. Noguchi at ALT present new results in this line of research. The inductive inference approach grew out of recursive function and computability theory, and is still dominated by methods from these fields.

To illustrate the type of problem considered in inductive inference work, consider the problem of designing an inference machine to learn a language in the manner that a child learns, i.e., not by being told what the grammar of the language is, but rather by hearing grammatical sentences. In an abstract sense we can view grammatical sentences in the natural language as finite strings of symbols over a finite formal alphabet (here the formal alphabet is actually the set of words in the natural language). The (usually infinite) set of strings over this formal alphabet that correspond to possible sentences in the natural language constitutes a formal language. The inference machine must learn which of the infinitely many possible strings of symbols over the formal alphabet are in the formal language, and which strings are not. The only information available is a sequence of examples. In the simplest case, each example is simply a string in the formal language. In the more general case, there are two types of examples: strings that are in the language (positive examples) and strings that are not (negative examples). In either case, it is assumed that the sequence of examples given to the inference machine includes all possible examples. The inference machine is successful in learning the language if, no matter what order these examples are presented in, at some point during the presentation the machine makes the correct hypothesis and stays with this hypothesis from that point on. Here the hypothesis is some correct representation, perhaps a grammar, for the formal language that is being learned. The machine is said to be able to learn an entire class of formal languages if it is always successful on each language in this class. The same setup can be used to investigate inference machines that are able to learn classes of subsets of the integers, or indeed classes of functions on arbitrary countable domains. (A formal language can be viewed as a function on the set of all strings that takes the value 1 on strings in the language and 0 on strings not in the language.)

This approach has been called "learning in the limit" because there is no constraint placed on the number of examples needed before the correct hypothesis is discovered; it only has to happen sometime "in the limit". Nor is any constraint placed on the computational complexity (e.g. time complexity) of the procedures used by the inference machine; they need only be computable. This has made results from this model of limited use in practice.
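To make the "learning in the limit" setup concrete, the following is a minimal sketch (in Python, purely illustrative and not taken from any of the ALT papers) of Gold-style identification by enumeration on a toy class of languages L_k = { a^i : 1 <= i <= k }, k = 1, 2, 3, ... : after each labeled example the machine outputs the first hypothesis in a fixed enumeration that is consistent with every example seen so far.

    # A toy illustration of identification in the limit by enumeration.
    # The hypothesis class, its enumeration, and all names are assumptions
    # made for this sketch, not code from the workshop report.

    def hypothesis(k):
        """Membership test for the k-th language L_k = strings of 1..k a's."""
        return lambda s: set(s) <= {"a"} and 1 <= len(s) <= k

    def consistent(k, examples):
        h = hypothesis(k)
        return all(h(s) == label for s, label in examples)

    def infer(example_stream, max_k=100):
        """Yield the machine's current guess (a value of k) after each example."""
        seen = []
        for ex in example_stream:
            seen.append(ex)
            # Identification by enumeration: first consistent hypothesis wins.
            guess = next(k for k in range(1, max_k + 1) if consistent(k, seen))
            yield guess

    # Target language is L_3 = {"a", "aa", "aaa"}; examples are labeled
    # strings, and every relevant string eventually appears.
    stream = [("a", True), ("aaaa", False), ("aa", True), ("aaa", True), ("aaaaa", False)]
    print(list(infer(stream)))   # prints [1, 1, 2, 3, 3] -- converges to 3 and stays there

Once the longest string of the target language has appeared in the presentation, the guess is correct and never changes again, which is exactly the convergence criterion described above; note, however, that nothing in the model bounds how many examples (or how much computation) this may take.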
The two other established lines of research in computational learning theory attempt to remedy this situation. One of these is called "exact identification in polynomial time". Here the inference machine (or "learning algorithm" as it is usually called) is allowed to make queries about specific examples and hypotheses of its own choosing. Further, the learning algorithm is required to terminate after making only polynomially many such queries, and to use only polynomially bounded computation time. These polynomials are defined in terms of the complexity of the function that is being learned (called the target function). This represents a more "active" approach to learning, as well as being more practical when it comes to actual implementation. Many papers at ALT '90 presented new results in this line of research, including the papers of N. Abe, S. Lange and R. Wiehagen, Y. Takada, and Y. Takada et al.

The third established approach in computational learning theory is the Probably Approximately Correct (PAC) model. Here examples are presented randomly to the learning algorithm, and the algorithm is not required to learn the target function exactly, but only to find a hypothesis that is a good approximation of it with high probability. A good approximation means a hypothesis that gives the same value as the target function on most randomly drawn examples. Of course, these definitions depend on the probability distribution that is used to generate examples. Usually, it is assumed that when the target function is f, examples are generated by drawing instances from the domain of f independently at random according to some arbitrary distribution, and then labeling each instance x with the value f(x). Thus the learning algorithm receives a training sample (x1,f(x1)), ..., (xm,f(xm)), where the xi's are drawn independently at random, and must produce a hypothesis that is a good approximation of f (with high probability). The learning algorithm is required to work for any distribution that may be used to generate the instances x1, ..., xm in the training sample. Moreover, it is required that the number m of examples and the computation time be polynomial in the complexity of the target function, as in the model of exact identification in polynomial time. The PAC model is often applied to the problem of designing and analysing algorithms that learn Boolean functions, although it can also be used when more general classes of functions are needed. The model incorporates some of the perspective of statistical decision theory into the framework of inductive inference, as well as adding computational complexity considerations.
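As a concrete (and again purely illustrative) instance of this sample-based setup, here is a minimal Python sketch of a PAC learner for a standard textbook concept class: thresholds f_t(x) = 1 if x <= t, else 0, on the interval [0,1]. The concept class, the uniform sampling distribution, the "tightest consistent threshold" learner, and the sample-size bound m >= (1/eps) ln(1/delta) are standard illustrative choices, not drawn from any ALT paper.

    # A toy PAC learner: from m random labeled examples of a threshold
    # function, output the tightest consistent threshold.  All names and
    # parameter choices are assumptions made for this sketch.

    import math
    import random

    def pac_learn_threshold(sample):
        """Return the largest positive example, i.e. the tightest consistent threshold."""
        positives = [x for x, label in sample if label == 1]
        return max(positives, default=0.0)    # guess 0.0 if no positive examples seen

    def run_trial(t=0.7, eps=0.05, delta=0.01):
        m = math.ceil(math.log(1.0 / delta) / eps)   # (1/eps) ln(1/delta) examples
        target = lambda x: 1 if x <= t else 0
        sample = [(x, target(x)) for x in (random.random() for _ in range(m))]
        h = pac_learn_threshold(sample)
        # Under the uniform distribution the hypothesis errs exactly on (h, t],
        # so its error is t - h (the learner can only undershoot).
        return m, abs(t - h)

    m, err = run_trial()
    print(f"{m} examples, error {err:.4f}")   # error <= 0.05 except with probability <= 0.01

The sample-size bound works for this class because the hypothesis has error greater than eps only if none of the m independent draws lands in the region of probability mass eps just below the target threshold, an event of probability at most (1-eps)^m <= e^(-eps*m) <= delta; richer concept classes need larger (but still polynomial) samples, which is where the general PAC sample-complexity theory comes in.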
Of the three established lines of research in computational learning theory, the PAC model has gained the most attention from practitioners of machine learning. However, many practitioners have been unsatisfied with the model. One reason is that not enough of statistical decision theory was incorporated in the model. In particular, the model was originally defined only for learning functions that take two values, and it was assumed that there was no noise in the training examples. Since then various noise models have been added (see the paper of Y. Sakakibara), but only recently was the model generalized to cover the important problems of regression and density estimation, as well as function approximation. These generalizations were described in my paper at ALT, and in part in N. Cesa-Bianchi's paper.

A decision theory approach to learning was also described in the excellent paper of S. Amari, entitled "Mathematical Theory of Neural Learning." This paper analyses the dynamics of learning in neural networks, providing a beautiful and rich perspective. It is, in my opinion, the most important paper presented at ALT '90.

In addition to other work on neural networks (in the papers of N. Namatame and R. Oka), and extensions or relatives of PAC learning (papers of J. Kivinen, and A. Shinohara and S. Miyano), a number of other approaches to machine learning, beyond the three established lines of work discussed above, were presented at ALT. These include learning and reasoning by analogy (papers of B. Shekar et al., T. Unemi, M. Harao, J. Arima, B. Indurkhya, and R. Orihara et al.), important new work on learning methods in logic programming (papers of S. Muggleton, S. Muggleton and C. Feng, M. Hagiya, and S. Liu and M. Hagiya), and formal work on explanation-based generalization (papers of P. Laird and E. Gamble, and T. Tanaka). This great variety of work kept the meeting lively and the discussions animated.

Overall, I would say that the meeting was very successful. I believe that the field of computational learning theory is still in the process of searching for the right foundation, and that none of the frameworks proposed so far is nearly adequate. I see the great variety of new formal models of learning that are being proposed as a healthy sign. The worst thing that could happen now would be for the field to ossify into one of the three established and inadequate lines of research discussed above. As most of my work has been in the PAC model, it is perhaps no surprise that I feel that, of the three models, it provides the best starting point from which to consider modified theories that will enrich our ability to formally capture the learning process. I believe also that much of the impetus for these revisions in computational learning theory will come from the important applied learning work that is currently being done in the field of neural networks. This fabulously rich, interdisciplinary field may very quickly change our fundamental notions about the learning process. It will pay us to monitor it carefully.

--------------------------END OF REPORT---------------------------------