gerrit@uiag.UUCP (Cap Gerrit) (10/05/88)
Does anybody out there in the world have an implementation of the ID3 algorithm of Quinlan? Or (not exclusive) of the CLS algorithm of Hunt, which is used in ID3? A very good description of these algorithms would also be appreciated!

We have developed a theoretical framework for concept learning algorithms, and as an example I have a model in our framework which should be a simulation of this ID3 algorithm. Unfortunately I haven't found much more than a vague description of ID3 in various books and papers. What I want to do is make an implementation of our framework and test the simulation against the original ID3. It would be fine if the implementation were in C or in Prolog, but other languages would do as well.

Gerrit Cap
Dept. of Mathematics and Computer Science
University of Antwerp
Belgium (Europe)
mail address : ...!prlb2!uiag!gerrit

I'm sorry if this posting is a bit unusual to other news readers, but this is my first posting. I also want to apologise for my bad English and the limited mail address.
mcguire@antares.aero.org (Rod McGuire) (10/15/88)
In article <395@uiag.UUCP> gerrit@uiag.UUCP (Cap Gerrit) writes:
>Does anybody out there in the world have an implementation of
>the ID3 algorithm of Quinlan

Prolog is one of the worst possible languages in which to write the ID3 algorithm; I don't know if it can be written to be efficient. The problem is that one needs to count up statistics in parallel and then somewhat randomly access those statistics. While in Prolog it is possible to kludge up arrays with mutable elements, I will bet that the overhead of this technique makes a Prolog implementation slower by a factor of at least 100 compared to, say, a Fortran implementation. Since the ID3 algorithm is usually applied to a moderate amount of data (say, constructing a tree discriminating 10 classes from 5000 attribute vectors, each with 20 attributes that can take on 1 of 5 values), I think this performance difference can make a Prolog implementation unusable. Also, I think any Prolog version that strives for efficiency is likely to be ugly and far removed from the functional specification of the algorithm. However, I would love to see an analysis that proves me wrong.

Below I present, in Lisp, the central part of the ID3 algorithm: the definition of the metric B(a U), which gives a value for the goodness of attribute "a" as a discriminant for the set of class-attribute vectors "U". Following that is an array-based implementation of the time-consuming parts.

Let U be a set of class-attribute vectors where each element "u" is a "cav" data structure defined as:

    (cav-class u) = c, the class determined by vector u   (range 1 to nc)
    (cav-av u a)  = v, the value for attribute a in u     (range 1 to nv)

    ; the metric B for splitting U on attribute "a" is defined as:
    (defun B (U a)
      (/ (sum (v 1 nv)               ; sum for v=1 to nv
           (- (sum (c 1 nc) (* (N c a v U) (log (N c a v U))))
              (* (S a v U) (log (S a v U)))))
         (size-of U)))

where

    (S a v U) = number of elements u in U s.t. (cav-av u a) = v
    (N c a v U) = number of elements u in U s.t. (cav-av u a) = v
                  and (cav-class u) = c

In order to avoid processing the elements of U over and over again, it is reasonable to pre-compute N (as below) and define S in terms of N:

    (defvar N (make-array (list nc na nv)))
    (loop for u in U
          do (loop for a from 1 to na
                   do (incf (aref N (cav-class u) a (cav-av u a)))))
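[For readers who want something directly runnable, the counting scheme and B metric above can be sketched in Python. This is a rough translation, not Quinlan's or McGuire's code: the name `b_metric` and the `(class, attrs)` pair representation of a cav are illustrative choices. Conventions (sums skip zero counts; higher B, i.e. closer to 0, means a better discriminant) follow the Lisp definition above.]

```python
import math
from collections import defaultdict

def b_metric(U, a, nv, nc):
    """Goodness of attribute a as a discriminant for U.
    Each element of U is a (cls, attrs) pair, where attrs maps
    attribute number -> value (both 1-based, as in the post)."""
    # Pre-compute N[c, v] = number of u in U with attribute a = v
    # and class c, so U is swept only once.
    N = defaultdict(int)
    for cls, attrs in U:
        N[cls, attrs[a]] += 1
    total = 0.0
    for v in range(1, nv + 1):
        # S(a, v, U) = number of u in U with attribute a = v,
        # defined in terms of N as suggested above
        S = sum(N[c, v] for c in range(1, nc + 1))
        inner = sum(N[c, v] * math.log(N[c, v])
                    for c in range(1, nc + 1) if N[c, v] > 0)
        if S > 0:
            total += inner - S * math.log(S)
    return total / len(U)
```

[On this definition B is the negated conditional entropy of the class given the attribute, so a perfectly discriminating attribute scores 0 and worse attributes score more negative values.]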
ok@quintus.uucp (Richard A. O'Keefe) (10/16/88)
In article <39500@aero.ARPA> mcguire@aero.aerospace.org (Rod McGuire) writes:
>In article <395@uiag.UUCP> gerrit@uiag.UUCP (Cap Gerrit) writes:
>>Does anybody out there in the world have an implementation of
>>the ID3 algorithm of Quinlan
>Prolog is one of the worst possible languages in which to write the
>ID3 algorithm. I don't know if it can be written to be efficient.
>The problem is that one needs to count up statistics in parallel and
>then somewhat randomly access these statistics. While in prolog it
>is possible to kludge up arrays with mutable elements,

There is not the slightest need for mutable arrays in an implementation of ID3 or similar algorithms, and Prolog is not at all a poor choice. The Iterative Dichotomiser involves two kinds of steps:

(1) making a sweep through the training set, collecting a random sample of *incorrectly predicted* examples to add to the "window" (this is the "Iterative" part);

(2) doing a kind of back-to-front radix sort on the contents of the "window" to turn it into a decision tree (this is the "Dichotomiser" part).

>Since the ID3 algorithm is usually applied to a moderate amount of
>data (let's say, to construct a tree discriminating 10 classes from
>analysis of 5000 attribute vectors each with 20 attributes that can

If you are working with tiny training sets like that, you don't need ID3. Quinlan's innovation was the "windowing" technique -- forming decision trees is nothing new; you will even find a tiny Fortran implementation in Algorithm AS 165, JRSS -- which permitted him to work with training sets having millions of examples. The "windowing" idea can be applied to many induction schemes:

    {Initialise} Set the window to a random sample of N1 examples.
    {Induce}     Induce a "rule" from the current window.
    {Evaluate}   Make a pass through the complete training set, adding to
                 the window a random sample of N2 examples which are
                 incorrectly classified (if there are fewer than N2
                 misclassifications, take all of them). If performance
                 was adequate, stop.
    {Iterate}    Go back to {Induce}.

Two references:

    "Discovering rules by induction from large collections of examples",
    Quinlan, J.R., in Expert Systems in the Micro-Electronic Age,
    ed. D. Michie, Edinburgh University Press, 1979.

    "Learning efficient classification procedures", Quinlan, J.R.,
    in Machine Learning: An Artificial Intelligence Approach,
    eds. Michalski, Carbonell, & Mitchell, Tioga Press, 1983.

The algorithm descriptions in these articles are quite clear.
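[The four windowing steps O'Keefe lists can be sketched as a generic outer loop in Python. This is an illustration of the scheme, not Quinlan's code: the names `windowed_induction`, `induce`, and `classify` are invented here, and the induction step is deliberately left to the caller, since the post stresses that windowing applies to many induction schemes.]

```python
import random

def windowed_induction(training_set, induce, classify,
                       n1=200, n2=100, max_rounds=20):
    """Generic windowing loop. Examples are (cls, ...) tuples with the
    class first; induce(window) -> rule and classify(rule, ex) -> cls
    are supplied by the caller (e.g. a decision-tree builder)."""
    # {Initialise} window = random sample of N1 examples
    window = random.sample(training_set, min(n1, len(training_set)))
    rule = None
    for _ in range(max_rounds):
        # {Induce} induce a rule from the current window
        rule = induce(window)
        # {Evaluate} sweep the complete training set for misclassifications
        errors = [ex for ex in training_set
                  if classify(rule, ex) != ex[0]]
        if not errors:
            return rule  # performance adequate: stop
        # add a random sample of at most N2 misclassified examples
        # (if fewer than N2, this takes all of them)
        window.extend(random.sample(errors, min(n2, len(errors))))
        # {Iterate} loop back to {Induce}
    return rule
```

[Note that only the {Evaluate} sweep ever touches the full training set; the induction step works on the much smaller window, which is the source of the claimed savings on very large training sets.]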
tgd@mist.cs.orst.edu (Tom Dietterich) (10/18/88)
There is evidence that the windowing feature of ID3 does not provide much benefit. Consult the following paper for details:

    Wirth, J. and Catlett, J. (1988). Experiments on the costs and
    benefits of windowing in ID3. In Proceedings of the Fifth
    International Conference on Machine Learning, Ann Arbor, MI.
    Available from Morgan-Kaufmann, Inc, Los Altos, CA. 87-99.

Here is the abstract:

    "Quinlan's machine learning system ID3 uses a method called
    windowing to deal economically with large training sets. This
    paper describes a series of experiments performed to investigate
    the merits of this technique. In nearly every experiment, the use
    of windowing considerably increased the CPU requirements of ID3,
    but produced no significant benefits. We conclude that in noisy
    domains (where ID3 is now commonly used), windowing should be
    avoided."

The paper reports several studies involving training sets as large as 20,000 examples. The authors state that if you have the physical memory to store the examples, it is best to avoid windowing. Windowing seems to work best on noise-free training sets with many redundant features; these turn out to be rather uncommon, although the initial domains in which ID3 was developed had these properties.

--Tom Dietterich
tgd@cs.orst.edu
Department of Computer Science
Oregon State University
Corvallis, OR 97331