[comp.ai] Concept Learning & ID3

gerrit@uiag.UUCP (Cap Gerrit) (10/05/88)

Does anybody out there in the world have an implementation of
the ID3 algorithm of Quinlan? Or (not exclusively) of the CLS algorithm
of Hunt, which is used in ID3?
A good description of these algorithms would also be appreciated!

We have developed a theoretical framework for concept learning algorithms,
and as an example I have a model in our framework which should be
a simulation of this ID3 algorithm.
Unfortunately I haven't found much more than a vague description of ID3
in various books and papers.
What I want to do is implement our framework and
test the simulation against the original ID3.

It would be fine if the implementation were in C or in Prolog, but
other languages would do as well.

Gerrit Cap
Dep. of Mathematics and Computer Science
University of Antwerp
Belgium (Europe)


mail address : ...!prlb2!uiag!gerrit
I'm sorry if this posting is a bit unusual to other news readers, but
this is my first posting. I also want to apologise for my bad English
and the limited mail address.

mcguire@antares.aero.org (Rod McGuire) (10/15/88)

In article <395@uiag.UUCP> gerrit@uiag.UUCP (Cap Gerrit) writes:
>Does anybody out there in the world have an implementation of
>the ID3 algorithm of Quinlan

Prolog is one of the worst possible languages in which to write the
ID3 algorithm. I don't know if it can be written to be efficient.
The problem is that one needs to count up statistics in parallel and
then somewhat randomly access these statistics.  While in Prolog it
is possible to kludge up arrays with mutable elements, I will bet
that the overhead of this technique makes a Prolog implementation
slower by a factor of at least 100 compared to, say, a Fortran implementation.
Since the ID3 algorithm is usually applied to a moderate amount of
data (let's say, to construct a tree discriminating 10 classes from
analysis of 5000 attribute vectors each with 20 attributes that can
take on 1 of 5 values), I think that this performance difference can
make a Prolog implementation unusable. Also, I think any Prolog version
that strives for efficiency is likely to be ugly and far removed
from the functional specification of the algorithm.

However I would love to see an analysis that proves me wrong.  Below,
I present in lisp the central part of the ID3 algorithm - the
definition of the metric B(a U) which gives a value for the goodness
of attribute "a" as a discriminant for the set of
class-attribute-vectors "U". Following that is an array-based
implementation of the time-consuming parts.

Let U be a set of class-attribute-vectors where each element "u" is
a "cav" data-structure defined as:

 (cav-class u) = c, the class determined by vector u. 
                 (range 1 to nc)
 (cav-av u a) = v, the value for attribute a in vector u. 
                (range 1 to nv)

; the metric B for splitting U on attribute "a" is defined as
; (terms with N or S equal to 0 contribute 0, i.e. 0 log 0 = 0):
(defun B (U a)
  (/ (sum (v 1 nv)         ; sum for v = 1 to nv
       (- (sum (c 1 nc)    ; sum for c = 1 to nc
            (* (N c a v U)
               (log (N c a v U))))
          (* (S a v U)
             (log (S a v U)))))
     (size-of U)))

where
  (S a v U) = number of elements u in U s.t. 
      (cav-av u a) = v
and
  (N c a v U) = number of elements u in U s.t. 
      (cav-av u a) = v
    & (cav-class u) = c

In order to avoid processing the elements in U over and over again,
it is reasonable to pre-compute N (as below) and define S in terms of N.

(defvar N (make-array (list nc na nv))) 

(loop for u in U            ; u, not v, to avoid clashing with the value index
  do (loop for a from 1 to na
       do  (increment (aref N (cav-class u) a (cav-av u a)))))

ok@quintus.uucp (Richard A. O'Keefe) (10/16/88)

In article <39500@aero.ARPA> mcguire@aero.aerospace.org (Rod McGuire) writes:
>In article <395@uiag.UUCP> gerrit@uiag.UUCP (Cap Gerrit) writes:
>>Does anybody out there in the world have an implementation of
>>the ID3 algorithm of Quinlan


>Prolog is one of the worst possible languages in which to write the
>ID3 algorithm. I don't know if it can be written to be efficient.
>The problem is that one needs to count up statistics in parallel and
>then somewhat randomly access these statistics.  While in prolog it
>is possible to kludge up arrays with mutable elements, 

There is not the slightest need for mutable arrays in an implementation
of ID3 or similar algorithms, and Prolog is not at all a poor choice.
The Iterative Dichotomiser involves two kinds of steps:

    (1) making a sweep through the training set collecting a random
	sample of *incorrectly predicted* examples to add to the
	"window" (this is the "Iterative" part)
    (2) doing a kind of back-to-front radix sort on the contents of
	the "window" to turn it into a decision tree (this is the
	"Dichotomiser" part)

>Since the ID3 algorithm is usually applied to a moderate amount of
>data (let's say, to construct a tree discriminating 10 classes from
>analysis of 5000 attribute vectors each with 20 attributes that can

If you are working with tiny training sets like that, you don't need
ID3.  Quinlan's innovation was the "windowing" technique -- forming
decision trees is nothing new, you will even find a tiny Fortran
implementation in Algorithm AS 165, JRSS -- which permitted him to
work with training sets having millions of examples.  The "windowing"
idea can be applied to many induction schemes:

    {Initialise}
	set the window to a random sample of N1 examples.
    {Induce}
	induce a "rule" from the current window.
    {Evaluate}
	make a pass through the complete training set,
	adding a random sample of N2 examples which are incorrectly
	classified (if fewer than N2 misclassifications, take all).
	If performance was adequate, stop.
    {Iterate}
	Go back to {Induce}.
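The loop above can be sketched in C.  The "rule" here is deliberately
trivial (predict the class from whichever single attribute misclassifies the
fewest window examples) so that the windowing machinery itself stands out;
the sampling is deterministic rather than random, and all names and sizes
are hypothetical:

```c
#define NA 2     /* attributes (hypothetical sizes) */
#define MAXW 64  /* assumes the set fits; real sets need dynamic storage */

struct ex { int class_; int av[NA]; };

/* {Induce}: stand-in for real induction -- pick the attribute that
   misclassifies the fewest window examples, predicting class = value. */
static int induce(const struct ex *w, int n) {
    int best = 0, best_err = n + 1;
    for (int a = 0; a < NA; a++) {
        int err = 0;
        for (int i = 0; i < n; i++)
            if (w[i].av[a] != w[i].class_) err++;
        if (err < best_err) { best_err = err; best = a; }
    }
    return best;
}

/* The windowing loop; n1 = initial window size, n2 = maximum number of
   misclassified examples added per pass.  Returns the final rule. */
static int window_induce(const struct ex *set, int total, int n1, int n2) {
    struct ex win[MAXW];
    int in_win[MAXW] = {0}, wn = 0;
    for (int i = 0; i < n1 && i < total; i++) {   /* {Initialise}: the first */
        win[wn++] = set[i];                       /* n1 examples stand in    */
        in_win[i] = 1;                            /* for a random sample     */
    }
    for (;;) {
        int rule = induce(win, wn);               /* {Induce} */
        int added = 0, errors = 0;
        for (int i = 0; i < total; i++)           /* {Evaluate} */
            if (set[i].av[rule] != set[i].class_) {
                errors++;
                if (!in_win[i] && added < n2 && wn < MAXW) {
                    win[wn++] = set[i];
                    in_win[i] = 1;
                    added++;
                }
            }
        if (errors == 0 || added == 0)    /* adequate, or cannot improve */
            return rule;
    }                                             /* {Iterate} */
}
```

On a set whose second attribute always equals the class but whose first
attribute only agrees on the initial window, the first pass induces the
wrong rule, the misclassified examples enter the window, and the second
pass corrects it.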

Two references:
	"Discovering rules by induction from large collections of examples",
	Quinlan, J.R.
	(in) Expert Systems in the micro-electronic age, (Ed. D. Michie)
	Edinburgh University Press, 1979

	"Learning efficient classification procedures",
	Quinlan, J.R.
	(in) Machine learning: an artificial intelligence approach
	Eds Michalski, Carbonell, & Mitchell
	Tioga press, 1983
The algorithm descriptions in these articles are quite clear.

tgd@mist.cs.orst.edu (Tom Dietterich) (10/18/88)

There is evidence that the windowing feature of ID3 does not provide
much benefit.  Consult the following paper for details:

Wirth, J. and Catlett, J. (1988).  Experiments on the costs and
benefits of windowing in ID3.  In Proceedings of the Fifth
International Conference on Machine Learning, Ann Arbor, MI.
Available from Morgan-Kaufmann, Inc, Los Altos, CA.  87--99.

Here is the abstract:

"Quinlan's machine learning system ID3 uses a method called windowing
to deal economically with large training sets.  This paper describes a
series of experiments performed to investigate the merits of this
technique.  In nearly every experiment, the use of windowing
considerably increased the CPU requirements of ID3, but produced no
significant benefits.  We conclude that in noisy domains (where ID3 is
now commonly used), windowing should be avoided."

The paper reports several studies involving training sets as large as
20,000 examples.  The authors state that if you have the physical
memory to store the examples, it is best to avoid windowing.
Windowing seems to work best on noise-free training sets with many
redundant features.  Such domains turn out to be rather uncommon,
although the initial domains in which ID3 was developed had these
properties.


--Tom Dietterich
tgd@cs.orst.edu
Department of Computer Science
Oregon State University
Corvallis, OR 97331