rik%cs@UCSD.EDU (Rik Belew) (03/17/88)
My two-bit definition of "genetic algorithms has,
like all over-simplifications, drawn criticisms
for being --- you guessed it --- over-simplified!
So here's a second pass:
First and foremost, I want to "make one thing
perfectly clear": When I defined GA(1) in terms
of "... John Holland and his students" I most
certainly did not intend that only those who
had journeyed to Ann Arbor and been anointed
by the man himself were fully certified to open
GA franchises! Defining a scientific approach
in terms of the personalities involved is not
adequate, for GA or any other work. I was
attempting to distinguish a particular approach
from the broader set of techniques that I called
GA(2). In my experience, John is the "root"
of all such work and much of it has been
done by "direct" students of his at Michigan
and "indirect" students of these students. I
also know, however, that others --- notably Dave
Ackley and Larry Rendell --- have worked on
these methods without direct contact with this
lineage. But I very much consider them "students
of Holland," in that they are aware of and have
benefited from John's work. (Again, I mean that
as a compliment, not because I have been charged
with GA Clique membership validation.) I see
absolutely no benefit and potentially great harm
in drawing lines between closely related bodies
of research.
So let's move on to more meaningful attempts to
define the GA. My two-bit definition focused on
the cross-over operator: GA(1) depended on it,
and GA(2) generally relied on the weaker (less
intelligent) mutation operator. This led Dave
Ackley to feel that:
... membership in GA(1) is restricted
to a small and somewhat quirky "DNA-ish"
subset of all possible combination rules
[ackley@flash.bellcore.com, 3 Mar 88]
I take Dave to mean that the algorithm
presented by Holland (let's say, to be specific,
the R class of algorithms described in Chapter 6
of his Adaptation in Natural and Artificial
Systems, hereafter ANAS) sacrifices some
performance in order
to remain more biologically plausible. But I'm
with Dave on this one: Personally, I'm also more
interested in the algorithms than their relation
to real biological mechanisms. (Let the record
show, however, that there are GA practitioners
who do try to take biological plausibility more
seriously, e.g., Grosso and Westerdale.)
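For concreteness, here is a minimal sketch of the
two operators this distinction turns on, written
for bit-string genotypes. This is my own
illustration, not code from ANAS or from anyone's
system, and the function names are hypothetical:

    import random

    def one_point_crossover(parent_a, parent_b):
        """Recombine two equal-length bit-strings at a
        random cut point.  This is the operator GA(1)
        depends on: each offspring inherits contiguous
        pieces (building blocks) from both parents."""
        assert len(parent_a) == len(parent_b)
        cut = random.randrange(1, len(parent_a))
        return (parent_a[:cut] + parent_b[cut:],
                parent_b[:cut] + parent_a[cut:])

    def point_mutation(genotype, rate=0.01):
        """Flip each bit independently with small
        probability.  The weaker GA(2)-style operator:
        it perturbs one locus at a time and cannot, by
        itself, recombine co-adapted sets of alleles."""
        return ''.join(b if random.random() >= rate
                       else ('1' if b == '0' else '0')
                       for b in genotype)

The point is not the few lines of code but the
information flow: cross-over moves multi-bit
substrings between individuals, while mutation
touches single loci in isolation.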
So the next possibility is to refer to
properties of the GA we find desirable. For
Dave, I think the key property of the GA is its
"implicit parallelism": the ability to search a
huge space implicitly by explicitly manipulating
a very small set of structures (a toy
illustration follows Jon's quote below). Jon Richardson
comes closer to the definition I had in mind
with his emphasis on Holland's "building blocks"
notion:
The proper distinction I think is
whether or not the recombination operator
in question supports the building block
hypothesis. "Mutation-like operators"
do not do this. Any kind of weird
recombination which can be shown to
propagate and construct building blocks,
I would call a Genetic Algorithm. If
the operator does nothing with building
blocks, I would consider it apocryphal.
It may be valuable but apocryphal
nonetheless and shouldn't be called a GA.
[richards@UTKCS2.CS.UTK.EDU, 4 Mar 88]
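And here is that toy illustration of implicit
parallelism (again my sketch, purely for
exposition): a bit-string of length l is
simultaneously an instance of 2^l schemata, one
for each choice of loci to leave as don't-cares,
so every fitness evaluation yields evidence about
exponentially many hyperplanes at once.

    from itertools import combinations

    def schemata_of(genotype):
        """Yield every schema ('*' = don't-care) that
        the given bit-string instantiates; there are
        2**len(genotype) of them in all."""
        loci = range(len(genotype))
        for k in range(len(genotype) + 1):
            for wild in combinations(loci, k):
                yield ''.join('*' if i in wild else genotype[i]
                              for i in loci)

    # A string of length 4 instantiates 2**4 = 16 schemata:
    assert len(list(schemata_of('1011'))) == 16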
While I would echo the value of both these
desiderata, I don't find them technically tight
enough to be useful. So I propose that we follow
Holland's suggestion (in his talk at ICGA85)
and reserve the term "GA" for those algorithms
for which we can prove the "schemata theorem"
(ANAS, Theorem 6.2.3). I believe this theorem
is still the best understanding we have of how
and why the GA gives rise to implicit parallelism
and building blocks. Of
course, there are problems with this definition
as well. In particular, it is so tightly
wed to the string representation and cross-over
operator that it is very difficult to imagine any
algorithm very different from the (traditional)
GA that would satisfy the theorem. But that's
exactly where I think the work needs to be done.
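For reference, the growth inequality usually
quoted as the schema theorem runs roughly as
follows (I am paraphrasing the standard form from
memory; consult Theorem 6.2.3 itself for Holland's
exact statement):

    m(H, t+1)  >=  m(H, t) * [f(H) / f_bar]
                   * [1 - p_c * d(H)/(l-1) - o(H) * p_m]

where m(H,t) is the number of instances of schema
H in the population at generation t, f(H) the
average observed fitness of those instances, f_bar
the population's average fitness, d(H) and o(H)
the schema's defining length and order, l the
string length, and p_c, p_m the cross-over and
mutation rates. The disruption terms are defined
literally in terms of cut points along a string,
which is exactly the sense in which the theorem is
wed to the representation.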
Finally, I want to say why I (as Dave Ackley
says) "took the trouble to exclude" Ackely's SIGH
system from my definition of GA(1). My answer
is simply that I view SIGH as a hybrid. It
has borrowed techniques from a number of areas:
the GA, connectionism, and simulated annealing, to
name three. There is absolutely nothing wrong
with doing this, and as Dave's thesis showed and
Larry Eshelman's note confirmed
[Larry.Eshelman@F.GP.CS.CMU.EDU, 11 Mar 1988]
there are at least
some problems in which SIGH does much better
than the traditional GA. My only problem with
SIGH is that I can't solve the apportionment of
credit problem: when it works, I don't know
exactly which technique is responsible, and when
it doesn't work I don't know who to blame.
I too think about connectionist algorithms and
simulated annealing along with Holland's GA and
bucket brigade, and see all of them as members
of a class of algorithms I want to understand
better. But I find it necessary to isolate the
properties of each before trying to combine them.
In short, I think Dave and I agree to a great
extent (on the problem, and on what the pieces of
the solution might be), and disagree only in our
respective approaches to putting it all together.