rik%cs@UCSD.EDU (Rik Belew) (03/17/88)
My two-bit definition of "genetic algorithms" has, like all over-simplifications, drawn criticism for being --- you guessed it --- over-simplified! So here's a second pass.

First and foremost, I want to "make one thing perfectly clear": When I defined GA(1) in terms of "... John Holland and his students" I most certainly did not intend that only those who had journeyed to Ann Arbor and been anointed by the man himself were fully certified to open GA franchises! Defining a scientific approach in terms of the personalities involved is not adequate, for GA or any other work. I was attempting to distinguish a particular approach from the broader set of techniques that I called GA(2). In my experience, John is the "root" of all such work, and much of it has been done by "direct" students of his at Michigan and "indirect" students of these students. I also know, however, that others --- notably Dave Ackley and Larry Rendell --- have worked on these methods without direct contact with this lineage. But I very much consider them "students of Holland," in that they are aware of and have benefited from John's work. (Again, I mean that as a compliment, not because I have been charged with GA Clique membership validation.) I see absolutely no benefit, and potentially great harm, in drawing lines between closely related bodies of research.

So let's move on to more meaningful attempts to define the GA. My two-bit definition focused on the cross-over operator: GA(1) depended on it, and GA(2) generally relied on the weaker (less intelligent) mutation operator. This led Dave Ackley to feel that:

    ... membership in GA(1) is restricted to a small and somewhat
    quirky "DNA-ish" subset of all possible combination rules
    [ackley@flash.bellcore.com, 3 Mar 88]

I take Dave to mean that the algorithm presented by Holland (let's say the R class of algorithms described in his ANAS Chapter 6, to be specific) sacrifices some performance in order to remain more biologically plausible.
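To make the operator distinction concrete, here is a minimal sketch of the two operators on bitstrings. The function names and parameters are my own illustration, not anyone's published code; the point is only that one-point cross-over passes along intact substrings from each parent, while point mutation perturbs isolated positions:

```python
import random

def one_point_crossover(p1, p2):
    """Recombine two parent bitstrings at a random cut point.

    Each child inherits one intact (contiguous) substring from each
    parent, so co-adapted substrings can be passed along whole.
    """
    point = random.randrange(1, len(p1))   # cut strictly inside the string
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def point_mutation(p, rate=0.01):
    """Flip each bit independently with small probability `rate`.

    Mutation acts on single positions in isolation; it cannot
    assemble larger substrings the way cross-over can.
    """
    return [b ^ 1 if random.random() < rate else b for b in p]

random.seed(1)
a, b = [0] * 8, [1] * 8
c1, c2 = one_point_crossover(a, b)
print(c1, c2)                    # each child: a run of 0s joined to a run of 1s
print(point_mutation(a, rate=0.5))
```

Note that cross-over is conservative in a way mutation is not: taken together, the two children contain exactly the bits of the two parents, just regrouped.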
But I'm with Dave on this one: Personally, I'm also more interested in the algorithms than in their relation to real biological mechanisms. (Let the record show, however, that there are GA practitioners who do try to take biological plausibility more seriously, e.g., Grosso and Westerdale.)

So the next possibility is to refer to properties of the GA we find desirable. For Dave, I think the key property of the GA is its "implicit parallelism": the ability to search a huge space implicitly by explicitly manipulating a very small set of structures. Jon Richardson comes closer to the definition I had in mind with his emphasis on Holland's "building blocks" notion:

    The proper distinction I think is whether or not the recombination
    operator in question supports the building block hypothesis.
    "Mutation-like operators" do not do this.  Any kind of weird
    recombination which can be shown to propagate and construct
    building blocks, I would call a Genetic Algorithm.  If the
    operator does nothing with building blocks, I would consider it
    apocryphal.  It may be valuable but apocryphal nonetheless and
    shouldn't be called a GA.
    [richards@UTKCS2.CS.UTK.EDU, 4 Mar 88]

While I would echo the value of both these desiderata, I don't find them technically tight enough to be useful. So I suggest that we follow Holland's suggestion (in his talk at ICGA85) and reserve the term "GA" for those algorithms for which we can prove the "schemata theorem" (ANAS, Theorem 6.2.3). I believe this theorem is still the best understanding we have of how and why the GA gives rise to implicit parallelism and building blocks.

Of course, there are problems with this definition as well. In particular, it is so tightly wed to the string representation and cross-over operator that it is very difficult to imagine any algorithm very different from the (traditional) GA that would satisfy the theorem. But that's exactly where I think the work needs to be done.
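For readers without ANAS at hand: the theorem is usually stated in roughly the following form (this is the common textbook paraphrase, not a verbatim quote of Holland's Theorem 6.2.3). For a schema H of order o(H) and defining length delta(H), over strings of length l, with cross-over probability p_c and per-bit mutation probability p_m,

    E[ m(H, t+1) ]  >=  m(H, t) * ( f(H) / f_avg )
                        * [ 1  -  p_c * delta(H) / (l - 1)  -  o(H) * p_m ]

where m(H, t) is the number of instances of H in the population at generation t, f(H) is the average fitness of those instances, and f_avg is the population's average fitness. Short, low-order, above-average schemata (the "building blocks") thus receive exponentially increasing numbers of trials, and every string sampled gives information about the many schemata it instantiates --- which is exactly the implicit parallelism claim.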
Finally, I want to say why I (as Dave Ackley says) "took the trouble to exclude" Ackley's SIGH system from my definition of GA(1). My answer is simply that I view SIGH as a hybrid. It has borrowed techniques from a number of areas: the GA, connectionism, and simulated annealing, to name three. There is absolutely nothing wrong with doing this, and as Dave's thesis showed and Larry Eshelman's note confirmed [Larry.Eshelman@F.GP.CS.CMU.EDU, 11 Mar 1988] there are at least some problems on which SIGH does much better than the traditional GA. My only problem with SIGH is that I can't do the apportionment of credit problem: when it works, I don't know exactly which technique is responsible, and when it doesn't work I don't know whom to blame.

I too think about connectionist algorithms and simulated annealing along with Holland's GA and bucket brigade, and see all of them as members of a class of algorithms I want to understand better. But I find it necessary to isolate the properties of each before trying to combine them. In short, I think Dave and I agree to a great extent (on the problem, and on what the pieces of a solution might be), and disagree only in our respective approaches to putting it all together.