DAVIS@EMBL.BITNET.UUCP (03/09/87)
The subject just about sums it up... Is anyone out there in the 'lectronic village overly proud of, overly knowledgeable about, or even just familiar with clustering algorithms for use in three dimensions? That is to say, I have a bunch of points in a 3D space, and I want to cluster them. Simple, huh? Tell me how, or tell me how to find out how. Reply directly to me, or post to the list.

with thanks,
Paul Davis
European Molecular Biology Laboratory,
Postfach 10.2209
6900 Heidelberg
West Germany

bitnet: davis@embl.bitnet
uucp: ...psuvax!embl.bitnet!davis
petnet: homing pigeons to....

"a time for dreams, a time for sleep, a time for love .... it's now!"

[What makes three-space special? Any similarity or dissimilarity metric that works in three dimensions should work in N dimensions. The really interesting cases are those where no reasonable weighting exists for combining distances in the different dimensions. Any of the major subroutine packages -- BMD, SPSS, etc. -- has clustering routines and associated documentation.

Euclidean space is generally assumed, which causes problems with circular scales such as hue in a color space. (One heuristic for color spaces is to linearize the usual 256^3 cells by tracing through the space with a fractal, space-filling curve, then search for clusters in the 1-D result.) Other 3-D spaces are best analysed in terms of direction cosines for vectors to the points from some origin.

Statistical metrics based on within-cluster and between-cluster variances are optimal for some applications, but gravitational or potential-based models are better in others. ISODATA is a time-honored heuristic method for growing and splitting clusters, but is only suitable for circular clusters in isometric spaces. Zahn's method of analyzing minimal spanning trees is one way of overcoming the common faults (e.g., chaining or lack thereof) of heuristic approaches.

The book Pattern Classification and Scene Analysis by Duda and Hart offers an easy introduction to some of the statistical and heuristic methods. Other pattern recognition books are more thorough. Clustering is still a black art, though, and you are probably best off getting a commercial package and trying a few of the options to get a feel for what works with your data. -- KIL]
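
As a rough illustration of the minimal-spanning-tree idea mentioned in the reply above, here is a minimal Python sketch. It is not Zahn's full method: it assumes plain Euclidean distance and uses a simplified global rule (cut any tree edge longer than some multiple of the mean edge length) in place of a local "inconsistent edge" test. The names mst_edges and mst_clusters and the factor parameter are made up for this example.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (length, i, j) tuples, one per tree edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each point to the tree
    parent = [-1] * n       # tree vertex that achieves that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Update distances of the remaining points to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, factor=2.0):
    """Cut MST edges longer than factor * mean edge length.

    Returns one integer label per point; points that remain connected
    after the cuts share a label. The cut rule is an assumption made
    for this sketch, not Zahn's original criterion.
    """
    n = len(points)
    edges = mst_edges(points)
    if not edges:
        return list(range(n))  # zero or one point: nothing to cut
    mean_len = sum(length for length, _, _ in edges) / len(edges)

    root = list(range(n))  # union-find forest over point indices

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    # Keep only the short edges; each kept edge merges two components.
    for length, i, j in edges:
        if length <= factor * mean_len:
            root[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

For two well-separated triples of points, such as (0,0,0), (1,0,0), (0,1,0) and (10,10,10), (11,10,10), (10,11,10), calling mst_clusters on the list gives the labels [0, 0, 0, 1, 1, 1]: the single long edge bridging the two groups is cut and the short edges are kept. Nothing in the sketch is specific to three dimensions, and a circular scale such as hue would need a different distance function in place of math.dist.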