[comp.sources.wanted] Clustering Algorithms

rww@esl.UUCP (Richard W. Webb) (04/21/88)

Hello,
	I am working on a personal project that needs to be able to
    automatically find clusters in a set of data points in N dimensions.
    This will be used to iteratively refine the interesting areas in
    certain Fractal objects.  I have also considered using it in a
    hierarchical rendering system.  Given a set of M points, I would
    like any information relevant to the problem of associating points
    with one local adjacent.

	One method I have looked at is to make a Minimum spanning tree
    for the points and throwing away the longest branches until I have
    nice clusters.  This method does work nicely, but has to have some
    arbitrary hueristics to decide when to stop.

	I have heard of a program called ISODATA that does some clustering,
    but I don't know how to get it or what really does.

	Any help or SOURCE :-) would be greatly appreciated.

-- 
Richard W. Webb                ecvax!decwrl!borealis!\
ESL Incorporated                      sdcsvax!seismo!- ames!esl!rww
ARPA: rww%esl@ames.ARPA                ucbcad!ucbvax!/     /
SMAIL: rww@esl.ESL.COM                       ihnp4!lll-lcc!

finegan@ucqais.uc.edu (Mike Finegan) (04/25/88)

In article <655@esl.UUCP>, rww@esl.UUCP (Richard W. Webb) writes:
> Hello,
> 
> 	I have heard of a program called ISODATA that does some clustering,
>     but I don't know how to get it or what really does.
> 
I used ISODATA to cluster 3-D vectors (RGB, HSI, etc.) for image processing
segmentation. The authors are Ball & Hall - Stanford Research Labs (sic?).
The algorithm is presented fairly completely in the report (dated 1965). The
parameters which control the clustering are cryptic (i.e. heuristic ...) and
it uses large amounts of memory. It worked okay for my application, but I
didn't have the patience to try every combination of process parameters, to
see if I was getting optimum clustering or not. 
    You would probably be better off with another technique, unless you have
very few data points (or not too many dimensions). If you really want code -
I could give you what I have - it was written for a class and could be cleaned 
up. Maybe you could send me some other clustering algorithms in return ? I
have a copy of the report - its about 100 pages - but the algorithm is about 
10 or 20, with flow charts, diagrams, formulae, etc.
				Mike Finegan   ...uccba!finegan