A genetic algorithm approach to cluster analysis

https://doi.org/10.1016/S0898-1221(99)00090-5
Under an Elsevier user license
open archive

Abstract

A common problem in the social and agricultural sciences is to find clusters in experimental data; the standard approach is a deterministic search that terminates in a locally optimal clustering. We propose here a genetic algorithm (GA) for performing cluster analysis. GAs have been used profitably in a variety of contexts where it is impractical or impossible to solve directly for a globally optimal solution to a complex numerical problem. In the present case, our GA clustering technique attempts to maximize a variance-ratio (VR) goodness-of-fit criterion defined in terms of external cluster isolation and internal cluster homogeneity. Although our GA-based clustering algorithm cannot guarantee recovery of the cluster solution that attains the global maximum of this fitness function, it does explicitly work toward this goal, in marked contrast to existing clustering algorithms, especially hierarchical agglomerative ones such as Ward's method. Monte Carlo results on both constrained and unconstrained simulated datasets showed that, under some conditions, the genetic clustering algorithm did indeed surpass conventional clustering techniques (Ward's method and K-means) in terms of the internal VR criterion. Suggestions for future refinement and study are offered.
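The abstract does not specify the exact VR formula, the chromosome encoding, or the genetic operators, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. We assume the VR criterion takes the Calinski-Harabasz form, [B/(k-1)] / [W/(n-k)], where B is between-cluster scatter (isolation) and W is within-cluster scatter (homogeneity). We also assume a label-based encoding (one cluster label per observation), binary tournament selection, uniform crossover, per-gene random-reset mutation, and elitism. The names `variance_ratio` and `ga_cluster` and all parameter defaults are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def variance_ratio(data: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Assumed VR criterion (Calinski-Harabasz form): [B/(k-1)] / [W/(n-k)].

    B is between-cluster scatter, W is within-cluster scatter. Partitions
    with an empty cluster score 0.0 so the GA is steered away from them.
    """
    n, d = len(data), len(data[0])
    if not 1 < k < n:
        return 0.0
    counts = [0] * k
    sums = [[0.0] * d for _ in range(k)]
    for x, lab in zip(data, labels):
        counts[lab] += 1
        for j in range(d):
            sums[lab][j] += x[j]
    if 0 in counts:
        return 0.0
    centroids = [[s / c for s in row] for row, c in zip(sums, counts)]
    grand = [sum(x[j] for x in data) / n for j in range(d)]
    between = sum(c * sum((m[j] - grand[j]) ** 2 for j in range(d))
                  for m, c in zip(centroids, counts))
    within = sum(sum((x[j] - centroids[lab][j]) ** 2 for j in range(d))
                 for x, lab in zip(data, labels))
    if within == 0.0:
        return float("inf")  # every point sits exactly on its centroid
    return (between / (k - 1)) / (within / (n - k))


def ga_cluster(data: Sequence[Point], k: int, pop_size: int = 40,
               generations: int = 200, p_mut: float = 0.02,
               seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical label-encoded GA that maximizes variance_ratio."""
    rng = random.Random(seed)
    n = len(data)
    # Each chromosome assigns one cluster label (0..k-1) to each observation.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(variance_ratio(data, ch, k), ch) for ch in pop]
        elite = max(scored, key=lambda t: t[0])[1]

        def tournament() -> List[int]:
            # Binary tournament: the fitter of two random chromosomes wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        new_pop = [list(elite)]  # elitism: the best fitness never decreases
        while len(new_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover, then random-reset mutation of each gene.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    best = max(pop, key=lambda ch: variance_ratio(data, ch, k))
    return best, variance_ratio(data, best, k)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels, vr = ga_cluster(pts, k=2)
    print(labels, vr)
```

Unlike a K-means or Ward's run, which stops once no local move improves its objective, this loop keeps sampling new partitions and retains the best VR seen so far. As in the abstract, it still offers no guarantee of reaching the global maximum. One known weakness of label encoding is redundancy: relabeling the clusters of a partition gives a different chromosome for the same clustering, so crossover between two good parents can be disruptive. Centroid-based encodings are a common alternative.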

Keywords

Cluster analysis
Data clustering
Genetic algorithm
Global optimization
Variance-ratio maximization
