
Data clustering: a review

Published: 01 September 1999

Abstract

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts across communities have slowed the transfer of useful generic concepts and methodologies. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.
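To make the remark that clustering is combinatorially difficult concrete, the sketch below counts the distinct ways to split n labeled patterns into exactly k non-empty clusters, which is the Stirling number of the second kind. This illustration is not taken from the paper, and the function name `stirling2` is hypothetical. It assumes hard (non-overlapping) clusters with interchangeable cluster labels, so the count is the size of the search space an exhaustive hard-partitioning method would face for a fixed k.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Count partitions of n labeled patterns into exactly k non-empty clusters.

    Hypothetical helper for illustration; uses the standard recurrence
    S(n, k) = S(n-1, k-1) + k * S(n-1, k).
    """
    if n == k:
        return 1  # covers S(0, 0) = 1 and the all-singletons partition
    if n == 0 or k == 0:
        return 0  # no way to fill k > 0 clusters with zero patterns, or vice versa
    # Pattern n either starts a new cluster on its own,
    # or joins one of the k clusters formed by the other n - 1 patterns.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


if __name__ == "__main__":
    # Show how fast the number of candidate 4-cluster partitions grows with n.
    for n in (10, 20, 50):
        print(f"n={n:>2}, k=4: {stirling2(n, 4):,} partitions")
```

For fixed k the count grows roughly like k^n / k!, which is one reason practical clustering algorithms rely on heuristics or local search rather than exhaustive enumeration of partitions.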

References

  1. AARTS, E. AND KORST, J. 1989. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Wiley-Interscience series in discrete mathematics and optimization. John Wiley and Sons, Inc., New York, NY. Google ScholarGoogle Scholar
  2. ACM, 1994. ACM CR Classifications. ACM Computing Surveys 35, 5-16.Google ScholarGoogle Scholar
  3. AL-SULTAN, K.S. 1995. A tabu search approach to clustering problems. Pattern Recogn. 28, 1443-1451.Google ScholarGoogle Scholar
  4. AL-SULTAN, K. S. AND KHAN, M. M. 1996. Computational experience on four algorithms for the hard clustering problem. Pattern Recogn. Lett. 17, 3, 295-308. Google ScholarGoogle Scholar
  5. ALLEN, P. A. AND ALLEN, J. R. 1990. Basin Analysis: Principles and Applications. Blackwell Scientific Publications, Inc., Cambridge, MA.Google ScholarGoogle Scholar
  6. ALTA VISTA, 1999. http://altavista.digital.com.Google ScholarGoogle Scholar
  7. AMADASUN, M. AND KING, R.A. 1988. Low-level segmentation of multispectral images via agglomerative clustering of uniform neighbourhoods. Pattern Recogn. 21, 3 (1988), 261-268. Google ScholarGoogle Scholar
  8. ANDERBERG, M. R. 1973. Cluster Analysis for Applications. Academic Press, Inc., New York, NY.Google ScholarGoogle Scholar
  9. AUGUSTSON, J. G. AND MINKER, J. 1970. An analysis of some graph theoretical clustering techniques. J. ACM 17, 4 (Oct. 1970), 571- 588. Google ScholarGoogle Scholar
  10. BABU, G. P. AND MURTY, M. N. 1993. A nearoptimal initial seed value selection in K-means algorithm using a genetic algorithm. Pattern Recogn. Lett. 14, 10 (Oct. 1993), 763- 769. Google ScholarGoogle Scholar
  11. BABU, G. P. AND MURTY, M.N. 1994. Clustering with evolution strategies. Pattern Recogn. 27, 321-329.Google ScholarGoogle Scholar
  12. BABU, G. P., MURTY, M. N., AND KEERTHI, S. S. 2000. Stochastic connectionist approach for pattern clustering (To appear). IEEE Trans. Syst. Man Cybern. Google ScholarGoogle Scholar
  13. BACKER, F. B. AND HUBERT, L.g. 1976. A graphtheoretic approach to goodness-of-fit in complete-link hierarchical clustering. J. Am. Stat. Assoc. 71,870-878.Google ScholarGoogle Scholar
  14. BACKER, E. 1995. Computer-Assisted Reasoning in Cluster Analysis. Prentice Hall International (UK) Ltd., Hertfordshire, UK. Google ScholarGoogle Scholar
  15. BAEZA-YATES, R.A. 1992. Introduction to data structures and algorithms related to information retrieval. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice- Hall, Inc., Upper Saddle River, NJ, 13-27. Google ScholarGoogle Scholar
  16. BAJCSY, P. 1997. Hierarchical segmentation and clustering using similarity analysis. Ph.D. Dissertation. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL. Google ScholarGoogle Scholar
  17. BALL, G. H. AND HALL, D.J. 1965. ISODATA, a novel method of data analysis and classification. Tech. Rep. Stanford University, Stanford, CA.Google ScholarGoogle Scholar
  18. BENTLEY, J. L. AND FRIEDMAN, J.H. 1978. Fast algorithms for constructing minimal spanning trees in coordinate spaces. IEEE Trans. Comput. C-27, 6 (June), 97-105.Google ScholarGoogle Scholar
  19. BEZDEK, J. C. 1981. Pattern Recognition With Fuzzy Objective Function Algorithms. Plenum Press, New York, NY. Google ScholarGoogle Scholar
  20. BHUYAN, J. N., RAGHAVAN, V. V., AND VENKATESH, K.E. 1991. Genetic algorithm for clustering with an ordered representation. In Proceedings of the Fourth International Conference on Genetic Algorithms, 408-415.Google ScholarGoogle Scholar
  21. BISWAS, G., WEINBERG, J., AND LI, C. 1995. A Conceptual Clustering Method for Knowledge Discovery in Databases. Editions Technip.Google ScholarGoogle Scholar
  22. BRAILOVSKY, V. L. 1991. A probabilistic approach to clustering. Pattern Recogn. Lett. 12, 4 (Apr. 1991), 193-198. Google ScholarGoogle Scholar
  23. BRODATZ, P. 1966. Textures: A Photographic Album for Artists and Designers. Dover Publications, Inc., Mineola, NY.Google ScholarGoogle Scholar
  24. CAN, F. 1993. Incremental clustering for dynamic information processing. ACM Trans. Inf. Syst. 11, 2 (Apr. 1993), 143-164. Google ScholarGoogle Scholar
  25. CARPENTER, G. AND GROSSBERG, S. 1990. ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks 3, 129-152.Google ScholarGoogle Scholar
  26. CHEKURI, C., GOLDWASSER, M. H., RAGHAVAN, P., AND UPFAL, E. 1997. Web search using automatic classification. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http:// theory.stanford.edu/people/wass/publications/ Web Search/Web Search.html.Google ScholarGoogle Scholar
  27. CHENG, C. H. 1995. A branch-and-bound clustering algorithm. IEEE Trans. Syst. Man Cybern. 25, 895-898.Google ScholarGoogle Scholar
  28. CHENG, Y. AND FU, K.S. 1985. Conceptual clustering in knowledge organization. IEEE Trans. Pattern Anal. Mach. Intell. 7, 592-598.Google ScholarGoogle Scholar
  29. CHENG, Y. 1995. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17, 7 (July), 790-799. Google ScholarGoogle Scholar
  30. CHIEN, Y.T. 1978. Interactive Pattern Recognition. Marcel Dekker, Inc., New York, NY. Google ScholarGoogle Scholar
  31. CHOUDHURY, S. AND MURTY, M.N. 1990. A divisive scheme for constructing minimal spanning trees in coordinate space. Pattern Recogn. Lett. 11, 6 (Jun. 1990), 385-389. Google ScholarGoogle Scholar
  32. 1996. Special issue on data mining. Commun. ACM 39, 11.Google ScholarGoogle Scholar
  33. COLEMAN, G. B. AND ANDREWS, H. C. 1979. Image segmentation by clustering. Proc. IEEE 67, 5, 773-785.Google ScholarGoogle Scholar
  34. CONNELL, S. AND JAIN, A. K. 1998. Learning prototypes for on-line handwritten digits. In Proceedings of the 14th International Conference on Pattern Recognition (Brisbane, Australia, Aug.), 182-184. Google ScholarGoogle Scholar
  35. CROSS, S. E., Ed. 1996. Special issue on data mining. IEEE Expert 11, 5 (Oct.). Google ScholarGoogle Scholar
  36. DALE, M. B. 1985. On the comparison of conceptual clustering and numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 7, 241-244.Google ScholarGoogle Scholar
  37. DAVE, R. N. 1992. Generalized fuzzy C-shells clustering and detection of circular and elliptic boundaries. Pattern Recogn. 25, 713-722.Google ScholarGoogle Scholar
  38. DAVIS, T., Ed. 1991. The Handbook of Genetic Algorithms. Van Nostrand Reinhold Co., New York, NY.Google ScholarGoogle Scholar
  39. DAY, W. H.E. 1992. Complexity theory: An introduction for practitioners of classification. In Clustering and Classification, P. Arabie and L. Hubert, Eds. World Scientific Publishing Co., Inc., River Edge, NJ.Google ScholarGoogle Scholar
  40. DEMPSTER, A. P., LAIRD, N. M., AND RUB IN, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B. 39, 1, 1-38.Google ScholarGoogle Scholar
  41. DIDAY, E. 1973. The dynamic cluster method in non-hierarchical clustering. J. Comput. Inf. Sci. 2, 61-88.Google ScholarGoogle Scholar
  42. DIDAY, E. AND SIMON, J. C. 1976. Clustering analysis. In Digital Pattern Recognition, K. S. Fu, Ed. Springer-Verlag, Secaucus, NJ, 47-94.Google ScholarGoogle Scholar
  43. DIDAY, E. 1988. The symbolic approach in clustering. In Classification and Related Methods, H. H. Bock, Ed. North-Holland Publishing Co., Amsterdam, The Netherlands.Google ScholarGoogle Scholar
  44. DORAI, C. AND JAIN, A.K. 1995. Shape spectra based view grouping for free-form objects. In Proceedings of the International Conference on Image Processing (ICIP-95), 240-243. Google ScholarGoogle Scholar
  45. DUBES, R. C. AND JAIN, A. K. 1976. Clustering techniques: The user's dilemma. Pattern Recogn. 8, 247-260.Google ScholarGoogle Scholar
  46. DUBES, R. C. AND JAIN, A. K. 1980. Clustering methodology in exploratory data analysis. In Advances in Computers, M. C. Yovits,, Ed. Academic Press, Inc., New York, NY, 113- 125.Google ScholarGoogle Scholar
  47. DUBES, R. C. 1987. How many clusters are best?--an experiment. Pattern Recogn. 20, 6 (Nov. 1, 1987), 645-663. Google ScholarGoogle Scholar
  48. DUBES, R.C. 1993. Cluster analysis and related issues. In Handbook of Pattern Recognition & Computer Vision, C. H. Chen, L. F. Pau, and P. S. P. Wang, Eds. World Scientific Publishing Co., Inc., River Edge, NJ, 3-32. Google ScholarGoogle Scholar
  49. DUBUISSON, M. P. AND JAIN, A.K. 1994. A modified Hausdorff distance for object matching. In Proceedings of the International Conference on Pattern Recognition (ICPR '94), 566-568.Google ScholarGoogle Scholar
  50. DUDA, R. O. AND HART, P. E. 1973. Pattern Classification and Scene Analysis. John Wiley and Sons, Inc., New York, NY. Google ScholarGoogle Scholar
  51. DUNN, S., JANOS, L., AND ROSENFELD, A. 1983. Bimean clustering. Pattern Recogn. Lett. 1, 169-173.Google ScholarGoogle Scholar
  52. DURAN, B. S. AND ODELL, P. L. 1974. Cluster Analysis: A Survey. Springer-Verlag, New York, NY.Google ScholarGoogle Scholar
  53. EDDY, W. F., MOCKUS, A., AND OUE, S. 1996. Approximate single linkage cluster analysis of large data sets in high-dimensional spaces. Comput. Stat. Data Anal. 23, 1, 29-43. Google ScholarGoogle Scholar
  54. ETZIONI, O. 1996. The World-Wide Web: quagmire or gold mine? Commun. ACM 39, 11, 65-68. Google ScholarGoogle Scholar
  55. EVERITT, B.S. 1993. Cluster Analysis. Edward Arnold, Ltd., London, UK.Google ScholarGoogle Scholar
  56. FABER, V. 1994. Clustering and the continuous k-means algorithm. Los Alamos Science 22, 138-144.Google ScholarGoogle Scholar
  57. FABER, V., HOCHBERG, J. C., KELLY, P. M., THOMAS, T. R., AND WHITE, J.M. 1994. Concept extraction: A data-mining technique. Los Alamos Science 22, 122-149.Google ScholarGoogle Scholar
  58. FAYYAD, U. M. 1996. Data mining and knowledge discovery: Making sense out of data. IEEE Expert 11, 5 (Oct.), 20-25. Google ScholarGoogle Scholar
  59. FISHER, D. AND LANGLEY, P. 1986. Conceptual clustering and its relation to numerical taxonomy. In Artificial Intelligence and Statistics, A W. Gale, Ed. Addison-Wesley Longman Publ. Co., Inc., Reading, MA, 77-116. Google ScholarGoogle Scholar
  60. FISHER, D. 1987. Knowledge acquisition via incremental conceptual clustering. Mach. Learn. 2, 139-172. Google ScholarGoogle Scholar
  61. FISHER, D., Xu, L., CARNES, R., RICH, Y., FENVES, S. J., CHEN, J., SHIAVI, R., BISWAS, G., AND WEIN- BERG, J. 1993. Applying AI clustering to engineering tasks. IEEE Expert 8, 51-60. Google ScholarGoogle Scholar
  62. FISHER, L. AND VAN NESS, J. W. 1971. Admissible clustering procedures. Biometrika 58, 91-104.Google ScholarGoogle Scholar
  63. FLYNN, P. J. AND JAIN, A.K. 1991. BONSAI: 3D object recognition using constrained search. IEEE Trans. Pattern Anal. Mach. Intell. 13, 10 (Oct. 1991), 1066-1075. Google ScholarGoogle Scholar
  64. FOGEL, D. B. AND SIMPSON, P.K. 1993. Evolving fuzzy clusters. In Proceedings of the International Conference on Neural Networks (San Francisco, CA), 1829-1834.Google ScholarGoogle Scholar
  65. FOGEL, D. B. AND FOGEL, L. J., Eds. 1994. Special issue on evolutionary computation. IEEE Trans. Neural Netw. (Jan.).Google ScholarGoogle Scholar
  66. FOGEL, L. J., OWENS, A. J., AND WALSH, M. J. 1965. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, Inc., New York, NY.Google ScholarGoogle Scholar
  67. FRAKES, W. B. AND BAEZA-YATES, R., Eds. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Inc., Upper Saddle River, NJ. Google ScholarGoogle Scholar
  68. FRED, A. L. N. AND LEITAO, J. M. N. 1996. A minimum code length technique for clustering of syntactic patterns. In Proceedings of the International Conference on Pattern Recognition (Vienna, Austria), 680-684. Google ScholarGoogle Scholar
  69. FRED, A. L. N. 1996. Clustering of sequences using a minimum grammar complexity criterion. In Grammatical Inference: Learning Syntax from Sentences, L. Miclet and C. Higuera, Eds. Springer-Verlag, Secaucus, NJ, 107-116. Google ScholarGoogle Scholar
  70. Fu, K. S. AND LU, S.Y. 1977. A clustering procedure for syntactic patterns. IEEE Trans. Syst. Man Cybern. 7, 734-742.Google ScholarGoogle Scholar
  71. Fu, K. S. AND MUI, J. K. 1981. A survey on image segmentation. Pattern Recogn. 13, 3-16.Google ScholarGoogle Scholar
  72. FUKUNAGA, Z. 1990. Introduction to Statistical Pattern Recognition. 2nd ed. Academic Press Prof., Inc., San Diego, CA. Google ScholarGoogle Scholar
  73. GLOVER, F. 1986. Future paths for integer programming and links to artificial intelligence. Comput. Oper. Res. 13, 5 (May 1986), 533- 549. Google ScholarGoogle Scholar
  74. GOLDBERG, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Co., Inc., Redwood City, CA. Google ScholarGoogle Scholar
  75. GORDON, A. D. AND HENDERSON, J. T. 1977. Algorithm for Euclidean sum of squares. Biometrics 33, 355-362.Google ScholarGoogle Scholar
  76. GOTLIEB, G. C. AND KUMAR, S. 1968. Semantic clustering of index terms. J. ACM 15, 493- 513. Google ScholarGoogle Scholar
  77. GOWDA, K. C. 1984. A feature reduction and unsupervised classification algorithm for multispectral data. Pattern Recogn. 17, 6, 667- 676.Google ScholarGoogle Scholar
  78. GOWDA, K. C. AND KRISHNA, G. 1977. Agglomerative clustering using the concept of mutual nearest neighborhood. Pattern Recogn. 10, 105-112.Google ScholarGoogle Scholar
  79. GOWDA, K. C. AND DIDAY, E. 1992. Symbolic clustering using a new dissimilarity meG- sure. IEEE Trans. Syst. Man Cybern. 22, 368-378.Google ScholarGoogle Scholar
  80. GOWER, J. C. AND ROSS, G. J.S. 1969. Minimum spanning rees and single-linkage cluster analysis. Appl. Stat. 18, 54-64.Google ScholarGoogle Scholar
  81. GREFENSTETTE, J 1986. Optimization of control parameters for genetic algorithms. IEEE Trans. Syst. Man Cybern. SMC-16, 1 (Jan./ Feb. 1986), 122-128. Google ScholarGoogle Scholar
  82. HARALICK, R. M. AND KELLY, G. L. 1969. Pattern recognition with measurement space and spatial clustering for multiple images. Proc. IEEE 57, 4, 654-665.Google ScholarGoogle Scholar
  83. HARTIGAN, J. A. 1975. Clustering Algorithms. John Wiley and Sons, Inc., New York, NY. Google ScholarGoogle Scholar
  84. HEDBERG, S. 1996. Searching for the mother lode: Tales of the first data miners. IEEE Expert 11, 5 (Oct.), 4-7. Google ScholarGoogle Scholar
  85. HERTZ, J., KROGH, A., AND PALMER, R. G. 1991. Introduction to the Theory of Neural Computation. Santa Fe Institute Studies in the Sciences of Complexity lecture notes. Addison- Wesley Longman Publ. Co., Inc., Reading, MA. Google ScholarGoogle Scholar
  86. HOFFMAN, R. AND JAIN, A. K. 1987. Segmentation and classification of range images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 5 (Sept. 1987), 608-620. Google ScholarGoogle Scholar
  87. HOFMANN, T. AND BUHMANN, J. 1997. Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19, 1 (Jan.), 1-14. Google ScholarGoogle Scholar
  88. HOFMANN, T., PUZICHA, J., AND BUCHMANN, J. M. 1998. Unsupervised texture segmentation in a deterministic annealing framework. IEEE Trans. Pattern Anal. Mach. Intell. 20, 8, 803-818. Google ScholarGoogle Scholar
  89. HOLLAND, J.H. 1975. Adaption in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI. Google ScholarGoogle Scholar
  90. HOOVER, A., JEAN-BAPTISTE, G., JIANG, X., FLYNN, P. J., BUNKE, H., GOLDGOF, D. B., BOWYER, K., EGGERT, D. W., FITZGIBBON, A., AND FISHER, R. B. 1996. An experimental comparison of range image segmentation algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 18, 7, 673- 689. Google ScholarGoogle Scholar
  91. HUTTENLOCHER, D. P., KLANDERMAN, G. A., AND RUCKLIDGE, W.J. 1993. Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15, 9, 850-863. Google ScholarGoogle Scholar
  92. ICHINO, M. AND YAGUCHI, H. 1994. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans. Syst. Man Cybern. 24, 698-708.Google ScholarGoogle Scholar
  93. 1991. Proceedings of the International Joint Conference on Neural Networks. (IJCNN'91).Google ScholarGoogle Scholar
  94. 1992. Proceedings of the International Joint Conference on Neural Networks.Google ScholarGoogle Scholar
  95. ISMAIL, M. A. AND KAMEL, M. S. 1989. Multidimensional data clustering utilizing hybrid search strategies. Pattern Recogn. 22, 1 (Jan. 1989), 75-89. Google ScholarGoogle Scholar
  96. JAIN, A. K. AND DUBES, R.C. 1988. Algorithms for Clustering Data. Prentice-Hall advanced reference series. Prentice-Hall, Inc., Upper Saddle River, NJ. Google ScholarGoogle Scholar
  97. JAIN, A. K. AND FARROKHNIA, F. 1991. Unsupervised texture segmentation using Gabor filters. Pattern Recogn. 24, 12 (Dec. 1991), 1167-1186. Google ScholarGoogle Scholar
  98. JAIN, A. K. AND BHATTACHARJEE, S. 1992. Text segmentation using Gabor filters for automatic document processing. Mach. Vision Appl. 5, 3 (Summer 1992), 169-184. Google ScholarGoogle Scholar
  99. JAIN, A. J. AND FLYNN, P. J., Eds. 1993. Three Dimensional Object Recognition Systems. Elsevier Science Inc., New York, NY. Google ScholarGoogle Scholar
  100. JAIN, A. K. AND MAO, J. 1994. Neural networks and pattern recognition. In Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks, and C. J. Robinson, Eds. 194- 212.Google ScholarGoogle Scholar
  101. JAIN, A. K. AND FLYNN, P.J. 1996. Image segmentation using clustering. In Advances in Image Understanding: A Festschrift for Azriel Rosenfeld, N. Ahuja and K. Bowyer, Eds, IEEE Press, Piscataway, NJ, 65-83.Google ScholarGoogle Scholar
  102. JAIN, A. K. AND MAO, J. 1996. Artificial neural networks: A tutorial. IEEE Computer 29 (Mar.), 31-44. Google ScholarGoogle Scholar
  103. JAIN, A. K., RATHA, N. K., AND LAKSHMANAN, S. 1997. Object detection using Gabor filters. Pattern Recogn. 30, 2, 295-309.Google ScholarGoogle Scholar
  104. JAIN, N. C., INDRAYAN, A., AND GOEL, L. R. 1986. Monte Carlo comparison of six hierarchical clustering methods on random data. Pattern Recogn. 19, 1 (Jan./Feb. 1986), 95-99. Google ScholarGoogle Scholar
  105. JAIN, R., KASTURI, R., AND SCHUNCK, B. G. 1995. Machine Vision. McGraw-Hill series in computer science. McGraw-Hill, Inc., New York, NY. Google ScholarGoogle Scholar
  106. JARVIS, R. A. AND PATRICK, E. A. 1973. Clustering using a similarity method based on shared near neighbors. IEEE Trans. Comput. C-22, 8 (Aug.), 1025-1034.Google ScholarGoogle Scholar
  107. JOLION, J.-M., MEER, P., AND BATAOUCHE, S. 1991. Robust clustering with applications in computer vision. IEEE Trans. Pattern Anal. Mach. Intell. 13, 8 (Aug. 1991), 791-802. Google ScholarGoogle Scholar
  108. JONES, D. AND BELTRAMO, M.A. 1991. Solving partitioning problems with genetic algorithms. In Proceedings of the Fourth International Conference on Genetic Algorithms, 442-449.Google ScholarGoogle Scholar
  109. JUDD, D., MCKINLEY, P., AND JAIN, A. K. 1996. Large-scale parallel data clustering. In Proceedings of the International Conference on Pattern Recognition (Vienna, AustriG), 488-493. Google ScholarGoogle Scholar
  110. KING, B. 1967. Step-wise clustering procedures. J. Am. Stat. Assoc. 69, 86-101.Google ScholarGoogle Scholar
  111. KIRKPATRICK, S., GELATT, C. D., JR., AND VECCHI, M.P. 1983. Optimization by simulated annealing. Science 220, 4598 (May), 671-680.Google ScholarGoogle Scholar
  112. KLEIN, R. W. AND DUBES, R. C. 1989. Experiments in projection and clustering by simulated annealing. Pattern Recogn. 22, 213-220.Google ScholarGoogle Scholar
  113. KNUTH, D. 1973. The Art of Computer Programming. Addison-Wesley, Reading, MA. Google ScholarGoogle Scholar
  114. KOONTZ, W. L. G., FUKUNAGA, K., AND NARENDRA, P.M. 1975. A branch and bound clustering algorithm. IEEE Trans. Comput. 23, 908- 914.Google ScholarGoogle Scholar
  115. KOHONEN, T. 1989. Self-Organization andAssociative Memory. 3rd ed. Springer information sciences series. Springer-Verlag, New York, NY. Google ScholarGoogle Scholar
  116. KRAAIJVELD, M., MAO, J., AND JAIN, A. K. 1995. A non-linear projection method based on Kohonen's topology preserving maps. IEEE Trans. Neural Netw. 6, 548-559. Google ScholarGoogle Scholar
  117. KRISHNAPURAM, R., FRIGUI, H., AND NASRAOUI, O. 1995. Fuzzy and probabilistic shell clustering algorithms and their application to boundary detection and surface approximation. IEEE Trans. Fuzzy Systems 3, 29-60. Google ScholarGoogle Scholar
  118. KURITA, T. 1991. An efficient agglomerative clustering algorithm using a heap. Pattern Recogn. 24, 3 (1991), 205-209. Google ScholarGoogle Scholar
  119. LIBRARY OF CONGRESS, 1990. LC classification outline. Library of Congress, Washington, DC.Google ScholarGoogle Scholar
  120. LEBOWITZ, M. 1987. Experiments with incremental concept formation. Mach. Learn. 2, 103-138. Google ScholarGoogle Scholar
  121. LEE, H.-Y. AND ONG, H.-L. 1996. Visualization support for data mining. IEEE Expert 11, 5 (Oct.), 69-75. Google ScholarGoogle Scholar
  122. LEE, R. C. T., SLAGLE, J. R., AND MONG, C. T. 1978. Towards automatic auditing of records. IEEE Trans. Softw. Eng. 4, 441- 448.Google ScholarGoogle Scholar
  123. LEE, R. C. T. 1981. Cluster analysis and its applications. In Advances in Information Systems Science, J. T. Tou, Ed. Plenum Press, New York, NY.Google ScholarGoogle Scholar
  124. LI, C. AND BISWAS, G. 1995. Knowledge-based scientific discovery in geological databases. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (Montreal, Canada, Aug. 20-21), 204 -209.Google ScholarGoogle Scholar
  125. Lu, S. Y. AND FU, K. S. 1978. A sentence-tosentence clustering procedure for pattern analysis. IEEE Trans. Syst. Man Cybern. 8, 381-389.Google ScholarGoogle Scholar
  126. LUNDERVOLD, A., FENSTAD, A. M., ERSLAND, L., AND TAXT, T. 1996. Brain tissue volumes from multispectral 3D MRI: A comparative study of four classifiers. In Proceedings of the Conference of the Society on Magnetic Resonance,Google ScholarGoogle Scholar
  127. MAAREK, Y. S. AND BEN SHAUL, I. Z. 1996. Automatically organizing bookmarks per contents. In Proceedings of the Fifth International Conference on the World Wide Web (Paris, May), http://www5conf.inria.fr/fichhtml/paper-sessions.html. Google ScholarGoogle Scholar
  128. MCQUEEN, J. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281-297.Google ScholarGoogle Scholar
  129. MAO, J. AND JAIN, A.K. 1992. Texture classification and segmentation using multiresolution simultaneous autoregressive models. Pattern Recogn. 25, 2 (Feb. 1992), 173-188. Google ScholarGoogle Scholar
  130. MAO, J. AND JAIN, A.K. 1995. Artificial neural networks for feature extraction and multivariate data projection. IEEE Trans. Neural Netw. 6, 296-317. Google ScholarGoogle Scholar
  131. MAO, J. AND JAIN, A.K. 1996. A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Trans. Neural Netw. 7, 16-29. Google ScholarGoogle Scholar
  132. MEVINS, A.J. 1995. A branch and bound incremental conceptual clusterer. Mach. Learn. 18, 5-22. Google ScholarGoogle Scholar
  133. MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1981. A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts. In Progress in Pattern Recognition, Vol. 1, L. Kanal and A. Rosenfeld, Eds. North-Holland Publishing Co., Amsterdam, The Netherlands.Google ScholarGoogle Scholar
  134. MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1983. Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 5 (Sept.), 396-409.Google ScholarGoogle Scholar
  135. MISHRA, S. K. AND RAGHAVAN, V. V. 1994. An empirical study of the performance of heuristic methods for clustering. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 425-436.Google ScholarGoogle Scholar
  136. MITCHELL, T. 1997. Machine Learning. McGraw- Hill, Inc., New York, NY. Google ScholarGoogle Scholar
  137. MOHIUDDIN, K. M. AND MAO, g. 1994. A comparative study of different classifiers for handprinted character recognition. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 437-448.Google ScholarGoogle Scholar
  138. MOOR, B.K. 1988. ART 1 and Pattern Clustering. In 1988 Connectionist Summer School, Morgan Kaufmann, San Mateo, CA, 174-185.Google ScholarGoogle Scholar
  139. MURTAGH, F. 1984. A survey of recent advances in hierarchical clustering algorithms which use cluster centers. Comput. J. 26, 354-359.Google ScholarGoogle Scholar
  140. MURTY, M. N. AND KRISHNA, G. 1980. A computationally efficient technique for data clustering. Pattern Recogn. 12, 153-158.Google ScholarGoogle Scholar
  141. MURTY, M. N. AND JAIN, A.K. 1995. Knowledgebased clustering scheme for collection management and retrieval of library books. Pattern Recogn. 28, 949-964.Google ScholarGoogle Scholar
  142. NAGY, G. 1968. State of the art in pattern recognition. Proc. IEEE 56, 836-862.Google ScholarGoogle Scholar
  143. NG, R. AND HAN, J. 1994. Very large data bases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.), VLDB Endowment, Berkeley, CA, 144-155. Google ScholarGoogle Scholar
  144. NGUYEN, H. H. AND COHEN, P. 1993. Gibbs random fields, fuzzy clustering, and the unsupervised segmentation of textured images. CV- GIP: Graph. Models Image Process. 55, 1 (Jan. 1993), 1-19. Google ScholarGoogle Scholar
  145. OEHLER, K. L. AND GRAY, R. M. 1995. Combining image compression and classification using vector quantization. IEEE Trans. Pattern Anal. Mach. Intell. 17, 461-473. Google ScholarGoogle Scholar
  146. OJA, E. 1982. A simplified neuron model as a principal component analyzer. Bull. Math. Bio. 15, 267-273.Google ScholarGoogle Scholar
  147. OZAWA, K. 1985. A stratificational overlapping cluster scheme. Pattern Recogn. 18, 279-286.Google ScholarGoogle Scholar
  148. OPEN TEXT, 1999. http://index.opentext.net.Google ScholarGoogle Scholar
  149. KAMGAR-PARSI, B., GUALTIERI, J. A., DEVANEY, J. A., AND KAMGAR-PARSI, K. 1990. Clustering with neural networks. Biol. Cybern. 63, 201-208.Google ScholarGoogle Scholar
  150. LYCOS, 1999. http://www.lycos.com.Google ScholarGoogle Scholar
  151. PAL, N. R., BEZDEK, J. C., AND TSAO, E. C.-K. 1993. Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Trans. Neural Netw. 4, 549-557.Google ScholarGoogle Scholar
  152. QUINLAN, J. R. 1990. Decision trees and decision making. IEEE Trans. Syst. Man Cybern. 20, 339-346.Google ScholarGoogle Scholar
  153. RAGHAVAN, V. V. AND BIRCHAND, K. 1979. A clustering strategy based on a formalism of the reproductive process in a natural system. In Proceedings of the Second International Conference on Information Storage and Retrieval, 10-22. Google ScholarGoogle Scholar
  154. RAGHAVAN, V. V. AND YU, C.T. 1981. A comparison of the stability characteristics of some graph theoretic clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 3, 393-402.Google ScholarGoogle Scholar
  155. RASMUSSEN, E. 1992. Clustering algorithms. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Inc., Upper Saddle River, NJ, 419-442. Google ScholarGoogle Scholar
  156. RICH, E. 1983. ArtificialIntelligence. McGraw- Hill, Inc., New York, NY. Google ScholarGoogle Scholar
  157. RIPLEY, B. D., Ed. 1989. Statistical Inference for Spatial Processes. Cambridge University Press, New York, NY. Google ScholarGoogle Scholar
  158. ROSE, K., GUREWITZ, E., AND FOX, G. C. 1993. Deterministic annealing approach to constrained clustering. IEEE Trans. Pattern Anal. Mach. Intell. 15, 785-794. Google ScholarGoogle Scholar
  159. ROSENFELD, A. AND KAK, A.C. 1982. Digital Picture Processing. 2nd ed. Academic Press, Inc., New York, NY. Google ScholarGoogle Scholar
  160. ROSENFELD, A., SCHNEIDER, V. B., AND HUANG, M. K. 1969. An application of cluster detection to text and picture processing. IEEE Trans. Inf. Theor. 15, 6, 672-681.Google ScholarGoogle Scholar
  161. Ross, G. J. S. 1968. Classification techniques for large sets of data. In Numerical Taxonomy, A. J. Cole, Ed. Academic Press, Inc., New York, NY.Google ScholarGoogle Scholar
  162. RuSPINI, E.H. 1969. A new approach to clustering. Inf. Control 15, 22-32.Google ScholarGoogle Scholar
  163. SALTON, G. 1991. Developments in automatic text retrieval. Science 253, 974-980.Google ScholarGoogle Scholar
  164. SAMAL, A. AND IYENGAR, P.A. 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recogn. 25, 1 (Jan. 1992), 65-77. Google ScholarGoogle Scholar
  165. SAMMON, J. W. JR. 1969. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 18, 401-409.Google ScholarGoogle Scholar
  166. SANGAL, R. 1991. Programming Paradigms in LISP. McGraw-Hill, Inc., New York, NY. Google ScholarGoogle Scholar
  167. SCHACHTER, B. J., DAVIS, L. S., AND ROSENFELD, A. 1979. Some experiments in image segmentation by clustering of local feature values. Pattern Recogn. 11, 19-28.Google ScholarGoogle Scholar
  168. SCHWEFEL, H.P. 1981. Numerical Optimization of Computer Models. John Wiley and Sons, Inc., New York, NY. Google ScholarGoogle Scholar
  169. SELIM, S. Z. AND ISMAIL, M.A. 1984. K-meanstype algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 6, 81-87.Google ScholarGoogle Scholar
  170. SELIM, S. Z. AND ALSULTAN, K. 1991. A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24, 10 (1991), 1003-1008. Google ScholarGoogle Scholar
  171. SEN, A. AND SRIVASTAVA, M. 1990. Regression Analysis. Springer-Verlag, New York, NY.Google ScholarGoogle Scholar
  172. SETHI, I. AND JAIN, A. K., Eds. 1991. Artificial Neural Networks and Pattern Recognition: Old and New Connections. Elsevier Science Inc., New York, NY. Google ScholarGoogle Scholar
  173. SHEKAR, B., MURTY, N. M., AND KRISHNA, G. 1987. A knowledge-based clustering scheme. Pattern Recogn. Lett. 5, 4 (Apr. 1, 1987), 253- 259. Google ScholarGoogle Scholar
  174. SILVERMAN, J. F. AND COOPER, D. B. 1988. Bayesian clustering for unsupervised estimation of surface and texture models. IEEE Trans. Pattern Anal. Mach. Intell. 10, 4 (July 1988), 482-495.
  175. SIMOUDIS, E. 1996. Reality check for data mining. IEEE Expert 11, 5 (Oct.), 26-33.
  176. SLAGLE, J. R., CHANG, C. L., AND HELLER, S. R. 1975. A clustering and data-reorganizing algorithm. IEEE Trans. Syst. Man Cybern. 5, 125-128.
  177. SNEATH, P. H. A. AND SOKAL, R. R. 1973. Numerical Taxonomy. Freeman, London, UK.
  178. SPATH, H. 1980. Cluster Analysis Algorithms for Data Reduction and Classification. Ellis Horwood, Upper Saddle River, NJ.
  179. SOLBERG, A., TAXT, T., AND JAIN, A. 1996. A Markov random field model for classification of multisource satellite imagery. IEEE Trans. Geoscience and Remote Sensing 34, 1, 100-113.
  180. SRIVASTAVA, A. AND MURTY, M. N. 1990. A comparison between conceptual clustering and conventional clustering. Pattern Recogn. 23, 9 (1990), 975-981.
  181. STAHL, H. 1986. Cluster analysis of large data sets. In Classification as a Tool of Research, W. Gaul and M. Schader, Eds. Elsevier North-Holland, Inc., New York, NY, 423-430.
  182. STEPP, R. E. AND MICHALSKI, R. S. 1986. Conceptual clustering of structured objects: A goal-oriented approach. Artif. Intell. 28, 1 (Feb. 1986), 43-69.
  183. SUTTON, M., STARK, L., AND BOWYER, K. 1993. Function-based generic recognition for multiple object categories. In Three-Dimensional Object Recognition Systems, A. Jain and P. J. Flynn, Eds. Elsevier Science Inc., New York, NY.
  184. SYMON, M. J. 1977. Clustering criterion and multi-variate normal mixture. Biometrics 77, 35-43.
  185. TANAKA, E. 1995. Theoretical aspects of syntactic pattern recognition. Pattern Recogn. 28, 1053-1061.
  186. TAXT, T. AND LUNDERVOLD, A. 1994. Multispectral analysis of the brain using magnetic resonance imaging. IEEE Trans. Medical Imaging 13, 3, 470-481.
  187. TITTERINGTON, D. M., SMITH, A. F. M., AND MAKOV, U. E. 1985. Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Inc., New York, NY.
  188. TOUSSAINT, G. T. 1980. The relative neighborhood graph of a finite planar set. Pattern Recogn. 12, 261-268.
  189. TRIER, O. D. AND JAIN, A. K. 1995. Goal-directed evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intell. 17, 1191-1201.
  190. UCHIYAMA, T. AND ARBIB, M. A. 1994. Color image segmentation using competitive learning. IEEE Trans. Pattern Anal. Mach. Intell. 16, 12 (Dec. 1994), 1197-1206.
  191. URQUHART, R. B. 1982. Graph theoretical clustering based on limited neighborhood sets. Pattern Recogn. 15, 173-187.
  192. VENKATESWARLU, N. B. AND RAJU, P. S. V. S. K. 1992. Fast ISODATA clustering algorithms. Pattern Recogn. 25, 3 (Mar. 1992), 335-342.
  193. VINOD, V. V., CHAUDHURY, S., MUKHERJEE, J., AND GHOSE, S. 1994. A connectionist approach for clustering with applications in image analysis. IEEE Trans. Syst. Man Cybern. 24, 365-384.
  194. WAH, B. W., Ed. 1996. Special section on mining of databases. IEEE Trans. Knowl. Data Eng. (Dec.).
  195. WARD, J. H. JR. 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236-244.
  196. WATANABE, S. 1985. Pattern Recognition: Human and Mechanical. John Wiley and Sons, Inc., New York, NY.
  197. WESZKA, J. 1978. A survey of threshold selection techniques. Pattern Recogn. 7, 259-265.
  198. WHITLEY, D., STARKWEATHER, T., AND FUQUAY, D. 1989. Scheduling problems and traveling salesman: the genetic edge recombination. In Proceedings of the Third International Conference on Genetic Algorithms (George Mason University, June 4-7), J. D. Schaffer, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 133-140.
  199. WILSON, D. R. AND MARTINEZ, T. R. 1997. Improved heterogeneous distance functions. J. Artif. Intell. Res. 6, 1-34.
  200. WU, Z. AND LEAHY, R. 1993. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15, 1101-1113.
  201. WULFEKUHLER, M. AND PUNCH, W. 1997. Finding salient features for personal web page categories. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http://theory.stanford.edu/people/wass/publications/Web Search/Web Search.html.
  202. ZADEH, L. A. 1965. Fuzzy sets. Inf. Control 8, 338-353.
  203. ZAHN, C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C-20 (Apr.), 68-86.
  204. ZHANG, K. 1995. Algorithms for the constrained editing distance between ordered labeled trees and related problems. Pattern Recogn. 28, 463-474.
  205. ZHANG, J. AND MICHALSKI, R. S. 1995. An integration of rule induction and exemplar-based learning for graded concepts. Mach. Learn. 21, 3 (Dec. 1995), 235-267.
  206. ZHANG, T., RAMAKRISHNAN, R., AND LIVNY, M. 1996. BIRCH: An efficient data clustering method for very large databases. SIGMOD Rec. 25, 2, 103-114.
  207. ZUPAN, J. 1982. Clustering of Large Data Sets. Research Studies Press Ltd., Taunton, UK.

Reviews

Jose M. Ramirez

Data clustering is not defined the same way in each of the disciplines that use it to deal with problems that involve the extraction of information or structure from data. The authors have produced a good survey of this slippery topic. They devote a considerable amount of space to presenting clustering techniques from the perspective of several disciplines, including fuzzy systems, neural networks, and searching. The section of definitions and notations is weak: it is just a glossary of terms, with no context provided. It would have been better to define the terms as they were needed for each technique described. References are numerous, as expected in a survey, but are not annotated sufficiently to enable readers to define a research plan on a given aspect. In some places, for example in neural networks and fuzzy systems, the paper does not reflect the state of the art in the use of clustering. One cause of this weakness is the problem of dealing with a multidisciplinary subject whose advances are reported in a wide range of journals and proceedings. The other cause, namely the extremely long review process, is completely out of the authors' control: the paper was received in March 1997 but not accepted until January 1999.