Data clustering: a review

Authors:
A. K. Jain, Michigan State Univ., East Lansing
M. N. Murty, Indian Institute of Science, Bangalore, India
P. J. Flynn, Ohio State Univ., Columbus

ACM Computing Surveys, Volume 31, Issue 3, pp. 264–323. https://doi.org/10.1145/331499.331504

Published: 01 September 1999

Abstract

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

References

AARTS, E. AND KORST, J. 1989. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Wiley-Interscience series in discrete mathematics and optimization. John Wiley and Sons, Inc., New York, NY. Google Scholar
ACM, 1994. ACM CR Classifications. ACM Computing Surveys 35, 5-16.Google Scholar
AL-SULTAN, K.S. 1995. A tabu search approach to clustering problems. Pattern Recogn. 28, 1443-1451.Google Scholar
AL-SULTAN, K. S. AND KHAN, M. M. 1996. Computational experience on four algorithms for the hard clustering problem. Pattern Recogn. Lett. 17, 3, 295-308. Google Scholar
ALLEN, P. A. AND ALLEN, J. R. 1990. Basin Analysis: Principles and Applications. Blackwell Scientific Publications, Inc., Cambridge, MA.Google Scholar
ALTA VISTA, 1999. http://altavista.digital.com.Google Scholar
AMADASUN, M. AND KING, R.A. 1988. Low-level segmentation of multispectral images via agglomerative clustering of uniform neighbourhoods. Pattern Recogn. 21, 3 (1988), 261-268. Google Scholar
ANDERBERG, M. R. 1973. Cluster Analysis for Applications. Academic Press, Inc., New York, NY.Google Scholar
AUGUSTSON, J. G. AND MINKER, J. 1970. An analysis of some graph theoretical clustering techniques. J. ACM 17, 4 (Oct. 1970), 571- 588. Google Scholar
BABU, G. P. AND MURTY, M. N. 1993. A nearoptimal initial seed value selection in K-means algorithm using a genetic algorithm. Pattern Recogn. Lett. 14, 10 (Oct. 1993), 763- 769. Google Scholar
BABU, G. P. AND MURTY, M.N. 1994. Clustering with evolution strategies. Pattern Recogn. 27, 321-329.Google Scholar
BABU, G. P., MURTY, M. N., AND KEERTHI, S. S. 2000. Stochastic connectionist approach for pattern clustering (To appear). IEEE Trans. Syst. Man Cybern. Google Scholar
BACKER, F. B. AND HUBERT, L.g. 1976. A graphtheoretic approach to goodness-of-fit in complete-link hierarchical clustering. J. Am. Stat. Assoc. 71,870-878.Google Scholar
BACKER, E. 1995. Computer-Assisted Reasoning in Cluster Analysis. Prentice Hall International (UK) Ltd., Hertfordshire, UK. Google Scholar
BAEZA-YATES, R.A. 1992. Introduction to data structures and algorithms related to information retrieval. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice- Hall, Inc., Upper Saddle River, NJ, 13-27. Google Scholar
BAJCSY, P. 1997. Hierarchical segmentation and clustering using similarity analysis. Ph.D. Dissertation. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL. Google Scholar
BALL, G. H. AND HALL, D.J. 1965. ISODATA, a novel method of data analysis and classification. Tech. Rep. Stanford University, Stanford, CA.Google Scholar
BENTLEY, J. L. AND FRIEDMAN, J.H. 1978. Fast algorithms for constructing minimal spanning trees in coordinate spaces. IEEE Trans. Comput. C-27, 6 (June), 97-105.Google Scholar
BEZDEK, J. C. 1981. Pattern Recognition With Fuzzy Objective Function Algorithms. Plenum Press, New York, NY. Google Scholar
BHUYAN, J. N., RAGHAVAN, V. V., AND VENKATESH, K.E. 1991. Genetic algorithm for clustering with an ordered representation. In Proceedings of the Fourth International Conference on Genetic Algorithms, 408-415.Google Scholar
BISWAS, G., WEINBERG, J., AND LI, C. 1995. A Conceptual Clustering Method for Knowledge Discovery in Databases. Editions Technip.Google Scholar
BRAILOVSKY, V. L. 1991. A probabilistic approach to clustering. Pattern Recogn. Lett. 12, 4 (Apr. 1991), 193-198. Google Scholar
BRODATZ, P. 1966. Textures: A Photographic Album for Artists and Designers. Dover Publications, Inc., Mineola, NY.Google Scholar
CAN, F. 1993. Incremental clustering for dynamic information processing. ACM Trans. Inf. Syst. 11, 2 (Apr. 1993), 143-164. Google Scholar
CARPENTER, G. AND GROSSBERG, S. 1990. ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks 3, 129-152.Google Scholar
CHEKURI, C., GOLDWASSER, M. H., RAGHAVAN, P., AND UPFAL, E. 1997. Web search using automatic classification. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http:// theory.stanford.edu/people/wass/publications/ Web Search/Web Search.html.Google Scholar
CHENG, C. H. 1995. A branch-and-bound clustering algorithm. IEEE Trans. Syst. Man Cybern. 25, 895-898.Google Scholar
CHENG, Y. AND FU, K.S. 1985. Conceptual clustering in knowledge organization. IEEE Trans. Pattern Anal. Mach. Intell. 7, 592-598.Google Scholar
CHENG, Y. 1995. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17, 7 (July), 790-799. Google Scholar
CHIEN, Y.T. 1978. Interactive Pattern Recognition. Marcel Dekker, Inc., New York, NY. Google Scholar
CHOUDHURY, S. AND MURTY, M.N. 1990. A divisive scheme for constructing minimal spanning trees in coordinate space. Pattern Recogn. Lett. 11, 6 (Jun. 1990), 385-389. Google Scholar
1996. Special issue on data mining. Commun. ACM 39, 11.Google Scholar
COLEMAN, G. B. AND ANDREWS, H. C. 1979. Image segmentation by clustering. Proc. IEEE 67, 5, 773-785.Google Scholar
CONNELL, S. AND JAIN, A. K. 1998. Learning prototypes for on-line handwritten digits. In Proceedings of the 14th International Conference on Pattern Recognition (Brisbane, Australia, Aug.), 182-184. Google Scholar
CROSS, S. E., Ed. 1996. Special issue on data mining. IEEE Expert 11, 5 (Oct.). Google Scholar
DALE, M. B. 1985. On the comparison of conceptual clustering and numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 7, 241-244.Google Scholar
DAVE, R. N. 1992. Generalized fuzzy C-shells clustering and detection of circular and elliptic boundaries. Pattern Recogn. 25, 713-722.Google Scholar
DAVIS, T., Ed. 1991. The Handbook of Genetic Algorithms. Van Nostrand Reinhold Co., New York, NY.Google Scholar
DAY, W. H.E. 1992. Complexity theory: An introduction for practitioners of classification. In Clustering and Classification, P. Arabie and L. Hubert, Eds. World Scientific Publishing Co., Inc., River Edge, NJ.Google Scholar
DEMPSTER, A. P., LAIRD, N. M., AND RUB IN, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B. 39, 1, 1-38.Google Scholar
DIDAY, E. 1973. The dynamic cluster method in non-hierarchical clustering. J. Comput. Inf. Sci. 2, 61-88.Google Scholar
DIDAY, E. AND SIMON, J. C. 1976. Clustering analysis. In Digital Pattern Recognition, K. S. Fu, Ed. Springer-Verlag, Secaucus, NJ, 47-94.Google Scholar
DIDAY, E. 1988. The symbolic approach in clustering. In Classification and Related Methods, H. H. Bock, Ed. North-Holland Publishing Co., Amsterdam, The Netherlands.Google Scholar
DORAI, C. AND JAIN, A.K. 1995. Shape spectra based view grouping for free-form objects. In Proceedings of the International Conference on Image Processing (ICIP-95), 240-243. Google Scholar
DUBES, R. C. AND JAIN, A. K. 1976. Clustering techniques: The user's dilemma. Pattern Recogn. 8, 247-260.Google Scholar
DUBES, R. C. AND JAIN, A. K. 1980. Clustering methodology in exploratory data analysis. In Advances in Computers, M. C. Yovits,, Ed. Academic Press, Inc., New York, NY, 113- 125.Google Scholar
DUBES, R. C. 1987. How many clusters are best?--an experiment. Pattern Recogn. 20, 6 (Nov. 1, 1987), 645-663. Google Scholar
DUBES, R.C. 1993. Cluster analysis and related issues. In Handbook of Pattern Recognition & Computer Vision, C. H. Chen, L. F. Pau, and P. S. P. Wang, Eds. World Scientific Publishing Co., Inc., River Edge, NJ, 3-32. Google Scholar
DUBUISSON, M. P. AND JAIN, A.K. 1994. A modified Hausdorff distance for object matching. In Proceedings of the International Conference on Pattern Recognition (ICPR '94), 566-568.Google Scholar
DUDA, R. O. AND HART, P. E. 1973. Pattern Classification and Scene Analysis. John Wiley and Sons, Inc., New York, NY. Google Scholar
DUNN, S., JANOS, L., AND ROSENFELD, A. 1983. Bimean clustering. Pattern Recogn. Lett. 1, 169-173.Google Scholar
DURAN, B. S. AND ODELL, P. L. 1974. Cluster Analysis: A Survey. Springer-Verlag, New York, NY.Google Scholar
EDDY, W. F., MOCKUS, A., AND OUE, S. 1996. Approximate single linkage cluster analysis of large data sets in high-dimensional spaces. Comput. Stat. Data Anal. 23, 1, 29-43. Google Scholar
ETZIONI, O. 1996. The World-Wide Web: quagmire or gold mine? Commun. ACM 39, 11, 65-68. Google Scholar
EVERITT, B.S. 1993. Cluster Analysis. Edward Arnold, Ltd., London, UK.Google Scholar
FABER, V. 1994. Clustering and the continuous k-means algorithm. Los Alamos Science 22, 138-144.Google Scholar
FABER, V., HOCHBERG, J. C., KELLY, P. M., THOMAS, T. R., AND WHITE, J.M. 1994. Concept extraction: A data-mining technique. Los Alamos Science 22, 122-149.Google Scholar
FAYYAD, U. M. 1996. Data mining and knowledge discovery: Making sense out of data. IEEE Expert 11, 5 (Oct.), 20-25. Google Scholar
FISHER, D. AND LANGLEY, P. 1986. Conceptual clustering and its relation to numerical taxonomy. In Artificial Intelligence and Statistics, A W. Gale, Ed. Addison-Wesley Longman Publ. Co., Inc., Reading, MA, 77-116. Google Scholar
FISHER, D. 1987. Knowledge acquisition via incremental conceptual clustering. Mach. Learn. 2, 139-172. Google Scholar
FISHER, D., Xu, L., CARNES, R., RICH, Y., FENVES, S. J., CHEN, J., SHIAVI, R., BISWAS, G., AND WEIN- BERG, J. 1993. Applying AI clustering to engineering tasks. IEEE Expert 8, 51-60. Google Scholar
FISHER, L. AND VAN NESS, J. W. 1971. Admissible clustering procedures. Biometrika 58, 91-104.Google Scholar
FLYNN, P. J. AND JAIN, A.K. 1991. BONSAI: 3D object recognition using constrained search. IEEE Trans. Pattern Anal. Mach. Intell. 13, 10 (Oct. 1991), 1066-1075. Google Scholar
FOGEL, D. B. AND SIMPSON, P.K. 1993. Evolving fuzzy clusters. In Proceedings of the International Conference on Neural Networks (San Francisco, CA), 1829-1834.Google Scholar
FOGEL, D. B. AND FOGEL, L. J., Eds. 1994. Special issue on evolutionary computation. IEEE Trans. Neural Netw. (Jan.).Google Scholar
FOGEL, L. J., OWENS, A. J., AND WALSH, M. J. 1965. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, Inc., New York, NY.Google Scholar
FRAKES, W. B. AND BAEZA-YATES, R., Eds. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Inc., Upper Saddle River, NJ. Google Scholar
FRED, A. L. N. AND LEITAO, J. M. N. 1996. A minimum code length technique for clustering of syntactic patterns. In Proceedings of the International Conference on Pattern Recognition (Vienna, Austria), 680-684. Google Scholar
FRED, A. L. N. 1996. Clustering of sequences using a minimum grammar complexity criterion. In Grammatical Inference: Learning Syntax from Sentences, L. Miclet and C. Higuera, Eds. Springer-Verlag, Secaucus, NJ, 107-116. Google Scholar
Fu, K. S. AND LU, S.Y. 1977. A clustering procedure for syntactic patterns. IEEE Trans. Syst. Man Cybern. 7, 734-742.Google Scholar
Fu, K. S. AND MUI, J. K. 1981. A survey on image segmentation. Pattern Recogn. 13, 3-16.Google Scholar
FUKUNAGA, Z. 1990. Introduction to Statistical Pattern Recognition. 2nd ed. Academic Press Prof., Inc., San Diego, CA. Google Scholar
GLOVER, F. 1986. Future paths for integer programming and links to artificial intelligence. Comput. Oper. Res. 13, 5 (May 1986), 533- 549. Google Scholar
GOLDBERG, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Co., Inc., Redwood City, CA. Google Scholar
GORDON, A. D. AND HENDERSON, J. T. 1977. Algorithm for Euclidean sum of squares. Biometrics 33, 355-362.Google Scholar
GOTLIEB, G. C. AND KUMAR, S. 1968. Semantic clustering of index terms. J. ACM 15, 493- 513. Google Scholar
GOWDA, K. C. 1984. A feature reduction and unsupervised classification algorithm for multispectral data. Pattern Recogn. 17, 6, 667- 676.Google Scholar
GOWDA, K. C. AND KRISHNA, G. 1977. Agglomerative clustering using the concept of mutual nearest neighborhood. Pattern Recogn. 10, 105-112.Google Scholar
GOWDA, K. C. AND DIDAY, E. 1992. Symbolic clustering using a new dissimilarity meG- sure. IEEE Trans. Syst. Man Cybern. 22, 368-378.Google Scholar
GOWER, J. C. AND ROSS, G. J.S. 1969. Minimum spanning rees and single-linkage cluster analysis. Appl. Stat. 18, 54-64.Google Scholar
GREFENSTETTE, J 1986. Optimization of control parameters for genetic algorithms. IEEE Trans. Syst. Man Cybern. SMC-16, 1 (Jan./ Feb. 1986), 122-128. Google Scholar
HARALICK, R. M. AND KELLY, G. L. 1969. Pattern recognition with measurement space and spatial clustering for multiple images. Proc. IEEE 57, 4, 654-665.Google Scholar
HARTIGAN, J. A. 1975. Clustering Algorithms. John Wiley and Sons, Inc., New York, NY. Google Scholar
HEDBERG, S. 1996. Searching for the mother lode: Tales of the first data miners. IEEE Expert 11, 5 (Oct.), 4-7. Google Scholar
HERTZ, J., KROGH, A., AND PALMER, R. G. 1991. Introduction to the Theory of Neural Computation. Santa Fe Institute Studies in the Sciences of Complexity lecture notes. Addison- Wesley Longman Publ. Co., Inc., Reading, MA. Google Scholar
HOFFMAN, R. AND JAIN, A. K. 1987. Segmentation and classification of range images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 5 (Sept. 1987), 608-620. Google Scholar
HOFMANN, T. AND BUHMANN, J. 1997. Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19, 1 (Jan.), 1-14. Google Scholar
HOFMANN, T., PUZICHA, J., AND BUCHMANN, J. M. 1998. Unsupervised texture segmentation in a deterministic annealing framework. IEEE Trans. Pattern Anal. Mach. Intell. 20, 8, 803-818. Google Scholar
HOLLAND, J.H. 1975. Adaption in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI. Google Scholar
HOOVER, A., JEAN-BAPTISTE, G., JIANG, X., FLYNN, P. J., BUNKE, H., GOLDGOF, D. B., BOWYER, K., EGGERT, D. W., FITZGIBBON, A., AND FISHER, R. B. 1996. An experimental comparison of range image segmentation algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 18, 7, 673- 689. Google Scholar
HUTTENLOCHER, D. P., KLANDERMAN, G. A., AND RUCKLIDGE, W.J. 1993. Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15, 9, 850-863. Google Scholar
ICHINO, M. AND YAGUCHI, H. 1994. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans. Syst. Man Cybern. 24, 698-708.Google Scholar
1991. Proceedings of the International Joint Conference on Neural Networks. (IJCNN'91).Google Scholar
1992. Proceedings of the International Joint Conference on Neural Networks.Google Scholar
ISMAIL, M. A. AND KAMEL, M. S. 1989. Multidimensional data clustering utilizing hybrid search strategies. Pattern Recogn. 22, 1 (Jan. 1989), 75-89. Google Scholar
JAIN, A. K. AND DUBES, R.C. 1988. Algorithms for Clustering Data. Prentice-Hall advanced reference series. Prentice-Hall, Inc., Upper Saddle River, NJ. Google Scholar
JAIN, A. K. AND FARROKHNIA, F. 1991. Unsupervised texture segmentation using Gabor filters. Pattern Recogn. 24, 12 (Dec. 1991), 1167-1186. Google Scholar
JAIN, A. K. AND BHATTACHARJEE, S. 1992. Text segmentation using Gabor filters for automatic document processing. Mach. Vision Appl. 5, 3 (Summer 1992), 169-184. Google Scholar
JAIN, A. J. AND FLYNN, P. J., Eds. 1993. Three Dimensional Object Recognition Systems. Elsevier Science Inc., New York, NY. Google Scholar
JAIN, A. K. AND MAO, J. 1994. Neural networks and pattern recognition. In Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks, and C. J. Robinson, Eds. 194- 212.Google Scholar
JAIN, A. K. AND FLYNN, P.J. 1996. Image segmentation using clustering. In Advances in Image Understanding: A Festschrift for Azriel Rosenfeld, N. Ahuja and K. Bowyer, Eds, IEEE Press, Piscataway, NJ, 65-83.Google Scholar
JAIN, A. K. AND MAO, J. 1996. Artificial neural networks: A tutorial. IEEE Computer 29 (Mar.), 31-44. Google Scholar
JAIN, A. K., RATHA, N. K., AND LAKSHMANAN, S. 1997. Object detection using Gabor filters. Pattern Recogn. 30, 2, 295-309.Google Scholar
JAIN, N. C., INDRAYAN, A., AND GOEL, L. R. 1986. Monte Carlo comparison of six hierarchical clustering methods on random data. Pattern Recogn. 19, 1 (Jan./Feb. 1986), 95-99. Google Scholar
JAIN, R., KASTURI, R., AND SCHUNCK, B. G. 1995. Machine Vision. McGraw-Hill series in computer science. McGraw-Hill, Inc., New York, NY. Google Scholar
JARVIS, R. A. AND PATRICK, E. A. 1973. Clustering using a similarity method based on shared near neighbors. IEEE Trans. Comput. C-22, 8 (Aug.), 1025-1034.Google Scholar
JOLION, J.-M., MEER, P., AND BATAOUCHE, S. 1991. Robust clustering with applications in computer vision. IEEE Trans. Pattern Anal. Mach. Intell. 13, 8 (Aug. 1991), 791-802. Google Scholar
JONES, D. AND BELTRAMO, M.A. 1991. Solving partitioning problems with genetic algorithms. In Proceedings of the Fourth International Conference on Genetic Algorithms, 442-449.Google Scholar
JUDD, D., MCKINLEY, P., AND JAIN, A. K. 1996. Large-scale parallel data clustering. In Proceedings of the International Conference on Pattern Recognition (Vienna, AustriG), 488-493. Google Scholar
KING, B. 1967. Step-wise clustering procedures. J. Am. Stat. Assoc. 69, 86-101.Google Scholar
KIRKPATRICK, S., GELATT, C. D., JR., AND VECCHI, M.P. 1983. Optimization by simulated annealing. Science 220, 4598 (May), 671-680.Google Scholar
KLEIN, R. W. AND DUBES, R. C. 1989. Experiments in projection and clustering by simulated annealing. Pattern Recogn. 22, 213-220.Google Scholar
KNUTH, D. 1973. The Art of Computer Programming. Addison-Wesley, Reading, MA. Google Scholar
KOONTZ, W. L. G., FUKUNAGA, K., AND NARENDRA, P.M. 1975. A branch and bound clustering algorithm. IEEE Trans. Comput. 23, 908- 914.Google Scholar
KOHONEN, T. 1989. Self-Organization andAssociative Memory. 3rd ed. Springer information sciences series. Springer-Verlag, New York, NY. Google Scholar
KRAAIJVELD, M., MAO, J., AND JAIN, A. K. 1995. A non-linear projection method based on Kohonen's topology preserving maps. IEEE Trans. Neural Netw. 6, 548-559. Google Scholar
KRISHNAPURAM, R., FRIGUI, H., AND NASRAOUI, O. 1995. Fuzzy and probabilistic shell clustering algorithms and their application to boundary detection and surface approximation. IEEE Trans. Fuzzy Systems 3, 29-60. Google Scholar
KURITA, T. 1991. An efficient agglomerative clustering algorithm using a heap. Pattern Recogn. 24, 3 (1991), 205-209. Google Scholar
LIBRARY OF CONGRESS, 1990. LC classification outline. Library of Congress, Washington, DC.Google Scholar
LEBOWITZ, M. 1987. Experiments with incremental concept formation. Mach. Learn. 2, 103-138. Google Scholar
LEE, H.-Y. AND ONG, H.-L. 1996. Visualization support for data mining. IEEE Expert 11, 5 (Oct.), 69-75. Google Scholar
LEE, R. C. T., SLAGLE, J. R., AND MONG, C. T. 1978. Towards automatic auditing of records. IEEE Trans. Softw. Eng. 4, 441- 448.Google Scholar
LEE, R. C. T. 1981. Cluster analysis and its applications. In Advances in Information Systems Science, J. T. Tou, Ed. Plenum Press, New York, NY.Google Scholar
LI, C. AND BISWAS, G. 1995. Knowledge-based scientific discovery in geological databases. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (Montreal, Canada, Aug. 20-21), 204 -209.Google Scholar
Lu, S. Y. AND FU, K. S. 1978. A sentence-tosentence clustering procedure for pattern analysis. IEEE Trans. Syst. Man Cybern. 8, 381-389.Google Scholar
LUNDERVOLD, A., FENSTAD, A. M., ERSLAND, L., AND TAXT, T. 1996. Brain tissue volumes from multispectral 3D MRI: A comparative study of four classifiers. In Proceedings of the Conference of the Society on Magnetic Resonance,Google Scholar
MAAREK, Y. S. AND BEN SHAUL, I. Z. 1996. Automatically organizing bookmarks per contents. In Proceedings of the Fifth International Conference on the World Wide Web (Paris, May), http://www5conf.inria.fr/fichhtml/paper-sessions.html. Google Scholar
MCQUEEN, J. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281-297.Google Scholar
MAO, J. AND JAIN, A.K. 1992. Texture classification and segmentation using multiresolution simultaneous autoregressive models. Pattern Recogn. 25, 2 (Feb. 1992), 173-188. Google Scholar
MAO, J. AND JAIN, A.K. 1995. Artificial neural networks for feature extraction and multivariate data projection. IEEE Trans. Neural Netw. 6, 296-317. Google Scholar
MAO, J. AND JAIN, A.K. 1996. A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Trans. Neural Netw. 7, 16-29. Google Scholar
MEVINS, A.J. 1995. A branch and bound incremental conceptual clusterer. Mach. Learn. 18, 5-22. Google Scholar
MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1981. A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts. In Progress in Pattern Recognition, Vol. 1, L. Kanal and A. Rosenfeld, Eds. North-Holland Publishing Co., Amsterdam, The Netherlands.Google Scholar
MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1983. Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 5 (Sept.), 396-409.Google Scholar
MISHRA, S. K. AND RAGHAVAN, V. V. 1994. An empirical study of the performance of heuristic methods for clustering. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 425-436.Google Scholar
MITCHELL, T. 1997. Machine Learning. McGraw- Hill, Inc., New York, NY. Google Scholar
MOHIUDDIN, K. M. AND MAO, g. 1994. A comparative study of different classifiers for handprinted character recognition. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 437-448.Google Scholar
MOOR, B.K. 1988. ART 1 and Pattern Clustering. In 1988 Connectionist Summer School, Morgan Kaufmann, San Mateo, CA, 174-185.Google Scholar
MURTAGH, F. 1984. A survey of recent advances in hierarchical clustering algorithms which use cluster centers. Comput. J. 26, 354-359.Google Scholar
MURTY, M. N. AND KRISHNA, G. 1980. A computationally efficient technique for data clustering. Pattern Recogn. 12, 153-158.Google Scholar
MURTY, M. N. AND JAIN, A.K. 1995. Knowledgebased clustering scheme for collection management and retrieval of library books. Pattern Recogn. 28, 949-964.Google Scholar
NAGY, G. 1968. State of the art in pattern recognition. Proc. IEEE 56, 836-862.Google Scholar
NG, R. AND HAN, J. 1994. Very large data bases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.), VLDB Endowment, Berkeley, CA, 144-155. Google Scholar
NGUYEN, H. H. AND COHEN, P. 1993. Gibbs random fields, fuzzy clustering, and the unsupervised segmentation of textured images. CV- GIP: Graph. Models Image Process. 55, 1 (Jan. 1993), 1-19. Google Scholar
OEHLER, K. L. AND GRAY, R. M. 1995. Combining image compression and classification using vector quantization. IEEE Trans. Pattern Anal. Mach. Intell. 17, 461-473. Google Scholar
OJA, E. 1982. A simplified neuron model as a principal component analyzer. Bull. Math. Bio. 15, 267-273.Google Scholar
OZAWA, K. 1985. A stratificational overlapping cluster scheme. Pattern Recogn. 18, 279-286.Google Scholar
OPEN TEXT, 1999. http://index.opentext.net.Google Scholar
KAMGAR-PARSI, B., GUALTIERI, J. A., DEVANEY, J. A., AND KAMGAR-PARSI, K. 1990. Clustering with neural networks. Biol. Cybern. 63, 201-208.Google Scholar
LYCOS, 1999. http://www.lycos.com.Google Scholar
PAL, N. R., BEZDEK, J. C., AND TSAO, E. C.-K. 1993. Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Trans. Neural Netw. 4, 549-557.Google Scholar
QUINLAN, J. R. 1990. Decision trees and decision making. IEEE Trans. Syst. Man Cybern. 20, 339-346.Google Scholar
RAGHAVAN, V. V. AND BIRCHAND, K. 1979. A clustering strategy based on a formalism of the reproductive process in a natural system. In Proceedings of the Second International Conference on Information Storage and Retrieval, 10-22. Google Scholar
RAGHAVAN, V. V. AND YU, C.T. 1981. A comparison of the stability characteristics of some graph theoretic clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 3, 393-402.Google Scholar
RASMUSSEN, E. 1992. Clustering algorithms. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Inc., Upper Saddle River, NJ, 419-442. Google Scholar
RICH, E. 1983. ArtificialIntelligence. McGraw- Hill, Inc., New York, NY. Google Scholar
RIPLEY, B. D., Ed. 1989. Statistical Inference for Spatial Processes. Cambridge University Press, New York, NY. Google Scholar
ROSE, K., GUREWITZ, E., AND FOX, G. C. 1993. Deterministic annealing approach to constrained clustering. IEEE Trans. Pattern Anal. Mach. Intell. 15, 785-794. Google Scholar
ROSENFELD, A. AND KAK, A.C. 1982. Digital Picture Processing. 2nd ed. Academic Press, Inc., New York, NY. Google Scholar
ROSENFELD, A., SCHNEIDER, V. B., AND HUANG, M. K. 1969. An application of cluster detection to text and picture processing. IEEE Trans. Inf. Theor. 15, 6, 672-681.Google Scholar
Ross, G. J. S. 1968. Classification techniques for large sets of data. In Numerical Taxonomy, A. J. Cole, Ed. Academic Press, Inc., New York, NY.Google Scholar
RuSPINI, E.H. 1969. A new approach to clustering. Inf. Control 15, 22-32.Google Scholar
SALTON, G. 1991. Developments in automatic text retrieval. Science 253, 974-980.Google Scholar
SAMAL, A. AND IYENGAR, P.A. 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recogn. 25, 1 (Jan. 1992), 65-77. Google Scholar
SAMMON, J. W. JR. 1969. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 18, 401-409.Google Scholar
SANGAL, R. 1991. Programming Paradigms in LISP. McGraw-Hill, Inc., New York, NY. Google Scholar
SCHACHTER, B. J., DAVIS, L. S., AND ROSENFELD, A. 1979. Some experiments in image segmentation by clustering of local feature values. Pattern Recogn. 11, 19-28.Google Scholar
SCHWEFEL, H.P. 1981. Numerical Optimization of Computer Models. John Wiley and Sons, Inc., New York, NY. Google Scholar
SELIM, S. Z. AND ISMAIL, M.A. 1984. K-meanstype algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 6, 81-87.Google Scholar
SELIM, S. Z. AND ALSULTAN, K. 1991. A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24, 10 (1991), 1003-1008. Google Scholar
SEN, A. AND SRIVASTAVA, M. 1990. Regression Analysis. Springer-Verlag, New York, NY.Google Scholar
SETHI, I. AND JAIN, A. K., Eds. 1991. Artificial Neural Networks and Pattern Recognition: Old and New Connections. Elsevier Science Inc., New York, NY. Google Scholar
SHEKAR, B., MURTY, N. M., AND KRISHNA, G. 1987. A knowledge-based clustering scheme. Pattern Recogn. Lett. 5, 4 (Apr. 1, 1987), 253- 259. Google Scholar
SILVERMAN, J. F. AND COOPER, D. B. 1988. Bayesian clustering for unsupervised estimation of surface and texture models. IEEE Trans. Pattern Anal. Mach. Intell. 10, 4 (July 1988), 482-495. Google Scholar
SIMOUDIS, E. 1996. Reality check for data mining. IEEE Expert 11, 5 (Oct.), 26-33. Google Scholar
SLAGLE, J. R., CHANG, C. L., AND HELLER, S. R. 1975. A clustering and data-reorganizing algorithm. IEEE Trans. Syst. Man Cybern. 5, 125-128.Google Scholar
SNEATH, P. H. A. AND SOKAL, R. R. 1973. Numerical Taxonomy. Freeman, London, UK.Google Scholar
SPATH, H. 1980. Cluster Analysis Algorithms for Data Reduction and Classification. Ellis Horwood, Upper Saddle River, NJ.Google Scholar
SOLBERG, A., TAXT, T., AND JAIN, A. 1996. A Markov random field model for classification of multisource satellite imagery. IEEE Trans. Geoscience and Remote Sensing 34, 1, 100-113.Google Scholar
SRIVASTAVA, A. AND MURTY, M. N 1990. A comparison between conceptual clustering and conventional clustering. Pattern Recogn. 23, 9 (1990), 975-981. Google Scholar
STAHL, H. 1986. Cluster analysis of large data sets. In Classification as a Tool of Research, W. Gaul and M. Schader, Eds. Elsevier North-Holland, Inc., New York, NY, 423-430.Google Scholar
STEPP, R. E. AND MICHALSKI, R. S. 1986. Conceptual clustering of structured objects: A goal-oriented approach. Artif. Intell. 28, 1 (Feb. 1986), 43-69. Google Scholar
SUTTON, M., STARK, L., AND BOWYER, K. 1993. Function-based generic recognition for multiple object categories. In Three-Dimensional Object Recognition Systems, A. Jain and P. J. Flynn, Eds. Elsevier Science Inc., New York, NY.Google Scholar
SYMON, M. J. 1977. Clustering criterion and multi-variate normal mixture. Biometrics 77, 35-43.Google Scholar
TANAKA, E. 1995. Theoretical aspects of syntactic pattern recognition. Pattern Recogn. 28, 1053-1061.Google Scholar
TAXT, T. AND LUNDERVOLD, A. 1994. Multispectral analysis of the brain using magnetic resonance imaging. IEEE Trans. Medical Imaging 13, 3, 470-481.Google Scholar
TITTERINGTON, D. M., SMITH, A. F. M., AND MAKOV, U.E. 1985. Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Inc., New York, NY.Google Scholar
TOUSSAINT, G. T. 1980. The relative neighborhood graph of a finite planar set. Pattern Recogn. 12, 261-268.Google Scholar
TRIER, O. D. AND JAIN, A. K. 1995. Goaldirected evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intell. 17, 1191-1201. Google Scholar
UCHIYAMA, T. AND ARBIB, M.A. 1994. Color image segmentation using competitive learning. IEEE Trans. Pattern Anal. Mach. Intell. 16, 12 (Dec. 1994), 1197-1206.
URQUHART, R.B. 1982. Graph theoretical clustering based on limited neighborhood sets. Pattern Recogn. 15, 173-187.
VENKATESWARLU, N. B. AND RAJU, P. S. V. S. K. 1992. Fast ISODATA clustering algorithms. Pattern Recogn. 25, 3 (Mar. 1992), 335-342.
VINOD, V. V., CHAUDHURY, S., MUKHERJEE, J., AND GHOSE, S. 1994. A connectionist approach for clustering with applications in image analysis. IEEE Trans. Syst. Man Cybern. 24, 365-384.
WAH, B. W., Ed. 1996. Special section on mining of databases. IEEE Trans. Knowl. Data Eng. (Dec.).
WARD, J. H. JR. 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236-244.
WATANABE, S. 1985. Pattern Recognition: Human and Mechanical. John Wiley and Sons, Inc., New York, NY.
WESZKA, J. 1978. A survey of threshold selection techniques. Pattern Recogn. 7, 259-265.
WHITLEY, D., STARKWEATHER, T., AND FUQUAY, D. 1989. Scheduling problems and traveling salesman: the genetic edge recombination. In Proceedings of the Third International Conference on Genetic Algorithms (George Mason University, June 4-7), J. D. Schaffer, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 133-140.
WILSON, D. R. AND MARTINEZ, T. R. 1997. Improved heterogeneous distance functions. J. Artif. Intell. Res. 6, 1-34.
WU, Z. AND LEAHY, R. 1993. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15, 1101-1113.
WULFEKUHLER, M. AND PUNCH, W. 1997. Finding salient features for personal web page categories. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http://theory.stanford.edu/people/wass/publications/Web Search/Web Search.html.
ZADEH, L.A. 1965. Fuzzy sets. Inf. Control 8, 338-353.
ZAHN, C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C-20 (Apr.), 68-86.
ZHANG, K. 1995. Algorithms for the constrained editing distance between ordered labeled trees and related problems. Pattern Recogn. 28, 463-474.
ZHANG, J. AND MICHALSKI, R.S. 1995. An integration of rule induction and exemplar-based learning for graded concepts. Mach. Learn. 21, 3 (Dec. 1995), 235-267.
ZHANG, T., RAMAKRISHNAN, R., AND LIVNY, M. 1996. BIRCH: An efficient data clustering method for very large databases. SIGMOD Rec. 25, 2, 103-114.
ZUPAN, J. 1982. Clustering of Large Data Sets. Research Studies Press Ltd., Taunton, UK.

Reviews

Reviewer: Jose M. Ramirez

Data clustering is not defined the same way in each of the disciplines that use it to deal with problems that involve the extraction of information or structure from data. The authors have produced a good survey of this slippery topic. They devote a considerable amount of space to presenting clustering techniques from the perspective of several disciplines, including fuzzy systems, neural networks, and searching. The section of definitions and notations is weak: it is just a glossary of terms, with no context provided. It would have been better to define the terms when they were needed for each technique described. References are numerous, as expected in a survey, but are not annotated sufficiently to enable readers to define a research plan on a given aspect. In some places the paper does not reflect the state of the art in the use of clustering, for example in neural networks and fuzzy systems. One cause of this weakness is the difficulty of covering a multidisciplinary subject whose advances are reported in a wide range of journals and proceedings. The other cause, namely the extremely long review process, is completely out of the authors' control: the paper was received in March 1997 but not accepted until January 1999.

Published in

ACM Computing Surveys Volume 31, Issue 3
Sept. 1999
101 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/331499

Copyright © 1999 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 September 1999
Published in csur Volume 31, Issue 3

Author Tags
cluster analysis
clustering applications
exploratory data analysis
incremental clustering
similarity indices
unsupervised learning
Qualifiers
- article
Article Metrics
- Total Citations: 9,443
- Total Downloads: 74,681
- Downloads (Last 12 months): 5,639
- Downloads (Last 6 weeks): 867