Abstract
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities has made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
- AARTS, E. AND KORST, J. 1989. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Wiley-Interscience series in discrete mathematics and optimization. John Wiley and Sons, Inc., New York, NY. Google Scholar
- ACM, 1994. ACM CR Classifications. ACM Computing Surveys 35, 5-16.Google Scholar
- AL-SULTAN, K.S. 1995. A tabu search approach to clustering problems. Pattern Recogn. 28, 1443-1451.Google Scholar
- AL-SULTAN, K. S. AND KHAN, M. M. 1996. Computational experience on four algorithms for the hard clustering problem. Pattern Recogn. Lett. 17, 3, 295-308. Google Scholar
- ALLEN, P. A. AND ALLEN, J. R. 1990. Basin Analysis: Principles and Applications. Blackwell Scientific Publications, Inc., Cambridge, MA.Google Scholar
- ALTA VISTA, 1999. http://altavista.digital.com.Google Scholar
- AMADASUN, M. AND KING, R.A. 1988. Low-level segmentation of multispectral images via agglomerative clustering of uniform neighbourhoods. Pattern Recogn. 21, 3 (1988), 261-268. Google Scholar
- ANDERBERG, M. R. 1973. Cluster Analysis for Applications. Academic Press, Inc., New York, NY.Google Scholar
- AUGUSTSON, J. G. AND MINKER, J. 1970. An analysis of some graph theoretical clustering techniques. J. ACM 17, 4 (Oct. 1970), 571- 588. Google Scholar
- BABU, G. P. AND MURTY, M. N. 1993. A nearoptimal initial seed value selection in K-means algorithm using a genetic algorithm. Pattern Recogn. Lett. 14, 10 (Oct. 1993), 763- 769. Google Scholar
- BABU, G. P. AND MURTY, M.N. 1994. Clustering with evolution strategies. Pattern Recogn. 27, 321-329.Google Scholar
- BABU, G. P., MURTY, M. N., AND KEERTHI, S. S. 2000. Stochastic connectionist approach for pattern clustering (To appear). IEEE Trans. Syst. Man Cybern. Google Scholar
- BACKER, F. B. AND HUBERT, L.g. 1976. A graphtheoretic approach to goodness-of-fit in complete-link hierarchical clustering. J. Am. Stat. Assoc. 71,870-878.Google Scholar
- BACKER, E. 1995. Computer-Assisted Reasoning in Cluster Analysis. Prentice Hall International (UK) Ltd., Hertfordshire, UK. Google Scholar
- BAEZA-YATES, R.A. 1992. Introduction to data structures and algorithms related to information retrieval. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice- Hall, Inc., Upper Saddle River, NJ, 13-27. Google Scholar
- BAJCSY, P. 1997. Hierarchical segmentation and clustering using similarity analysis. Ph.D. Dissertation. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL. Google Scholar
- BALL, G. H. AND HALL, D.J. 1965. ISODATA, a novel method of data analysis and classification. Tech. Rep. Stanford University, Stanford, CA.Google Scholar
- BENTLEY, J. L. AND FRIEDMAN, J.H. 1978. Fast algorithms for constructing minimal spanning trees in coordinate spaces. IEEE Trans. Comput. C-27, 6 (June), 97-105.Google Scholar
- BEZDEK, J. C. 1981. Pattern Recognition With Fuzzy Objective Function Algorithms. Plenum Press, New York, NY. Google Scholar
- BHUYAN, J. N., RAGHAVAN, V. V., AND VENKATESH, K.E. 1991. Genetic algorithm for clustering with an ordered representation. In Proceedings of the Fourth International Conference on Genetic Algorithms, 408-415.Google Scholar
- BISWAS, G., WEINBERG, J., AND LI, C. 1995. A Conceptual Clustering Method for Knowledge Discovery in Databases. Editions Technip.Google Scholar
- BRAILOVSKY, V. L. 1991. A probabilistic approach to clustering. Pattern Recogn. Lett. 12, 4 (Apr. 1991), 193-198. Google Scholar
- BRODATZ, P. 1966. Textures: A Photographic Album for Artists and Designers. Dover Publications, Inc., Mineola, NY.Google Scholar
- CAN, F. 1993. Incremental clustering for dynamic information processing. ACM Trans. Inf. Syst. 11, 2 (Apr. 1993), 143-164. Google Scholar
- CARPENTER, G. AND GROSSBERG, S. 1990. ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Networks 3, 129-152.Google Scholar
- CHEKURI, C., GOLDWASSER, M. H., RAGHAVAN, P., AND UPFAL, E. 1997. Web search using automatic classification. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http:// theory.stanford.edu/people/wass/publications/ Web Search/Web Search.html.Google Scholar
- CHENG, C. H. 1995. A branch-and-bound clustering algorithm. IEEE Trans. Syst. Man Cybern. 25, 895-898.Google Scholar
- CHENG, Y. AND FU, K.S. 1985. Conceptual clustering in knowledge organization. IEEE Trans. Pattern Anal. Mach. Intell. 7, 592-598.Google Scholar
- CHENG, Y. 1995. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17, 7 (July), 790-799. Google Scholar
- CHIEN, Y.T. 1978. Interactive Pattern Recognition. Marcel Dekker, Inc., New York, NY. Google Scholar
- CHOUDHURY, S. AND MURTY, M.N. 1990. A divisive scheme for constructing minimal spanning trees in coordinate space. Pattern Recogn. Lett. 11, 6 (Jun. 1990), 385-389. Google Scholar
- 1996. Special issue on data mining. Commun. ACM 39, 11.Google Scholar
- COLEMAN, G. B. AND ANDREWS, H. C. 1979. Image segmentation by clustering. Proc. IEEE 67, 5, 773-785.Google Scholar
- CONNELL, S. AND JAIN, A. K. 1998. Learning prototypes for on-line handwritten digits. In Proceedings of the 14th International Conference on Pattern Recognition (Brisbane, Australia, Aug.), 182-184. Google Scholar
- CROSS, S. E., Ed. 1996. Special issue on data mining. IEEE Expert 11, 5 (Oct.). Google Scholar
- DALE, M. B. 1985. On the comparison of conceptual clustering and numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 7, 241-244.Google Scholar
- DAVE, R. N. 1992. Generalized fuzzy C-shells clustering and detection of circular and elliptic boundaries. Pattern Recogn. 25, 713-722.Google Scholar
- DAVIS, T., Ed. 1991. The Handbook of Genetic Algorithms. Van Nostrand Reinhold Co., New York, NY.Google Scholar
- DAY, W. H.E. 1992. Complexity theory: An introduction for practitioners of classification. In Clustering and Classification, P. Arabie and L. Hubert, Eds. World Scientific Publishing Co., Inc., River Edge, NJ.Google Scholar
- DEMPSTER, A. P., LAIRD, N. M., AND RUB IN, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B. 39, 1, 1-38.Google Scholar
- DIDAY, E. 1973. The dynamic cluster method in non-hierarchical clustering. J. Comput. Inf. Sci. 2, 61-88.Google Scholar
- DIDAY, E. AND SIMON, J. C. 1976. Clustering analysis. In Digital Pattern Recognition, K. S. Fu, Ed. Springer-Verlag, Secaucus, NJ, 47-94.Google Scholar
- DIDAY, E. 1988. The symbolic approach in clustering. In Classification and Related Methods, H. H. Bock, Ed. North-Holland Publishing Co., Amsterdam, The Netherlands.Google Scholar
- DORAI, C. AND JAIN, A.K. 1995. Shape spectra based view grouping for free-form objects. In Proceedings of the International Conference on Image Processing (ICIP-95), 240-243. Google Scholar
- DUBES, R. C. AND JAIN, A. K. 1976. Clustering techniques: The user's dilemma. Pattern Recogn. 8, 247-260.Google Scholar
- DUBES, R. C. AND JAIN, A. K. 1980. Clustering methodology in exploratory data analysis. In Advances in Computers, M. C. Yovits,, Ed. Academic Press, Inc., New York, NY, 113- 125.Google Scholar
- DUBES, R. C. 1987. How many clusters are best?--an experiment. Pattern Recogn. 20, 6 (Nov. 1, 1987), 645-663. Google Scholar
- DUBES, R.C. 1993. Cluster analysis and related issues. In Handbook of Pattern Recognition & Computer Vision, C. H. Chen, L. F. Pau, and P. S. P. Wang, Eds. World Scientific Publishing Co., Inc., River Edge, NJ, 3-32. Google Scholar
- DUBUISSON, M. P. AND JAIN, A.K. 1994. A modified Hausdorff distance for object matching. In Proceedings of the International Conference on Pattern Recognition (ICPR '94), 566-568.Google Scholar
- DUDA, R. O. AND HART, P. E. 1973. Pattern Classification and Scene Analysis. John Wiley and Sons, Inc., New York, NY. Google Scholar
- DUNN, S., JANOS, L., AND ROSENFELD, A. 1983. Bimean clustering. Pattern Recogn. Lett. 1, 169-173.Google Scholar
- DURAN, B. S. AND ODELL, P. L. 1974. Cluster Analysis: A Survey. Springer-Verlag, New York, NY.Google Scholar
- EDDY, W. F., MOCKUS, A., AND OUE, S. 1996. Approximate single linkage cluster analysis of large data sets in high-dimensional spaces. Comput. Stat. Data Anal. 23, 1, 29-43. Google Scholar
- ETZIONI, O. 1996. The World-Wide Web: quagmire or gold mine? Commun. ACM 39, 11, 65-68. Google Scholar
- EVERITT, B.S. 1993. Cluster Analysis. Edward Arnold, Ltd., London, UK.Google Scholar
- FABER, V. 1994. Clustering and the continuous k-means algorithm. Los Alamos Science 22, 138-144.Google Scholar
- FABER, V., HOCHBERG, J. C., KELLY, P. M., THOMAS, T. R., AND WHITE, J.M. 1994. Concept extraction: A data-mining technique. Los Alamos Science 22, 122-149.Google Scholar
- FAYYAD, U. M. 1996. Data mining and knowledge discovery: Making sense out of data. IEEE Expert 11, 5 (Oct.), 20-25. Google Scholar
- FISHER, D. AND LANGLEY, P. 1986. Conceptual clustering and its relation to numerical taxonomy. In Artificial Intelligence and Statistics, A W. Gale, Ed. Addison-Wesley Longman Publ. Co., Inc., Reading, MA, 77-116. Google Scholar
- FISHER, D. 1987. Knowledge acquisition via incremental conceptual clustering. Mach. Learn. 2, 139-172. Google Scholar
- FISHER, D., Xu, L., CARNES, R., RICH, Y., FENVES, S. J., CHEN, J., SHIAVI, R., BISWAS, G., AND WEIN- BERG, J. 1993. Applying AI clustering to engineering tasks. IEEE Expert 8, 51-60. Google Scholar
- FISHER, L. AND VAN NESS, J. W. 1971. Admissible clustering procedures. Biometrika 58, 91-104.Google Scholar
- FLYNN, P. J. AND JAIN, A.K. 1991. BONSAI: 3D object recognition using constrained search. IEEE Trans. Pattern Anal. Mach. Intell. 13, 10 (Oct. 1991), 1066-1075. Google Scholar
- FOGEL, D. B. AND SIMPSON, P.K. 1993. Evolving fuzzy clusters. In Proceedings of the International Conference on Neural Networks (San Francisco, CA), 1829-1834.Google Scholar
- FOGEL, D. B. AND FOGEL, L. J., Eds. 1994. Special issue on evolutionary computation. IEEE Trans. Neural Netw. (Jan.).Google Scholar
- FOGEL, L. J., OWENS, A. J., AND WALSH, M. J. 1965. Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, Inc., New York, NY.Google Scholar
- FRAKES, W. B. AND BAEZA-YATES, R., Eds. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Inc., Upper Saddle River, NJ. Google Scholar
- FRED, A. L. N. AND LEITAO, J. M. N. 1996. A minimum code length technique for clustering of syntactic patterns. In Proceedings of the International Conference on Pattern Recognition (Vienna, Austria), 680-684. Google Scholar
- FRED, A. L. N. 1996. Clustering of sequences using a minimum grammar complexity criterion. In Grammatical Inference: Learning Syntax from Sentences, L. Miclet and C. Higuera, Eds. Springer-Verlag, Secaucus, NJ, 107-116. Google Scholar
- Fu, K. S. AND LU, S.Y. 1977. A clustering procedure for syntactic patterns. IEEE Trans. Syst. Man Cybern. 7, 734-742.Google Scholar
- Fu, K. S. AND MUI, J. K. 1981. A survey on image segmentation. Pattern Recogn. 13, 3-16.Google Scholar
- FUKUNAGA, Z. 1990. Introduction to Statistical Pattern Recognition. 2nd ed. Academic Press Prof., Inc., San Diego, CA. Google Scholar
- GLOVER, F. 1986. Future paths for integer programming and links to artificial intelligence. Comput. Oper. Res. 13, 5 (May 1986), 533- 549. Google Scholar
- GOLDBERG, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Co., Inc., Redwood City, CA. Google Scholar
- GORDON, A. D. AND HENDERSON, J. T. 1977. Algorithm for Euclidean sum of squares. Biometrics 33, 355-362.Google Scholar
- GOTLIEB, G. C. AND KUMAR, S. 1968. Semantic clustering of index terms. J. ACM 15, 493- 513. Google Scholar
- GOWDA, K. C. 1984. A feature reduction and unsupervised classification algorithm for multispectral data. Pattern Recogn. 17, 6, 667- 676.Google Scholar
- GOWDA, K. C. AND KRISHNA, G. 1977. Agglomerative clustering using the concept of mutual nearest neighborhood. Pattern Recogn. 10, 105-112.Google Scholar
- GOWDA, K. C. AND DIDAY, E. 1992. Symbolic clustering using a new dissimilarity meG- sure. IEEE Trans. Syst. Man Cybern. 22, 368-378.Google Scholar
- GOWER, J. C. AND ROSS, G. J.S. 1969. Minimum spanning rees and single-linkage cluster analysis. Appl. Stat. 18, 54-64.Google Scholar
- GREFENSTETTE, J 1986. Optimization of control parameters for genetic algorithms. IEEE Trans. Syst. Man Cybern. SMC-16, 1 (Jan./ Feb. 1986), 122-128. Google Scholar
- HARALICK, R. M. AND KELLY, G. L. 1969. Pattern recognition with measurement space and spatial clustering for multiple images. Proc. IEEE 57, 4, 654-665.Google Scholar
- HARTIGAN, J. A. 1975. Clustering Algorithms. John Wiley and Sons, Inc., New York, NY. Google Scholar
- HEDBERG, S. 1996. Searching for the mother lode: Tales of the first data miners. IEEE Expert 11, 5 (Oct.), 4-7. Google Scholar
- HERTZ, J., KROGH, A., AND PALMER, R. G. 1991. Introduction to the Theory of Neural Computation. Santa Fe Institute Studies in the Sciences of Complexity lecture notes. Addison- Wesley Longman Publ. Co., Inc., Reading, MA. Google Scholar
- HOFFMAN, R. AND JAIN, A. K. 1987. Segmentation and classification of range images. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 5 (Sept. 1987), 608-620. Google Scholar
- HOFMANN, T. AND BUHMANN, J. 1997. Pairwise data clustering by deterministic annealing. IEEE Trans. Pattern Anal. Mach. Intell. 19, 1 (Jan.), 1-14. Google Scholar
- HOFMANN, T., PUZICHA, J., AND BUCHMANN, J. M. 1998. Unsupervised texture segmentation in a deterministic annealing framework. IEEE Trans. Pattern Anal. Mach. Intell. 20, 8, 803-818. Google Scholar
- HOLLAND, J.H. 1975. Adaption in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI. Google Scholar
- HOOVER, A., JEAN-BAPTISTE, G., JIANG, X., FLYNN, P. J., BUNKE, H., GOLDGOF, D. B., BOWYER, K., EGGERT, D. W., FITZGIBBON, A., AND FISHER, R. B. 1996. An experimental comparison of range image segmentation algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 18, 7, 673- 689. Google Scholar
- HUTTENLOCHER, D. P., KLANDERMAN, G. A., AND RUCKLIDGE, W.J. 1993. Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15, 9, 850-863. Google Scholar
- ICHINO, M. AND YAGUCHI, H. 1994. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Trans. Syst. Man Cybern. 24, 698-708.Google Scholar
- 1991. Proceedings of the International Joint Conference on Neural Networks. (IJCNN'91).Google Scholar
- 1992. Proceedings of the International Joint Conference on Neural Networks.Google Scholar
- ISMAIL, M. A. AND KAMEL, M. S. 1989. Multidimensional data clustering utilizing hybrid search strategies. Pattern Recogn. 22, 1 (Jan. 1989), 75-89. Google Scholar
- JAIN, A. K. AND DUBES, R.C. 1988. Algorithms for Clustering Data. Prentice-Hall advanced reference series. Prentice-Hall, Inc., Upper Saddle River, NJ. Google Scholar
- JAIN, A. K. AND FARROKHNIA, F. 1991. Unsupervised texture segmentation using Gabor filters. Pattern Recogn. 24, 12 (Dec. 1991), 1167-1186. Google Scholar
- JAIN, A. K. AND BHATTACHARJEE, S. 1992. Text segmentation using Gabor filters for automatic document processing. Mach. Vision Appl. 5, 3 (Summer 1992), 169-184. Google Scholar
- JAIN, A. J. AND FLYNN, P. J., Eds. 1993. Three Dimensional Object Recognition Systems. Elsevier Science Inc., New York, NY. Google Scholar
- JAIN, A. K. AND MAO, J. 1994. Neural networks and pattern recognition. In Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks, and C. J. Robinson, Eds. 194- 212.Google Scholar
- JAIN, A. K. AND FLYNN, P.J. 1996. Image segmentation using clustering. In Advances in Image Understanding: A Festschrift for Azriel Rosenfeld, N. Ahuja and K. Bowyer, Eds, IEEE Press, Piscataway, NJ, 65-83.Google Scholar
- JAIN, A. K. AND MAO, J. 1996. Artificial neural networks: A tutorial. IEEE Computer 29 (Mar.), 31-44. Google Scholar
- JAIN, A. K., RATHA, N. K., AND LAKSHMANAN, S. 1997. Object detection using Gabor filters. Pattern Recogn. 30, 2, 295-309.Google Scholar
- JAIN, N. C., INDRAYAN, A., AND GOEL, L. R. 1986. Monte Carlo comparison of six hierarchical clustering methods on random data. Pattern Recogn. 19, 1 (Jan./Feb. 1986), 95-99. Google Scholar
- JAIN, R., KASTURI, R., AND SCHUNCK, B. G. 1995. Machine Vision. McGraw-Hill series in computer science. McGraw-Hill, Inc., New York, NY. Google Scholar
- JARVIS, R. A. AND PATRICK, E. A. 1973. Clustering using a similarity method based on shared near neighbors. IEEE Trans. Comput. C-22, 8 (Aug.), 1025-1034.Google Scholar
- JOLION, J.-M., MEER, P., AND BATAOUCHE, S. 1991. Robust clustering with applications in computer vision. IEEE Trans. Pattern Anal. Mach. Intell. 13, 8 (Aug. 1991), 791-802. Google Scholar
- JONES, D. AND BELTRAMO, M.A. 1991. Solving partitioning problems with genetic algorithms. In Proceedings of the Fourth International Conference on Genetic Algorithms, 442-449.Google Scholar
- JUDD, D., MCKINLEY, P., AND JAIN, A. K. 1996. Large-scale parallel data clustering. In Proceedings of the International Conference on Pattern Recognition (Vienna, AustriG), 488-493. Google Scholar
- KING, B. 1967. Step-wise clustering procedures. J. Am. Stat. Assoc. 69, 86-101.Google Scholar
- KIRKPATRICK, S., GELATT, C. D., JR., AND VECCHI, M.P. 1983. Optimization by simulated annealing. Science 220, 4598 (May), 671-680.Google Scholar
- KLEIN, R. W. AND DUBES, R. C. 1989. Experiments in projection and clustering by simulated annealing. Pattern Recogn. 22, 213-220.Google Scholar
- KNUTH, D. 1973. The Art of Computer Programming. Addison-Wesley, Reading, MA. Google Scholar
- KOONTZ, W. L. G., FUKUNAGA, K., AND NARENDRA, P.M. 1975. A branch and bound clustering algorithm. IEEE Trans. Comput. 23, 908- 914.Google Scholar
- KOHONEN, T. 1989. Self-Organization andAssociative Memory. 3rd ed. Springer information sciences series. Springer-Verlag, New York, NY. Google Scholar
- KRAAIJVELD, M., MAO, J., AND JAIN, A. K. 1995. A non-linear projection method based on Kohonen's topology preserving maps. IEEE Trans. Neural Netw. 6, 548-559. Google Scholar
- KRISHNAPURAM, R., FRIGUI, H., AND NASRAOUI, O. 1995. Fuzzy and probabilistic shell clustering algorithms and their application to boundary detection and surface approximation. IEEE Trans. Fuzzy Systems 3, 29-60. Google Scholar
- KURITA, T. 1991. An efficient agglomerative clustering algorithm using a heap. Pattern Recogn. 24, 3 (1991), 205-209. Google Scholar
- LIBRARY OF CONGRESS, 1990. LC classification outline. Library of Congress, Washington, DC.Google Scholar
- LEBOWITZ, M. 1987. Experiments with incremental concept formation. Mach. Learn. 2, 103-138. Google Scholar
- LEE, H.-Y. AND ONG, H.-L. 1996. Visualization support for data mining. IEEE Expert 11, 5 (Oct.), 69-75. Google Scholar
- LEE, R. C. T., SLAGLE, J. R., AND MONG, C. T. 1978. Towards automatic auditing of records. IEEE Trans. Softw. Eng. 4, 441- 448.Google Scholar
- LEE, R. C. T. 1981. Cluster analysis and its applications. In Advances in Information Systems Science, J. T. Tou, Ed. Plenum Press, New York, NY.Google Scholar
- LI, C. AND BISWAS, G. 1995. Knowledge-based scientific discovery in geological databases. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (Montreal, Canada, Aug. 20-21), 204 -209.Google Scholar
- Lu, S. Y. AND FU, K. S. 1978. A sentence-tosentence clustering procedure for pattern analysis. IEEE Trans. Syst. Man Cybern. 8, 381-389.Google Scholar
- LUNDERVOLD, A., FENSTAD, A. M., ERSLAND, L., AND TAXT, T. 1996. Brain tissue volumes from multispectral 3D MRI: A comparative study of four classifiers. In Proceedings of the Conference of the Society on Magnetic Resonance,Google Scholar
- MAAREK, Y. S. AND BEN SHAUL, I. Z. 1996. Automatically organizing bookmarks per contents. In Proceedings of the Fifth International Conference on the World Wide Web (Paris, May), http://www5conf.inria.fr/fichhtml/paper-sessions.html. Google Scholar
- MCQUEEN, J. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281-297.Google Scholar
- MAO, J. AND JAIN, A.K. 1992. Texture classification and segmentation using multiresolution simultaneous autoregressive models. Pattern Recogn. 25, 2 (Feb. 1992), 173-188. Google Scholar
- MAO, J. AND JAIN, A.K. 1995. Artificial neural networks for feature extraction and multivariate data projection. IEEE Trans. Neural Netw. 6, 296-317. Google Scholar
- MAO, J. AND JAIN, A.K. 1996. A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Trans. Neural Netw. 7, 16-29. Google Scholar
- MEVINS, A.J. 1995. A branch and bound incremental conceptual clusterer. Mach. Learn. 18, 5-22. Google Scholar
- MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1981. A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts. In Progress in Pattern Recognition, Vol. 1, L. Kanal and A. Rosenfeld, Eds. North-Holland Publishing Co., Amsterdam, The Netherlands.Google Scholar
- MICHALSKI, R., STEPP, R. E., AND DIDAY, E. 1983. Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 5 (Sept.), 396-409.Google Scholar
- MISHRA, S. K. AND RAGHAVAN, V. V. 1994. An empirical study of the performance of heuristic methods for clustering. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 425-436.Google Scholar
- MITCHELL, T. 1997. Machine Learning. McGraw- Hill, Inc., New York, NY. Google Scholar
- MOHIUDDIN, K. M. AND MAO, g. 1994. A comparative study of different classifiers for handprinted character recognition. In Pattern Recognition in Practice, E. S. Gelsema and L. N. Kanal, Eds. 437-448.Google Scholar
- MOOR, B.K. 1988. ART 1 and Pattern Clustering. In 1988 Connectionist Summer School, Morgan Kaufmann, San Mateo, CA, 174-185.Google Scholar
- MURTAGH, F. 1984. A survey of recent advances in hierarchical clustering algorithms which use cluster centers. Comput. J. 26, 354-359.Google Scholar
- MURTY, M. N. AND KRISHNA, G. 1980. A computationally efficient technique for data clustering. Pattern Recogn. 12, 153-158.Google Scholar
- MURTY, M. N. AND JAIN, A.K. 1995. Knowledgebased clustering scheme for collection management and retrieval of library books. Pattern Recogn. 28, 949-964.Google Scholar
- NAGY, G. 1968. State of the art in pattern recognition. Proc. IEEE 56, 836-862.Google Scholar
- NG, R. AND HAN, J. 1994. Very large data bases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.), VLDB Endowment, Berkeley, CA, 144-155. Google Scholar
- NGUYEN, H. H. AND COHEN, P. 1993. Gibbs random fields, fuzzy clustering, and the unsupervised segmentation of textured images. CV- GIP: Graph. Models Image Process. 55, 1 (Jan. 1993), 1-19. Google Scholar
- OEHLER, K. L. AND GRAY, R. M. 1995. Combining image compression and classification using vector quantization. IEEE Trans. Pattern Anal. Mach. Intell. 17, 461-473. Google Scholar
- OJA, E. 1982. A simplified neuron model as a principal component analyzer. Bull. Math. Bio. 15, 267-273.Google Scholar
- OZAWA, K. 1985. A stratificational overlapping cluster scheme. Pattern Recogn. 18, 279-286.Google Scholar
- OPEN TEXT, 1999. http://index.opentext.net.Google Scholar
- KAMGAR-PARSI, B., GUALTIERI, J. A., DEVANEY, J. A., AND KAMGAR-PARSI, K. 1990. Clustering with neural networks. Biol. Cybern. 63, 201-208.Google Scholar
- LYCOS, 1999. http://www.lycos.com.Google Scholar
- PAL, N. R., BEZDEK, J. C., AND TSAO, E. C.-K. 1993. Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Trans. Neural Netw. 4, 549-557.Google Scholar
- QUINLAN, J. R. 1990. Decision trees and decision making. IEEE Trans. Syst. Man Cybern. 20, 339-346.Google Scholar
- RAGHAVAN, V. V. AND BIRCHAND, K. 1979. A clustering strategy based on a formalism of the reproductive process in a natural system. In Proceedings of the Second International Conference on Information Storage and Retrieval, 10-22. Google Scholar
- RAGHAVAN, V. V. AND YU, C.T. 1981. A comparison of the stability characteristics of some graph theoretic clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 3, 393-402.Google Scholar
- RASMUSSEN, E. 1992. Clustering algorithms. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Inc., Upper Saddle River, NJ, 419-442. Google Scholar
- RICH, E. 1983. ArtificialIntelligence. McGraw- Hill, Inc., New York, NY. Google Scholar
- RIPLEY, B. D., Ed. 1989. Statistical Inference for Spatial Processes. Cambridge University Press, New York, NY. Google Scholar
- ROSE, K., GUREWITZ, E., AND FOX, G. C. 1993. Deterministic annealing approach to constrained clustering. IEEE Trans. Pattern Anal. Mach. Intell. 15, 785-794. Google Scholar
- ROSENFELD, A. AND KAK, A.C. 1982. Digital Picture Processing. 2nd ed. Academic Press, Inc., New York, NY. Google Scholar
- ROSENFELD, A., SCHNEIDER, V. B., AND HUANG, M. K. 1969. An application of cluster detection to text and picture processing. IEEE Trans. Inf. Theor. 15, 6, 672-681.Google Scholar
- Ross, G. J. S. 1968. Classification techniques for large sets of data. In Numerical Taxonomy, A. J. Cole, Ed. Academic Press, Inc., New York, NY.Google Scholar
- RuSPINI, E.H. 1969. A new approach to clustering. Inf. Control 15, 22-32.Google Scholar
- SALTON, G. 1991. Developments in automatic text retrieval. Science 253, 974-980.Google Scholar
- SAMAL, A. AND IYENGAR, P.A. 1992. Automatic recognition and analysis of human faces and facial expressions: A survey. Pattern Recogn. 25, 1 (Jan. 1992), 65-77. Google Scholar
- SAMMON, J. W. JR. 1969. A nonlinear mapping for data structure analysis. IEEE Trans. Comput. 18, 401-409.Google Scholar
- SANGAL, R. 1991. Programming Paradigms in LISP. McGraw-Hill, Inc., New York, NY. Google Scholar
- SCHACHTER, B. J., DAVIS, L. S., AND ROSENFELD, A. 1979. Some experiments in image segmentation by clustering of local feature values. Pattern Recogn. 11, 19-28.Google Scholar
- SCHWEFEL, H.P. 1981. Numerical Optimization of Computer Models. John Wiley and Sons, Inc., New York, NY. Google Scholar
- SELIM, S. Z. AND ISMAIL, M.A. 1984. K-meanstype algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. 6, 81-87.Google Scholar
- SELIM, S. Z. AND ALSULTAN, K. 1991. A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24, 10 (1991), 1003-1008. Google Scholar
- SEN, A. AND SRIVASTAVA, M. 1990. Regression Analysis. Springer-Verlag, New York, NY.Google Scholar
- SETHI, I. AND JAIN, A. K., Eds. 1991. Artificial Neural Networks and Pattern Recognition: Old and New Connections. Elsevier Science Inc., New York, NY. Google Scholar
- SHEKAR, B., MURTY, N. M., AND KRISHNA, G. 1987. A knowledge-based clustering scheme. Pattern Recogn. Lett. 5, 4 (Apr. 1, 1987), 253- 259. Google Scholar
- SILVERMAN, J. F. AND COOPER, D. B. 1988. Bayesian clustering for unsupervised estimation of surface and texture models. IEEE Trans. Pattern Anal. Mach. Intell. 10, 4 (July 1988), 482-495. Google Scholar
- SIMOUDIS, E. 1996. Reality check for data mining. IEEE Expert 11, 5 (Oct.), 26-33. Google Scholar
- SLAGLE, J. R., CHANG, C. L., AND HELLER, S. R. 1975. A clustering and data-reorganizing algorithm. IEEE Trans. Syst. Man Cybern. 5, 125-128.Google Scholar
- SNEATH, P. H. A. AND SOKAL, R. R. 1973. Numerical Taxonomy. Freeman, London, UK.Google Scholar
- SPATH, H. 1980. Cluster Analysis Algorithms for Data Reduction and Classification. Ellis Horwood, Upper Saddle River, NJ.Google Scholar
- SOLBERG, A., TAXT, T., AND JAIN, A. 1996. A Markov random field model for classification of multisource satellite imagery. IEEE Trans. Geoscience and Remote Sensing 34, 1, 100-113.Google Scholar
- SRIVASTAVA, A. AND MURTY, M. N 1990. A comparison between conceptual clustering and conventional clustering. Pattern Recogn. 23, 9 (1990), 975-981. Google Scholar
- STAHL, H. 1986. Cluster analysis of large data sets. In Classification as a Tool of Research, W. Gaul and M. Schader, Eds. Elsevier North-Holland, Inc., New York, NY, 423-430.Google Scholar
- STEPP, R. E. AND MICHALSKI, R. S. 1986. Conceptual clustering of structured objects: A goal-oriented approach. Artif. Intell. 28, 1 (Feb. 1986), 43-69. Google Scholar
- SUTTON, M., STARK, L., AND BOWYER, K. 1993. Function-based generic recognition for multiple object categories. In Three-Dimensional Object Recognition Systems, A. Jain and P. J. Flynn, Eds. Elsevier Science Inc., New York, NY.Google Scholar
- SYMON, M. J. 1977. Clustering criterion and multi-variate normal mixture. Biometrics 77, 35-43.Google Scholar
- TANAKA, E. 1995. Theoretical aspects of syntactic pattern recognition. Pattern Recogn. 28, 1053-1061.Google Scholar
- TAXT, T. AND LUNDERVOLD, A. 1994. Multispectral analysis of the brain using magnetic resonance imaging. IEEE Trans. Medical Imaging 13, 3, 470-481.Google Scholar
- TITTERINGTON, D. M., SMITH, A. F. M., AND MAKOV, U.E. 1985. Statistical Analysis of Finite Mixture Distributions. John Wiley and Sons, Inc., New York, NY.Google Scholar
- TOUSSAINT, G. T. 1980. The relative neighborhood graph of a finite planar set. Pattern Recogn. 12, 261-268.Google Scholar
- TRIER, O. D. AND JAIN, A. K. 1995. Goaldirected evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intell. 17, 1191-1201. Google Scholar
- UCHIYAMA, T. AND ARBIB, M.A. 1994. Color image segmentation using competitive learning. IEEE Trans. Pattern Anal. Mach. Intell. 16, 12 (Dec. 1994), 1197-1206. Google Scholar
- URQUHART, R.B. 1982. Graph theoretical clustering based on limited neighborhood sets. Pattern Recogn. 15, 173-187.Google Scholar
- VENKATESWARLU, N. B. AND RAJU, P. S. V. S. K. 1992. Fast ISODATA clustering algorithms. Pattern Recogn. 25, 3 (Mar. 1992), 335-342. Google Scholar
- VINOD, V. V., CHAUDHURY, S., MUKHERJEE, J., AND GHOSE, S. 1994. A connectionist approach for clustering with applications in image analysis. IEEE Trans. Syst. Man Cybern. 24, 365-384.Google Scholar
- WAH, B. W., Ed. 1996. Special section on mining of databases. IEEE Trans. Knowl. Data Eng. (Dec.).Google Scholar
- WARD, J. H. JR. 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236-244.Google Scholar
- WATANABE, S. 1985. Pattern Recognition: Human and Mechanical. John Wiley and Sons, Inc., New York, NY. Google Scholar
- WESZKA, J. 1978. A survey of threshold selection techniques. Pattern Recogn. 7, 259-265.Google Scholar
- WHITLEY, D., STARKWEATHER, T., AND FUQUAY, D. 1989. Scheduling problems and traveling salesman: the genetic edge recombination. In Proceedings of the Third International Conference on Genetic Algorithms (George Mason University, June 4-7), J. D. Schaffer, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 133-140. Google Scholar
- WILSON, D. R. AND MARTINEZ, T. R. 1997. Improved heterogeneous distance functions. J. Artif Intell. Res. 6, 1-34. Google Scholar
- Wu, Z. AND LEAHY, R. 1993. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15, 1101-1113. Google Scholar
- WULFEKUHLER, M. AND PUNCH, W. 1997. Finding salient features for personal web page categories. In Proceedings of the Sixth International Conference on the World Wide Web (Santa Clara, CA, Apr.), http://theory, stanford.edu/people/ wass/publications/Web Search/Web Search.html. Google Scholar
- ZADEH, L.A. 1965. Fuzzy sets. Inf. Control 8, 338 -353.Google Scholar
- ZAHN, C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C-20 (Apr.), 68-86.Google Scholar
- ZHANG, K. 1995. Algorithms for the constrained editing distance between ordered labeled trees and related problems. Pattern Recogn. 28, 463-474.Google Scholar
- ZHANG, J. AND MICHALSKI, R.S. 1995. An integration of rule induction and exemplar-based learning for graded concepts. Mach. Learn. 21, 3 (Dec. 1995), 235-267. Google Scholar
- ZHANG, T., RAMAKRISHNAN, R., AND LIVNY, M. 1996. BIRCH: An efficient data clustering method for very large databases. SIGMOD Rec. 25, 2, 103-114. Google Scholar
- ZUPAN, J. 1982. Clustering of Large Data Sets. Research Studies Press Ltd., Taunton, UK.Google Scholar
Recommendations
Survey of Clustering: Algorithms and Applications
This article is a survey into clustering applications and algorithms. A number of important well-known clustering methods are discussed. The authors present a brief history of the development of the field of clustering, discuss various types of ...
A new approach to clustering data with arbitrary shapes
In this paper we propose a clustering algorithm to cluster data with arbitrary shapes without knowing the number of clusters in advance. The proposed algorithm is a two-stage algorithm. In the first stage, a neural network incorporated with an ART-like ...
A novel optimization approach towards improving separability of clusters
AbstractThe objective functions in optimization models of the sum-of-squares clustering problem reflect intra-cluster similarity and inter-cluster dissimilarities and in general, optimal values of these functions can be considered as ...
Highlights- New optimization model is formulated for hard partitional clustering problem.
- ...
Comments