ABSTRACT
In this study we present experimental results on using Independent Component Analysis (ICA) and the Self-Organizing Map (SOM) in document analysis. Our documents are segments of spoken dialogues carried out over the telephone in a customer service setting and transcribed into text. The task is to analyze the topics of the discussions and to group the discussions into meaningful subsets. The quality of the grouping is assessed by comparing it with a manual topical classification of the documents.