International Journal of Approximate Reasoning
Volume 40, Issues 1-2, July 2005, Pages 55-80
Data Mining and Granular Computing
 
doi:10.1016/j.ijar.2004.11.005
Copyright © 2005 Published by Elsevier Inc.

A simplicial complex, a hypergraph, structure in the latent semantic space of document clustering

Tsau Young Lin (a, corresponding author) and I-Jen Chiang (b)

(a) Department of Computer Science, San Jose State University, One Washington Square, San Jose, CA 95192-0249, USA
(b) Graduate Institute of Medical Informatics, Taipei Medical University, 205 Wu-Hsien Street, Taipei 110, Taiwan, ROC

Received 1 July 2004; accepted 1 November 2004. Available online 7 January 2005.


Abstract

This paper presents a novel approach to document clustering based on a geometric structure from combinatorial topology. Given a set of documents, the associations among frequently co-occurring terms naturally form a simplicial complex. Our general thesis is that each connected component of this simplicial complex represents a concept in the collection. Based on these concepts, documents can be clustered into meaningful classes. In this paper, however, we address a softer notion: instead of connected components, we use the maximal simplexes of highest dimension as representatives of the connected components. The concepts so defined are called maximal primitive concepts.

Experiments with three different data sets from Web pages and medical literature show that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass and Hierarchical Clustering (HAG). This abstract geometric model seems to have captured the latent semantic structure of the documents.

Keywords: Document clustering; Association rules; Topology; Hierarchical clustering; Simplicial complex
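
The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "frequently co-occurring" means a term set appears together in at least `min_support` documents, and it enumerates term sets by brute force, which is only practical for toy data. The function names, the `min_support` and `max_size` parameters, and the sample documents are all hypothetical. Because every subset of a frequent term set is itself frequent, the collection of frequent term sets is closed under taking subsets, which is what makes it a simplicial complex.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=4):
    """Return all term sets occurring together in >= min_support documents.

    Each returned frozenset is one simplex; a set of k terms is a simplex
    of dimension k - 1. Brute-force enumeration, for illustration only.
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    simplexes = set()
    for size in range(1, max_size + 1):
        found = False
        for combo in combinations(vocab, size):
            support = sum(1 for d in doc_sets if d.issuperset(combo))
            if support >= min_support:
                simplexes.add(frozenset(combo))
                found = True
        if not found:  # no frequent set of this size, so none larger either
            break
    return simplexes


def connected_components(simplexes):
    """Group simplexes into connected components using union-find on terms."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s in simplexes:
        terms = sorted(s)
        root = find(terms[0])
        for t in terms[1:]:
            parent[find(t)] = root

    groups = {}
    for s in simplexes:
        groups.setdefault(find(next(iter(s))), []).append(s)
    return list(groups.values())


def maximal_primitive_concepts(simplexes):
    """Keep, per connected component, the simplexes of highest dimension."""
    concepts = []
    for component in connected_components(simplexes):
        top = max(len(s) for s in component)
        concepts.extend(s for s in component if len(s) == top)
    return concepts


# Hypothetical toy collection: each document is a list of its terms.
docs = [
    ["wall", "street", "stock"],
    ["wall", "street", "stock", "market"],
    ["wall", "street", "stock"],
    ["neural", "network"],
    ["neural", "network", "training"],
]

concepts = maximal_primitive_concepts(frequent_term_sets(docs, min_support=2))
print(sorted(sorted(c) for c in concepts))
# [['network', 'neural'], ['stock', 'street', 'wall']]
```

In this toy run the terms "market" and "training" each appear in only one document, so they never enter the complex. The two remaining components each contribute a single highest-dimensional simplex, and documents could then be grouped by which of these term sets they contain.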

