Document clustering: An evaluation of some experiments with the Cranfield 1400 collection
References (16)
- Automatic Information Organisation and Retrieval (1968)
- et al., Mathematical Taxonomy (1971)
- et al., Factors Determining the Performance of Indexing Systems
- et al., Comparative Evaluation of Index Languages
- et al., Report of an Information Science Index Languages Test (1972)
- Further experiments with hierarchic clustering in document retrieval, Inform. Stor. Retr. (1974)
- A review of classification, J. R. Statist. Soc. A (1971)
- et al., The use of hierarchic clustering in information retrieval, Inform. Stor. Retr. (1971)
Cited by (69)
Probabilistic co-relevance for query-sensitive similarity measurement in information retrieval
2013, Information Processing and Management
Citation Excerpt: This method was motivated by the cluster hypothesis, which states that “closely associated documents tend to be relevant to the same requests” (Jardine & Rijsbergen, 1971; Rijsbergen, 1979). Thus far, numerous studies have been conducted, for instance, initial trials based on hierarchical clustering that employed different types of merging criteria, i.e., single linkage, complete linkage, group average, and Ward’s method (Croft, 1980; El-Hamdouchi & Willett, 1986; Griffiths, Robinson, & Willett, 1984; Jardine & Rijsbergen, 1971; Rijsbergen & Croft, 1975; Voorhees, 1985). There are also more recent language modeling approaches based on partitional clustering (Liu & Croft, 2004; Na, Kang, Roh, & Lee, 2007) and document expansion using nearest neighbors as a cluster (Kurland & Lee, 2004; Tao, Wang, Mei, & Zhai, 2006).
Cluster-based patent retrieval
2007, Information Processing and Management
A reliable FAQ retrieval system using a query log classification technique based on latent semantic analysis
2007, Information Processing and Management
High-performance FAQ retrieval using an automatic clustering method of query logs
2006, Information Processing and Management
Citation Excerpt: There have been numerous studies on how clustering can be employed to improve retrieval results (Liu & Croft, 2004). The cluster-based retrieval can be divided into two types: static clustering methods (Jardine & van Rijsbergen, 1971; van Rijsbergen & Croft, 1975) and query specific clustering methods (Hearst & Pedersen, 1996; Tombros, Villa, & van Rijsbergen, 2002). The static clustering methods group entire collections in advance, independent of the user’s query, and clusters are retrieved based on how well their centroids match the user’s query.
Tree view self-organisation of web content - Institute for Water Education
2005, Neurocomputing
The effectiveness of query-specific hierarchic clustering in information retrieval
2002, Information Processing and Management
Citation Excerpt: Hierarchic methods on the other hand, result in tree-like classifications in which small clusters of documents that are found to be strongly similar to each other are nested within larger clusters that contain less similar documents (Willett, 1988). Two main methods, and many variants of them, for matching a query against a document hierarchy have been proposed (Croft, 1980; Jardine & Van Rijsbergen, 1971; Van Rijsbergen, 1974, 1975; Voorhees, 1985): a top–down search, and a bottom–up search. In both types of search, a single cluster that satisfies a retrieval criterion is retrieved.
† Present address: Royal Society Fellow at Cambridge University Computing Laboratory, Cambridge, England.