research-article

Exploring and measuring dependency trees for informationretrieval

Author:
Chang Liu

University of Ulster, Newtownabbey, United Kngdm

University of Ulster, Newtownabbey, United Kngdm
View Profile

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrievalJuly 2008Pages 892https://doi.org/10.1145/1390334.1390567

Published:20 July 2008Publication History

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 892

ABSTRACT

Natural language processing techniques are believed to hold a tremendous potential to supplement the purely quantitative methods of text information retrieval. This has led to the emergence of a large number of NLP-based IR research projects over the last few years, even though the empirical evidence to support this has often been inadequate. Most contributions of NLP to IR mainly concentrate on document representation and compound term matching strategies. Researchers have noted that the simple term-based representation of document content such as vector representation is usually inadequate for accurate discrimination. The "bag of words" representation does not invoke linguistic considerations and allow modelling of relationships between subsets of words. However, even though a variety of content indicator such as syntactic phrase have been tried and investigated for representing documents rather than single terms in IR systems, the matching strategy over those representation still cannot go beyond traditional statistical techniques that measure term co-occurrence characteristics and proximity in analyzing text structure.

In this paper, we propose a novel IR strategy (SIR) with NLP techniques involved at the syntactic level. Within SIR, documents and query representation are built on the basis of a syntactic data structure of the natural language text - the dependency tree, in which syntactic relationships between words are identified and structured in the form of a tree. In order to capture the syntactic relations between words in their hierarchical structural representation, the matching strategy in SIR upgrades from the traditional statistical techniques by introducing a similarity measure method executing on the graph representation level as the key determiner. A basic IR experiment is designed and implemented on the TREC data to evaluate if this novel IR model is feasible. Experimental results indicate that this approach has the potential to outperform the standard bag of words IR model, especially in response to syntactical structured queries.

References

D. Lin. Minipar. http://www.cs.ualberta.ca/ lindek/minipar.htm.Google Scholar
T. Strzalkowski. Natural Language Information Retrieval. Springer, Germany, 1999. Google ScholarDigital Library

Index Terms

Exploring and measuring dependency trees for informationretrieval
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Retrieval models and ranking

Recommendations

Sub-Word Indexing and Blind Relevance Feedback for English, Bengali, Hindi, and Marathi IR

The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this article: 1) How to ...
Read More
Morphologically Annotated Amharic Text Corpora
SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

In information retrieval (IR), documents that match the query are retrieved. Search engines usually conflate word variants into a common stem when indexing documents because queries and documents do not need to use exactly the same word variant for the ...
Read More
A novel unsupervised corpus-based stemming technique using lexicon and corpus statistics
Abstract
Word Stemming is a widely used mechanism in the fields of Natural Language Processing, Information Retrieval, and Language Modeling. Language-independent stemmers discover classes of morphologically related words from the ambient ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN:9781605581644
DOI:10.1145/1390334
General Chairs:
Tat-Seng Chua
National University of Singapore
,
Mun-Kew Leong
National Library Board, Singapore
,
Program Chairs:
Syung Hyon Myaeng
Information and Communications University, Korea
,
Douglas W. Oard
University of Maryland, College Park, USA
,
Fabrizio Sebastiani
Consiglio Nazionale delle Ricerche, Italy
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 July 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
dependency tree
information retrieval
tree matching
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate792of3,983submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1
  Total Citations
  View Citations
- 341
  Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Exploring and measuring dependency trees for informationretrieval

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Sub-Word Indexing and Blind Relevance Feedback for English, Bengali, Hindi, and Marathi IR

Morphologically Annotated Amharic Text Corpora

A novel unsupervised corpus-based stemming technique using lexicon and corpus statistics