Abstract
Many search applications involve documents with structure or fields. Since query terms are often related to specific structural components, mapping queries to fields and assigning weights to those fields are critical for retrieval effectiveness. Although several field-based retrieval models have been developed, there has not been a formal justification of field weighting.
In this work, we aim to improve field weighting for structured document retrieval. We first introduce the notion of field relevance as a generalization of field weights, and discuss how it can be estimated using relevant documents, which effectively implements relevance feedback for field weighting. We then propose a framework for estimating field relevance by combining several sources. Evaluation on several structured document collections shows that field weighting based on the suggested framework improves retrieval effectiveness significantly.
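To make the idea of per-term field weights concrete, the following is a minimal sketch and not the model defined in the paper. It assumes a mixture-of-field language models scoring function and a simple estimate of the weights obtained by normalizing each query term's probability across the fields of known relevant documents. The function names (`field_lm`, `estimate_field_weights`, `score`), the additive smoothing, and the `vocab_size` parameter are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def field_lm(tokens, vocab_size, mu=1.0):
    """Additively smoothed unigram model for one field (smoothing is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens)
    return lambda term: (counts[term] + mu) / (total + mu * vocab_size)


def estimate_field_weights(query_terms, relevant_docs, fields, vocab_size):
    """Hypothetical estimate of per-term field weights from relevant documents.

    Each field's text is pooled over the relevant documents, and a term's
    probability in each pooled field is normalized so its weights sum to 1.
    """
    pooled = {f: [t for d in relevant_docs for t in d.get(f, [])] for f in fields}
    models = {f: field_lm(pooled[f], vocab_size) for f in fields}
    weights = {}
    for term in query_terms:
        probs = {f: models[f](term) for f in fields}
        norm = sum(probs.values())
        weights[term] = {f: p / norm for f, p in probs.items()}
    return weights


def score(query_terms, doc, field_weights, vocab_size):
    """Query log-likelihood under a per-term weighted mixture of field models."""
    total = 0.0
    for term in query_terms:
        weights = field_weights[term]
        mixture = sum(
            w * field_lm(doc.get(f, []), vocab_size)(term)
            for f, w in weights.items()
        )
        total += math.log(mixture)
    return total


# Toy data: one relevant document in which "field" occurs only in the title.
relevant = [{
    "title": ["field", "relevance"],
    "body": ["retrieval", "model", "for", "structured", "documents"],
}]
weights = estimate_field_weights(["field"], relevant, ["title", "body"], vocab_size=10)

doc_a = {"title": ["field", "notes"], "body": ["misc", "text", "here", "ok", "yes"]}
doc_b = {"title": ["misc", "notes"], "body": ["field", "text", "here", "ok", "yes"]}
score_a = score(["field"], doc_a, weights, vocab_size=10)
score_b = score(["field"], doc_b, weights, vocab_size=10)
```

In this toy setting the term "field" receives a larger weight for the title than for the body, so a document matching it in the title scores higher than one matching it in the body. A fixed set of field weights shared by all query terms is the special case in which every term maps to the same weight vector.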
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, J.Y., Croft, W.B. (2012). A Field Relevance Model for Structured Document Retrieval. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_9
DOI: https://doi.org/10.1007/978-3-642-28997-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science, Computer Science (R0)