A Field Relevance Model for Structured Document Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7224)

Abstract

Many search applications involve documents with structure or fields. Since query terms are often related to specific structural components, mapping queries to fields and assigning weights to those fields is critical for retrieval effectiveness. Although several field-based retrieval models have been developed, there has been no formal justification of field weighting.

In this work, we aim to improve field weighting for structured document retrieval. We first introduce the notion of field relevance as a generalization of field weights and discuss how it can be estimated from relevant documents, which effectively implements relevance feedback for field weighting. We then propose a framework for estimating field relevance by combining several sources. Evaluation on several structured document collections shows that field weighting based on the proposed framework significantly improves retrieval effectiveness.
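To make the idea of per-term field weighting concrete, the following is a minimal sketch and not the estimator defined in the paper. It assumes documents are dictionaries mapping field names to token lists, estimates a weight for each (query term, field) pair from a set of known relevant documents, and scores a document by matching each query term against a weighted mixture of its fields. The function names, the `EPS` floor, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

EPS = 1e-6  # hypothetical floor so unseen terms never get zero probability


def term_prob(term, tokens):
    """Maximum-likelihood P(term | tokens), floored at EPS."""
    if not tokens:
        return EPS
    return max(Counter(tokens)[term] / len(tokens), EPS)


def estimate_field_relevance(query, relevant_docs, fields):
    """Per-term field weights, normalized over fields, from relevant documents."""
    weights = {}
    for term in query:
        raw = {}
        for f in fields:
            # Pool the text of field f across all relevant documents.
            pooled = [tok for doc in relevant_docs for tok in doc.get(f, [])]
            raw[f] = term_prob(term, pooled)
        total = sum(raw.values())
        weights[term] = {f: raw[f] / total for f in fields}
    return weights


def score(query, doc, weights, fields):
    """Log-score: each query term is matched against a weighted mixture of fields."""
    return sum(
        math.log(sum(weights[t][f] * term_prob(t, doc.get(f, [])) for f in fields))
        for t in query
    )


# Toy data (invented for this sketch).
fields = ["title", "body"]
relevant = [{"title": ["field", "relevance"], "body": ["retrieval", "model"]}]
query = ["relevance", "retrieval"]
w = estimate_field_relevance(query, relevant, fields)

doc_a = {"title": ["relevance"], "body": ["retrieval"]}
doc_b = {"title": ["retrieval"], "body": ["relevance"]}
score_a = score(query, doc_a, w, fields)
score_b = score(query, doc_b, w, fields)
```

In this toy setup the weights for "relevance" concentrate on the title field and those for "retrieval" on the body field, so `doc_a`, whose terms sit in the fields the relevant document suggests, scores higher than `doc_b`. Because the weights are indexed by query term rather than shared across the whole query, this is one way to read "field relevance as a generalization of field weights".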

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, J.Y., Croft, W.B. (2012). A Field Relevance Model for Structured Document Retrieval. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_9

  • DOI: https://doi.org/10.1007/978-3-642-28997-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28996-5

  • Online ISBN: 978-3-642-28997-2

  • eBook Packages: Computer Science, Computer Science (R0)
