
XML-Structured Documents: Retrievable Units and Inheritance

Conference paper, Flexible Query Answering Systems (FQAS 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4027)


Abstract

We consider the retrieval of XML-structured documents and of passages from such documents, where passages are defined as elements of the XML structure. We approach this from the point of view of passage retrieval, treated as a form of document retrieval. A retrievable unit (an element chosen as defining suitable passages for retrieval) is a textual document in its own right, but may inherit information from other parts of the same document; this inheritance, too, is defined in terms of the XML structure. All retrievable units are mapped onto a common field structure, and the ranking function is a standard document retrieval function with suitable field weighting. A small experiment using INEX data is described to demonstrate the idea.
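The pipeline the abstract outlines (retrievable units, inheritance from other parts of the document, a common field structure, field-weighted ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names no specific ranking function, so a BM25F-style function is assumed; `article` and `sec` are hypothetically chosen as retrievable units; each unit inherits the document-level `<title>` as its `title` field; and the field names, weights, `K1`/`B` values and the single shared `b` are illustrative choices, not the paper's settings.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical settings for illustration; not taken from the paper.
UNIT_TAGS = {"article", "sec"}               # elements treated as retrievable units
FIELD_WEIGHTS = {"title": 2.0, "body": 1.0}  # common field structure and its weights
K1, B = 1.2, 0.75                            # common BM25 defaults; one b shared by all fields


def tokens(text):
    """Lower-case whitespace tokenizer (deliberately naive)."""
    return text.lower().split()


def extract_units(root):
    """Map each retrievable unit onto the common fields.

    'body' is the unit's own text; 'title' is inherited from the
    document-level <title> element (a hypothetical inheritance rule).
    """
    inherited_title = Counter(tokens(root.findtext("title", default="")))
    units = []
    for elem in root.iter():  # iter() includes the root element itself
        if elem.tag in UNIT_TAGS:
            body = Counter(tokens(" ".join(elem.itertext())))
            units.append({"tag": elem.tag,
                          "fields": {"title": inherited_title, "body": body}})
    return units


def bm25f_scores(units, query):
    """Score every unit with a BM25F-style function.

    Per-field term frequencies and lengths are combined using the field
    weights first; BM25 saturation and length normalization are then
    applied once to the combined values.
    """
    n = len(units)
    lengths = [sum(w * sum(u["fields"][f].values()) for f, w in FIELD_WEIGHTS.items())
               for u in units]
    avg_len = sum(lengths) / n
    scores = []
    for u, dl in zip(units, lengths):
        score = 0.0
        for term in set(tokens(query)):
            # Document frequency over units, counting a match in any field.
            df = sum(1 for v in units if any(v["fields"][f][term] for f in FIELD_WEIGHTS))
            if df == 0:
                continue
            # Non-negative IDF variant (an assumption; the classic form can go negative).
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = sum(w * u["fields"][f][term] for f, w in FIELD_WEIGHTS.items())
            norm = K1 * ((1 - B) + B * dl / avg_len)
            score += idf * tf / (norm + tf)
        scores.append(score)
    return scores


# Toy document (hypothetical, not INEX data).
DOC = """<article>
  <title>XML retrieval</title>
  <sec><p>Passage retrieval ranks elements.</p></sec>
  <sec><p>Field weighting for BM25.</p></sec>
</article>"""

units = extract_units(ET.fromstring(DOC))
scores = bm25f_scores(units, "retrieval weighting")
for unit, s in zip(units, scores):
    print(unit["tag"], round(s, 3))
```

On this toy input the second section ranks first, ahead of the whole article (which also contains "weighting" but is longer) and the first section. Part of the second section's score comes from "retrieval", which appears only in the title it inherits, not in its own text; how much such inherited evidence should count is what the "suitable field weighting" would have to settle.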




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Robertson, S., Lu, W., MacFarlane, A. (2006). XML-Structured Documents: Retrievable Units and Inheritance. In: Larsen, H.L., Pasi, G., Ortiz-Arroyo, D., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2006. Lecture Notes in Computer Science, vol 4027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766254_11


  • DOI: https://doi.org/10.1007/11766254_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34638-8

  • Online ISBN: 978-3-540-34639-5

  • eBook Packages: Computer Science, Computer Science (R0)
