A Novel Context-based Technique for Web Information Retrieval

World Wide Web

Abstract

In this paper we present context matching, a novel context-based technique for the ad-hoc retrieval of web documents. The aim of the technique is to dynamically generate, during retrieval, a measure of document term significance that can be used as a substitute for, or a co-contributor to, the term frequency measure. Unlike term frequency, which relies on a term occurring multiple times in a document to be considered significant, context matching is based on the notion that a term is significant in a given document if it occurs there in the context of the query. Context matching can therefore determine a term to be significant even if it occurs only once in a document. Conversely, it can determine a term to be insignificant even if it occurs frequently within a document. We show how expanded terms generated by a typical query expansion technique can be used effectively as query context for context matching. The technique is well suited to the nature of web information retrieval, and we show through experimental results on TREC web benchmark data that context matching significantly improves retrieval accuracy.
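The contrast with term frequency can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: "context" is taken to mean a fixed token window, the score is the fraction of a term's occurrences that have a context term nearby, and the names `tokenize`, `context_significance`, and `window` are hypothetical. The context terms stand in for the output of a query expansion step.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def context_significance(
    doc_tokens: List[str], term: str, context_terms: Set[str], window: int = 5
) -> float:
    """Fraction of occurrences of `term` with a context term within `window` tokens.

    Illustrative assumption only: the paper's actual measure is not
    specified in the abstract.
    """
    positions = [i for i, tok in enumerate(doc_tokens) if tok == term]
    if not positions:
        return 0.0
    hits = 0
    for i in positions:
        lo = max(0, i - window)
        hi = min(len(doc_tokens), i + window + 1)
        # Neighbours on both sides of the occurrence, excluding the term itself.
        neighbours = doc_tokens[lo:i] + doc_tokens[i + 1:hi]
        if any(tok in context_terms for tok in neighbours):
            hits += 1
    return hits / len(positions)


# Hypothetical expansion terms for the query "jaguar" (the animal sense).
context = {"cat", "prey", "rainforest"}

doc_once = tokenize("The jaguar is a large cat that hunts prey in the rainforest.")
doc_often = tokenize(
    "Jaguar dealer sells Jaguar cars and Jaguar parts at the Jaguar showroom."
)

score_once = context_significance(doc_once, "jaguar", context)
score_often = context_significance(doc_often, "jaguar", context)
tf_once = doc_once.count("jaguar")
tf_often = doc_often.count("jaguar")
```

In this toy case the first document mentions the query term once but next to context terms, while the second mentions it four times with no context terms nearby, so a window-based score ranks them in the opposite order to raw term frequency.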

Author information

Corresponding author

Correspondence to John Zakos.

About this article

Cite this article

Zakos, J., Verma, B. A Novel Context-based Technique for Web Information Retrieval. World Wide Web 9, 485–503 (2006). https://doi.org/10.1007/s11280-006-0223-y

  • DOI: https://doi.org/10.1007/s11280-006-0223-y