Abstract
In this paper we present context matching, a novel context-based technique for the ad-hoc retrieval of web documents. The aim of the technique is to dynamically generate, during retrieval, a measure of document term significance that can be used as a substitute for, or a co-contributor to, the term frequency measure. Unlike term frequency, which relies on a term occurring multiple times in a document to be considered significant, context matching is based on the notion that a term is significant if it occurs in a given document in the context of the query. Context matching can therefore deem a term significant even if it occurs only once in a document. Conversely, it can deem a term insignificant even if it occurs frequently within a document. We show how expanded terms generated by a typical query expansion technique can be used effectively as query context for context matching. The technique is ideally suited to the nature of web information retrieval, and we show through experimental results on TREC web benchmark data that context matching significantly improves retrieval accuracy.
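To make the contrast with term frequency concrete, the following is a minimal Python sketch of the general idea only. The abstract does not give the actual significance measure, so the windowed-overlap score, the function names `tokenize` and `context_match_score`, the `window` parameter, and the sample query context are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def context_match_score(term, context_terms, doc_tokens, window=8):
    """Return a hypothetical significance score in [0, 1] for `term`.

    For each occurrence of `term`, compute the fraction of distinct
    context terms found within `window` tokens on either side, and keep
    the best occurrence. This windowed overlap is an assumption made for
    illustration, not the measure defined in the paper.
    """
    context = set(context_terms) - {term}
    if not context:
        return 0.0
    best = 0.0
    for i, token in enumerate(doc_tokens):
        if token != term:
            continue
        start = max(0, i - window)
        nearby = set(doc_tokens[start:i + window + 1])
        best = max(best, len(context & nearby) / len(context))
    return best


# Hypothetical expanded terms standing in for the query context.
context_terms = ["feline", "habitat"]

# The term occurs once, but alongside the query context.
doc_once = tokenize(
    "The jaguar is a large feline whose habitat is the rainforest."
)
# The term occurs four times, but never alongside the query context.
doc_often = tokenize(
    "Jaguar dealers sell the jaguar sedan. Every jaguar has leather "
    "seats. Test drive a jaguar today."
)

score_once = context_match_score("jaguar", context_terms, doc_once)
score_often = context_match_score("jaguar", context_terms, doc_often)
```

Under these assumptions, the single occurrence in `doc_once` scores higher than the four occurrences in `doc_often`, which is the behaviour the abstract describes: significance follows the query context, not the raw count.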
Cite this article
Zakos, J., Verma, B. A Novel Context-based Technique for Web Information Retrieval. World Wide Web 9, 485–503 (2006). https://doi.org/10.1007/s11280-006-0223-y