Improving contextual advertising matching by using Wikipedia thesaurus knowledge

  • Regular Paper
Knowledge and Information Systems

Abstract

As a prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial ads within the content of a Web page, to provide a better user experience and thereby increase the user’s ad-click rate. However, because of the intrinsic problems of homonymy and polysemy, the low intersection of keywords, and a lack of sufficient semantics, traditional keyword matching techniques cannot handle contextual matching effectively or retrieve relevant ads for the user, resulting in unsatisfactory ad selection performance. In this paper, we introduce a new contextual advertising approach that overcomes these problems by using Wikipedia thesaurus knowledge to enrich the semantic expression of a target page (or an ad). First, we map each page into a keyword vector, from which two additional feature vectors, the Wikipedia concept vector and the Wikipedia category vector, are constructed using the Wikipedia thesaurus structure. Second, to determine the relevant ads for a given page, we propose a linear similarity fusion mechanism, which combines the three feature vectors in a unified manner. Finally, we validate our approach using a set of real ads and real pages, together with the external Wikipedia thesaurus. The experimental results show that our approach outperforms conventional contextual advertising matching approaches and can substantially improve the performance of ad selection.
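
The linear similarity fusion described in the abstract can be illustrated with a minimal sketch. It assumes that each page and ad is represented by three sparse weighted vectors (keyword, concept, category), that each pair of vectors is compared by cosine similarity, and that the three scores are combined by a weighted sum. The function names, the dictionary layout, and the weights (0.4, 0.3, 0.3) are hypothetical choices made for illustration; the abstract does not specify the similarity measure or the weights the paper actually uses.

```python
import math
from typing import Dict, Tuple

# A sparse feature vector: feature name -> weight (e.g. a tf-idf score).
Vector = Dict[str, float]

# Hypothetical feature-space names; the paper's own naming may differ.
SPACES = ("keyword", "concept", "category")


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    return dot / (norm_u * norm_v)


def fused_similarity(
    page: Dict[str, Vector],
    ad: Dict[str, Vector],
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # assumed values
) -> float:
    """Weighted sum of the per-space cosine similarities (linear fusion)."""
    return sum(
        w * cosine(page.get(space, {}), ad.get(space, {}))
        for w, space in zip(weights, SPACES)
    )


# Toy example: a page about the company "Apple" and two candidate ads.
page = {
    "keyword": {"apple": 1.0},
    "concept": {"Apple Inc.": 1.0},
    "category": {"Technology companies": 1.0},
}
ad_tech = {
    "keyword": {"iphone": 1.0},
    "concept": {"Apple Inc.": 1.0},
    "category": {"Technology companies": 1.0},
}
ad_fruit = {
    "keyword": {"apple": 1.0},
    "concept": {"Apple (fruit)": 1.0},
    "category": {"Fruits": 1.0},
}

score_tech = fused_similarity(page, ad_tech)
score_fruit = fused_similarity(page, ad_fruit)
```

In this toy example the fruit ad shares the page's only keyword, so keyword matching alone would prefer it. With the concept and category vectors included, the technology ad scores higher under the assumed weights, which is the kind of homonymy problem the abstract says the approach is meant to address.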

Notes

  1. PricewaterhouseCoopers—www.pwc.com.

  2. Introduction to Computational Advertising—www.stanford.edu/class/msande239/.

  3. Wikipedia—www.wikipedia.org.

  4. Netease Youdao Dictionary—http://dict.youdao.com.

  5. Google Adwords—http://adwords.google.com.

  6. Wikimedia dump—http://download.wikimedia.org/enwiki.

  7. Mwdumper—www.mediawiki.org/wiki/Mwdumper.

  8. Some categories in Wikipedia that have no main article are not included in the taxonomy, so the number of categories in the taxonomy is about 8,000.

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work is supported by grants from the National Natural Science Foundation of China (Nos. 61202171 and 61272018), the China Postdoctoral Science Foundation funded projects (Nos. 2013T60623 and 2012M521251), the Zhejiang Provincial Natural Science Foundation of China (Nos. LY12F01016 and LQ13F020009), the Provincial Natural Science Foundation of Hubei (No. 2013CFB415) and the Fundamental Research Funds for the Central Universities (No. CUGL120281).

Author information

Corresponding author

Correspondence to Zongda Wu.

About this article

Cite this article

Xu, G., Wu, Z., Li, G. et al. Improving contextual advertising matching by using Wikipedia thesaurus knowledge. Knowl Inf Syst 43, 599–631 (2015). https://doi.org/10.1007/s10115-014-0745-z

  • DOI: https://doi.org/10.1007/s10115-014-0745-z
