Abstract
As a prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial ads within the content of a Web page, to provide a better user experience and, as a result, increase the user’s ad-click rate. However, due to the intrinsic problems of homonymy and polysemy, the low intersection of keywords, and a lack of sufficient semantics, traditional keyword matching techniques cannot effectively handle contextual matching or retrieve relevant ads for the user, which results in unsatisfactory ad selection. In this paper, we introduce a new contextual advertising approach to overcome these problems, which uses Wikipedia thesaurus knowledge to enrich the semantic expression of a target page (or an ad). First, we map each page into a keyword vector, upon which two additional feature vectors, the Wikipedia concept vector and the Wikipedia category vector, both derived from the Wikipedia thesaurus structure, are then constructed. Second, to determine the relevant ads for a given page, we propose a linear similarity fusion mechanism, which combines the above three feature vectors in a unified manner. Last, we validate our approach using a set of real ads and real pages, along with the external Wikipedia thesaurus. The experimental results show that our approach outperforms the conventional contextual advertising matching approaches and can substantially improve the performance of ad selection.
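The linear similarity fusion described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the similarity measure or the fusion weights, so the use of cosine similarity, the weight values, the function names (`cosine`, `fused_similarity`, `rank_ads`) and the toy page and ads are all assumptions chosen for illustration. Each page or ad is represented by three sparse vectors (keyword, concept, category), and the final score is a weighted sum of the three per-feature similarities.

```python
import math

# Names of the three feature vectors mentioned in the abstract.
FEATURES = ("keyword", "concept", "category")


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(page, ad, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of per-feature similarities.

    The weights are hypothetical; in practice they would be tuned on data.
    """
    return sum(
        w * cosine(page[name], ad[name]) for w, name in zip(weights, FEATURES)
    )


def rank_ads(page, ads, weights=(0.5, 0.3, 0.2)):
    """Return ad identifiers sorted by descending fused similarity."""
    return sorted(
        ads,
        key=lambda ad_id: fused_similarity(page, ads[ad_id], weights),
        reverse=True,
    )


# Toy data (invented): a page about the animal "jaguar" and two candidate ads.
page = {
    "keyword": {"jaguar": 1.0, "speed": 1.0},
    "concept": {"Jaguar (animal)": 1.0},
    "category": {"Felines": 1.0},
}
ads = {
    "car_ad": {
        "keyword": {"jaguar": 1.0, "dealer": 1.0},
        "concept": {"Jaguar Cars": 1.0},
        "category": {"Car manufacturers": 1.0},
    },
    "zoo_ad": {
        "keyword": {"zoo": 1.0, "tickets": 1.0},
        "concept": {"Jaguar (animal)": 1.0},
        "category": {"Felines": 1.0},
    },
}

print(rank_ads(page, ads))                    # fused ranking
print(rank_ads(page, ads, (1.0, 0.0, 0.0)))   # keyword-only ranking
```

In this toy case, keyword-only matching prefers the car ad because of the shared homonym "jaguar", while the fused score prefers the zoo ad because its concept and category vectors agree with the page. This is the kind of homonymy problem the abstract says the additional Wikipedia features are meant to address.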
Notes
PricewaterhouseCoopers—www.pwc.com.
Introduction to Computational Advertising—www.stanford.edu/class/msande239/.
Wikipedia—www.wikipedia.org.
Netease Youdao Dictionary—http://dict.youdao.com.
Google Adwords—http://adwords.google.com.
Wikimedia dump—http://download.wikimedia.org/enwiki.
Mwdumper—www.mediawiki.org/wiki/Mwdumper.
Some Wikipedia categories without main articles are not included in the taxonomy, so the taxonomy contains about 8,000 categories.
Acknowledgments
We thank the anonymous reviewers for their very useful comments and suggestions. This work is supported by grants from the National Natural Science Foundation of China (Nos. 61202171 and 61272018), the China Postdoctoral Science Foundation (Nos. 2013T60623 and 2012M521251), the Zhejiang Provincial Natural Science Foundation of China (Nos. LY12F01016 and LQ13F020009), the Provincial Natural Science Foundation of Hubei (No. 2013CFB415), and the Fundamental Research Funds for the Central Universities (No. CUGL120281).
Cite this article
Xu, G., Wu, Z., Li, G. et al. Improving contextual advertising matching by using Wikipedia thesaurus knowledge. Knowl Inf Syst 43, 599–631 (2015). https://doi.org/10.1007/s10115-014-0745-z
DOI: https://doi.org/10.1007/s10115-014-0745-z