Abstract
In this paper we present a novel approach to name disambiguation based on two types of semantic information: lexical and thematic. We propose using translation-based language models to resolve the synonymy problem at the level of individual word matches, and a topic-based ranking function to capture the rich thematic context of names. We test three ranking functions that combine lexical relatedness and thematic relatedness. Experiments on a Wikipedia data set and the TAC-KBP 2010 data set show that the proposed method is effective for name disambiguation.
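To make the two signals concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Berger-Lafferty style translation language model with Jelinek-Mercer smoothing for the lexical score, the negative KL divergence between topic distributions for the thematic score, and a simple linear interpolation to combine them. The abstract does not specify the three ranking functions that were tested, so the combination rule, the function names, and the parameters `lam` and `alpha` are all illustrative assumptions.

```python
import math
from collections import Counter


def translation_lm_log_score(query_tokens, doc_tokens, translation,
                             collection_prob, lam=0.2):
    """Log P(query | doc) under a translation-based language model.

    translation maps (query_word, doc_word) -> T(query_word | doc_word).
    Pairs missing from the table fall back to exact matching (assumption).
    collection_prob maps word -> background probability used for smoothing.
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    log_score = 0.0
    for w in query_tokens:
        # Sum over document words t of T(w | t) * P_ml(t | doc).
        translated = sum(
            translation.get((w, t), 1.0 if w == t else 0.0) * c / total
            for t, c in counts.items()
        )
        # Jelinek-Mercer smoothing with the collection model.
        p = (1.0 - lam) * translated + lam * collection_prob.get(w, 1e-9)
        log_score += math.log(p)
    return log_score


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two topic distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def combined_score(lexical_log_score, context_topics, candidate_topics,
                   alpha=0.5):
    """Hypothetical linear combination of lexical and thematic relatedness."""
    thematic = -kl_divergence(context_topics, candidate_topics)
    return alpha * lexical_log_score + (1.0 - alpha) * thematic
```

In this sketch, a candidate entity whose description contains words that translate to the context words, and whose topic distribution is close to that of the context, receives a higher combined score than one that matches on neither signal.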
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, J., Zhao, W.X., Yan, R., Wei, H., Nie, JY., Li, X. (2012). Using Lexical and Thematic Knowledge for Name Disambiguation. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_7
DOI: https://doi.org/10.1007/978-3-642-35341-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science (R0)