Abstract
Extracting the aliases of an entity is important for tasks such as identifying relations among entities, web search, and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach that finds aliases of a given name using automatically extracted lexical patterns. Using a set of known names and their aliases as training data, we extract lexical patterns that convey information about aliases from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores that measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to produce a robust alias detection method. On a dataset of personal names, the proposed method outperforms numerous baselines and previous work on alias extraction, achieving a statistically significant mean reciprocal rank of 0.6718. Experiments on datasets of location names and Japanese personal names suggest that the proposed method can be extended to extract aliases for other types of named entities and for other languages. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.
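The mean reciprocal rank reported above can be illustrated with a minimal sketch. It assumes the standard definition of the measure: for each name, take the reciprocal of the rank of the first correct alias in the ranked candidate list (0 if no correct alias appears), then average over all names. The abstract does not spell out the evaluation protocol, so the function name, data layout, and toy data below are hypothetical and do not reproduce the authors' experiments.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    ranked: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Average, over all names, of 1/rank of the first correct alias.

    A name whose candidate list contains no correct alias contributes 0.
    """
    if not ranked:
        return 0.0
    total = 0.0
    for name, candidates in ranked.items():
        correct = gold.get(name, set())
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct:
                total += 1.0 / rank
                break  # only the first correct alias counts
    return total / len(ranked)


# Hypothetical toy data (placeholder names, not from the paper).
ranked_candidates = {
    "name_a": ["noise_1", "alias_a", "noise_2"],  # first hit at rank 2
    "name_b": ["alias_b", "noise_3"],             # first hit at rank 1
    "name_c": ["noise_4", "noise_5"],             # no correct alias
}
gold_aliases = {
    "name_a": {"alias_a"},
    "name_b": {"alias_b"},
    "name_c": {"alias_c"},
}

mrr = mean_reciprocal_rank(ranked_candidates, gold_aliases)
print(f"MRR = {mrr:.4f}")
```

Under this definition, a score of 0.6718 would mean that the first correct alias typically appears within the top two ranked candidates.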
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bollegala, D., Honma, T., Matsuo, Y., Ishizuka, M. (2008). Automatically Extracting Personal Name Aliases from the Web. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_8
DOI: https://doi.org/10.1007/978-3-540-85287-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)