Abstract
Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models (HMMs) and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
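The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy dictionaries, and the use of exact token matching to find mentions are all assumptions made for illustration, since the abstract does not state how mentions are detected. Tokens that match no labeled entry receive None, which mirrors the partially annotated output the abstract describes.

```python
from typing import Dict, List, Optional


def propagate_labels(source_labels: Dict[str, str],
                     language_links: Dict[str, str]) -> Dict[str, str]:
    """Step 2: carry each source-entry tag over to its linked target entry.

    Source entries without a language link are dropped.
    """
    return {language_links[title]: tag
            for title, tag in source_labels.items()
            if title in language_links}


def annotate_mentions(tokens: List[str],
                      target_labels: Dict[str, str]) -> List[Optional[str]]:
    """Step 3: tag tokens that match a labeled entry title.

    Unmatched tokens stay None (unknown), not "no entity".
    Exact token matching is an assumption of this sketch.
    """
    tags: List[Optional[str]] = [None] * len(tokens)
    # Try longer titles first so multi-word entries win over their sub-spans.
    entries = sorted(((title.split(), tag) for title, tag in target_labels.items()),
                     key=lambda entry: -len(entry[0]))
    for words, tag in entries:
        n = len(words)
        for i in range(len(tokens) - n + 1):
            span_is_free = all(t is None for t in tags[i:i + n])
            if tokens[i:i + n] == words and span_is_free:
                tags[i:i + n] = [tag] * n
    return tags


# Step 1 (assumed already done): source-language entries carry NERC tags.
source_labels = {"Munich": "LOC", "Angela Merkel": "PER"}
# Hypothetical language links from source titles to target titles.
language_links = {"Munich": "München", "Angela Merkel": "Angela Merkel"}

target_labels = propagate_labels(source_labels, language_links)
sentence = "Angela Merkel besuchte München und Siemens".split()
print(annotate_mentions(sentence, target_labels))
# ['PER', 'PER', None, 'LOC', None, None]
```

In this toy sentence "Siemens" is an entity but has no labeled entry, so it remains None. This is the kind of gap that a learner for partially annotated data has to tolerate.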
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Fernandes, E.R., Brefeld, U., Blanco, R., Atserias, J. (2016). Using Wikipedia for Cross-Language Named Entity Recognition. In: Atzmueller, M., Chin, A., Janssen, F., Schweizer, I., Trattner, C. (eds) Big Data Analytics in the Social and Ubiquitous Context. SENSEML MUSE MSM 2015 2014 2014. Lecture Notes in Computer Science, vol 9546. Springer, Cham. https://doi.org/10.1007/978-3-319-29009-6_1
DOI: https://doi.org/10.1007/978-3-319-29009-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29008-9
Online ISBN: 978-3-319-29009-6
eBook Packages: Computer Science, Computer Science (R0)