Named Entity Recognition Using Gazetteer of Hierarchical Entities

  • Conference paper
Advances and Trends in Artificial Intelligence. From Theory to Practice (IEA/AIE 2019)

Abstract

This paper presents a named entity recognition method that finds predetermined entities in unstructured text. To map those entities onto words in the text, the method combines word similarities based on standard word transformations (lemmatization and stemming), word embeddings, and character-level similarity. The approach itself is language independent, although it relies on language-dependent components for lemmatization, stemming, and word embedding, and it works on any given set of entities. Special attention is given to entities represented in a hierarchical form through the hypernymy-hyponymy relation. The proposed method has the following advantages: it returns the normalized form of the recognized entity name; it is easy to adapt to a new domain; it respects the hierarchical organization of entities; and, owing to its modular design, it can be improved continually simply by updating the components for lemmatization, stemming, or word embedding. The method was evaluated on a test set of tourist queries and hierarchical entities collected from the Slovenia.info tourist portal.
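The general idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy gazetteer, the function names, the similarity scores, and the 0.8 threshold are all hypothetical. A crude suffix stripper stands in for real lemmatization and stemming, Python's `difflib.SequenceMatcher` stands in for the character-level similarity, and the word-embedding component is omitted entirely.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer: entity name -> hypernym (None marks a root).
GAZETTEER = {
    "accommodation": None,
    "hotel": "accommodation",
    "campsite": "accommodation",
    "attraction": None,
    "castle": "attraction",
    "lake": "attraction",
}


def crude_stem(word):
    """Stand-in for a real stemmer: lowercase and drop one trailing 's'."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def word_similarity(word, entity):
    """Combine exact, stem-based, and character-level similarity.

    The scores 1.0 and 0.95 are arbitrary illustrative values.
    """
    word = word.lower()
    if word == entity:
        return 1.0
    if crude_stem(word) == crude_stem(entity):
        return 0.95
    # Character-level fallback, useful for typos.
    return SequenceMatcher(None, word, entity).ratio()


def hypernym_chain(entity):
    """Return the list of ancestors of an entity, nearest first."""
    chain = []
    parent = GAZETTEER[entity]
    while parent is not None:
        chain.append(parent)
        parent = GAZETTEER[parent]
    return chain


def recognize(text, threshold=0.8):
    """Map each token to its best-matching gazetteer entity, if any.

    Returns (token, normalized entity name, hypernym chain) triples.
    """
    results = []
    for raw in text.split():
        token = raw.strip(".,;:!?")
        if not token:
            continue
        best = max(GAZETTEER, key=lambda e: word_similarity(token, e))
        if word_similarity(token, best) >= threshold:
            results.append((token, best, hypernym_chain(best)))
    return results


if __name__ == "__main__":
    for match in recognize("Any hotels or a castel near the lake?"):
        print(match)
```

In this sketch the plural "hotels" is matched through the stem, the misspelled "castel" through character similarity, and each match carries both the normalized entity name and its hypernyms. Swapping `crude_stem` for a language-specific lemmatizer or stemmer would mirror the modularity the abstract describes.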

Partially supported by Joint cooperation programme V-A Interreg Slovenia-Austria, project AS-IT-IC.

Notes

  1. https://spacy.io/api/phrasematcher

  2. https://spacy.io/

  3. https://github.com/clarinsi/reldi-tagger

  4. https://github.com/clarinsi/reldi-tokeniser

  5. https://www.slovenia.info/en/map

  6. https://sl.wikipedia.org/wiki/Seznam_ob%C4%8Din_v_Sloveniji

  7. https://translate.yandex.com/

  8. https://www.elastic.co/products/elasticsearch

  9. https://repo.ijs.si/DIS-AGENTS/entity-expert

Corresponding author

Correspondence to Jernej Zupančič.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Štravs, M., Zupančič, J. (2019). Named Entity Recognition Using Gazetteer of Hierarchical Entities. In: Wotawa, F., Friedrich, G., Pill, I., Koitz-Hristov, R., Ali, M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2019. Lecture Notes in Computer Science, vol 11606. Springer, Cham. https://doi.org/10.1007/978-3-030-22999-3_65

  • DOI: https://doi.org/10.1007/978-3-030-22999-3_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22998-6

  • Online ISBN: 978-3-030-22999-3

  • eBook Packages: Computer Science, Computer Science (R0)
