DOI: 10.5555/1557769.1557830

HunPos: an open source trigram tagger

Published: 25 June 2007

ABSTRACT

In the world of non-proprietary NLP software the standard, and perhaps the best, HMM-based POS tagger is TnT (Brants, 2000). We argue here that some of the criticism aimed at HMM performance on languages with rich morphology should more properly be directed at TnT's peculiar license, free but not open source, since it is those details of the implementation which are hidden from the user that hold the key for improved POS tagging across a wider variety of languages. We present HunPos, a free and open source (LGPL-licensed) alternative, which can be tuned by the user to fully utilize the potential of HMM architectures, offering performance comparable to more complex models, but preserving the ease and speed of the training and tagging process.

References

  1. Michele Banko and Robert C. Moore. 2004. Part of speech tagging in context. In COLING '04: Proceedings of the 20th international conference on Computational Linguistics, page 556, Morristown, NJ, USA. Association for Computational Linguistics.
  2. Thorsten Brants. 2000. TnT -- a statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), Seattle, WA.
  3. Kenneth Ward Church. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the second conference on Applied natural language processing, pages 136--143, Morristown, NJ, USA. Association for Computational Linguistics.
  4. Dóra Csendes, János Csirik, and Tibor Gyimóthy. 2004. The Szeged Corpus: A POS tagged and syntactically annotated Hungarian natural language corpus. In Petr Sojka, Ivan Kopeček, and Karel Pala, editors, Text, Speech and Dialogue: 7th International Conference, TSD, pages 41--47.
  5. Steven J. DeRose. 1988. Grammatical category disambiguation by statistical optimization. Computational Linguistics, 14:31--39.
  6. Damien Doligez, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. 2004. The Objective Caml system. Institut National de Recherche en Informatique et en Automatique.
  7. Jesús Giménez and Lluís Màrquez. 2003. Fast and accurate part-of-speech tagging: The SVM approach revisited. In Proceedings of RANLP, pages 153--163.
  8. Jan Hajič, Pavel Krbec, Karel Oliva, Pavel Květoň, and Vladimír Petkevič. 2001. Serial combination of rules and statistics: A case study in Czech tagging. In Proceedings of the 39th Association of Computational Linguistics Conference, pages 260--267, Toulouse, France.
  9. Dilek Z. Hakkani-Tür, Kemal Oflazer, and Gökhan Tür. 2000. Statistical morphological disambiguation for agglutinative languages. In Proceedings of the 18th conference on Computational linguistics, pages 285--291, Saarbrücken, Germany.
  10. Péter Halácsy, András Kornai, Csaba Oravecz, Viktor Trón, and Dániel Varga. 2006. Using a morphological analyzer in high precision POS tagging of Hungarian. In Proceedings of LREC 2006, pages 2245--2248.
  11. Bryan Jurish. 2003. A hybrid approach to part-of-speech tagging. Technical report, Berlin-Brandenburgische Akademie der Wissenschaften.
  12. Adwait Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 133--142, University of Pennsylvania.
  13. Noah A. Smith, David A. Smith, and Roy W. Tromble. 2005. Context-based morphological disambiguation with random fields. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver.
  14. Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of HLT-NAACL, pages 252--259.
  15. Viktor Trón, Péter Halácsy, Péter Rebrus, András Rung, Péter Vajda, and Eszter Simon. 2006. Morphdb.hu: Hungarian lexical database and morphological grammar. In Proceedings of LREC 2006, pages 1670--1673.

Published in: ACL '07: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, June 2007, 247 pages
Publisher: Association for Computational Linguistics, United States
Overall acceptance rate: 85 of 443 submissions (19%)
