research-article

HunPos: an open source trigram tagger

Authors:
Péter Halácsy

Budapest U. of Technology, Budapest, Stoczek

András Kornai

MetaCarta Inc., Cambridge, MA

Csaba Oravecz

Institute of Linguistics, Budapest, Benczur

ACL '07: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, June 2007, Pages 209–212

Published: 25 June 2007

ABSTRACT

In the world of non-proprietary NLP software the standard, and perhaps the best, HMM-based POS tagger is TnT (Brants, 2000). We argue here that some of the criticism aimed at HMM performance on languages with rich morphology should more properly be directed at TnT's peculiar license, free but not open source, since it is those details of the implementation which are hidden from the user that hold the key for improved POS tagging across a wider variety of languages. We present HunPos, a free and open source (LGPL-licensed) alternative, which can be tuned by the user to fully utilize the potential of HMM architectures, offering performance comparable to more complex models, but preserving the ease and speed of the training and tagging process.

References

Michele Banko and Robert C. Moore. 2004. Part of speech tagging in context. In COLING '04: Proceedings of the 20th international conference on Computational Linguistics, page 556, Morristown, NJ, USA. Association for Computational Linguistics.
Thorsten Brants. 2000. TnT -- a statistical part-of-speech tagger. In Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), Seattle, WA.
Kenneth Ward Church. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the second conference on Applied natural language processing, pages 136--143, Morristown, NJ, USA. Association for Computational Linguistics.
Dóra Csendes, János Csirik, and Tibor Gyimóthy. 2004. The Szeged Corpus: A POS tagged and syntactically annotated Hungarian natural language corpus. In Petr Sojka, Ivan Kopeček, and Karel Pala, editors, Text, Speech and Dialogue: 7th International Conference, TSD, pages 41--47.
Steven J. DeRose. 1988. Grammatical category disambiguation by statistical optimization. Computational Linguistics, 14:31--39.
Damien Doligez, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. 2004. The Objective Caml system. Institut National de Recherche en Informatique et en Automatique.
Jesús Giménez and Lluís Màrquez. 2003. Fast and accurate part-of-speech tagging: The SVM approach revisited. In Proceedings of RANLP, pages 153--163.
Jan Hajič, Pavel Krbec, Karel Oliva, Pavel Květoň, and Vladimír Petkevič. 2001. Serial combination of rules and statistics: A case study in Czech tagging. In Proceedings of the 39th Association of Computational Linguistics Conference, pages 260--267, Toulouse, France.
Dilek Z. Hakkani-Tür, Kemal Oflazer, and Gökhan Tür. 2000. Statistical morphological disambiguation for agglutinative languages. In Proceedings of the 18th conference on Computational linguistics, pages 285--291, Saarbrücken, Germany.
Péter Halácsy, András Kornai, Csaba Oravecz, Viktor Trón, and Dániel Varga. 2006. Using a morphological analyzer in high precision POS tagging of Hungarian. In Proceedings of LREC 2006, pages 2245--2248.
Bryan Jurish. 2003. A hybrid approach to part-of-speech tagging. Technical report, Berlin-Brandenburgische Akademie der Wissenschaften.
Adwait Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 133--142, University of Pennsylvania.
Noah A. Smith, David A. Smith, and Roy W. Tromble. 2005. Context-based morphological disambiguation with random fields. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver.
Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of HLT-NAACL, pages 252--259.
Viktor Trón, Péter Halácsy, Péter Rebrus, András Rung, Péter Vajda, and Eszter Simon. 2006. Morphdb.hu: Hungarian lexical database and morphological grammar. In Proceedings of LREC 2006, pages 1670--1673.

Published in
ACL '07: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
June 2007
247 pages
Conference Chair:
Sophia Ananiadou
University of Manchester (UK)
Publisher
Association for Computational Linguistics
United States
Acceptance Rates
Overall Acceptance Rate: 85 of 443 submissions, 19%