Abstract
We describe a broadly applicable conservative error-correcting model, N-fold Templated Piped Correction or NTPC (“nitpick”), that consistently improves the accuracy of existing high-accuracy base models. Under circumstances where most obvious approaches actually reduce accuracy more than they improve it, NTPC nevertheless carries little risk of accidentally degrading performance. NTPC is particularly well suited to natural language applications involving high-dimensional feature spaces, such as bracketing and disambiguation tasks, since its easily customizable template-driven learner allows efficient search over the kinds of complex feature combinations that have typically eluded the base models. We show empirically that NTPC yields small but consistent accuracy gains on top of even high-performing models like boosting. We also give evidence that the various extreme design parameters in NTPC are indeed necessary for the intended operating range, even though they diverge from usual practice.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, D., Ngai, G., Carpuat, M. (2005). NTPC: N-fold Templated Piped Correction. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_50
DOI: https://doi.org/10.1007/978-3-540-30211-7_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science (R0)