ABSTRACT
The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However, the high performance of most existing PN classifiers depends heavily on the availability of large dictionaries of domain-specific proper nouns and on a certain amount of manual work for rule writing or manual tagging. Although relying on an existing PN dictionary is not a heavy requirement (such resources are often available on the web), its coverage of a domain corpus may be rather low in the absence of manual updating. In this paper we propose a technique for automatically updating a PN dictionary through the cooperation of an inductive and a probabilistic classifier. Our experiments show that, when an existing PN dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.
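The figures quoted in the abstract can be combined into a single overall coverage estimate. The sketch below is only an illustration of that arithmetic, assuming both percentages are fractions of the same set of proper nouns in the corpus and that every dictionary match counts as a correct recognition; the function name `overall_coverage` is hypothetical and does not come from the paper.

```python
def overall_coverage(dictionary_coverage: float, recovery_rate: float) -> float:
    """Estimate the fraction of all proper nouns recognized.

    dictionary_coverage: fraction of proper nouns found by the existing dictionary.
    recovery_rate: fraction of the remaining proper nouns recognized
                   by the automatic extension step (assumed independent).
    """
    missed_by_dictionary = 1.0 - dictionary_coverage
    return dictionary_coverage + recovery_rate * missed_by_dictionary


# Figures from the abstract: 50% dictionary coverage, about 90% of the rest recovered.
print(f"{overall_coverage(0.5, 0.9):.2f}")  # prints 0.95
```

Under these assumptions, a dictionary covering half of the proper nouns plus the automatic extension would recognize roughly 95% of them in total.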
Index Terms
- Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods