DOI: 10.1145/345508.345563

Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods

Published: 01 July 2000

ABSTRACT

The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However, the high performance of most existing PN classifiers depends heavily on the availability of large dictionaries of domain-specific proper nouns, and on a certain amount of manual work for rule writing or manual tagging. Relying on an existing PN dictionary is not in itself a heavy requirement, since such resources are often available on the web, but without manual updating its coverage of a domain corpus may be rather low. In this paper we propose a technique for automatically updating a PN dictionary through the cooperation of an inductive classifier and a probabilistic classifier. Our experiments show that, whenever an existing PN dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.
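
The figures in the abstract imply an overall recognition rate, which the short sketch below works out. It is only an illustration of the arithmetic: the function name `overall_recognition` and its parameters are hypothetical, and it assumes that every proper noun identified by the dictionary counts as recognized, which the abstract does not state explicitly.

```python
def overall_recognition(dictionary_coverage: float, recovery_rate: float) -> float:
    """Fraction of all proper nouns recognized after the automatic update.

    dictionary_coverage: share of corpus PNs the existing dictionary identifies.
    recovery_rate: share of the remaining PNs recognized by the technique.
    Assumption (not from the paper): dictionary hits all count as recognized.
    """
    if not (0.0 <= dictionary_coverage <= 1.0 and 0.0 <= recovery_rate <= 1.0):
        raise ValueError("both rates must lie in [0, 1]")
    missed_by_dictionary = 1.0 - dictionary_coverage
    return dictionary_coverage + recovery_rate * missed_by_dictionary


# Abstract's figures: 50% dictionary coverage, about 90% of the rest recovered.
print(f"{overall_recognition(0.5, 0.9):.2f}")
```

Under that assumption, the reported figures correspond to roughly 95% of the proper nouns in the corpus being recognized.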

Published in

SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
July 2000
396 pages
ISBN: 1581132263
DOI: 10.1145/345508

Copyright © 2000 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 792 of 3,983 submissions (20%)
