DOI: 10.3115/1075096.1075124

Closing the gap: learning-based information extraction rivaling knowledge-engineering methods

Published: 07 July 2003

ABSTRACT

In this paper, we present a learning approach to the scenario template task of information extraction, where the information filling one template can come from multiple sentences. When tested on the MUC-4 task, our learning approach achieves accuracy competitive with the best of the MUC-4 systems, which were all built with manually engineered rules. Our analysis reveals that our use of full parsing and state-of-the-art learning algorithms has contributed to the good performance. To our knowledge, this is the first research to demonstrate that a learning approach to the full-scale information extraction task can achieve performance rivaling that of the knowledge engineering approach.

Published in: ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, July 2003, 571 pages

Publisher: Association for Computational Linguistics, United States


Overall acceptance rate: 85 of 443 submissions (19%)
