ABSTRACT
In this paper, we present a learning approach to the scenario template task of information extraction, in which the information that fills a single template may come from multiple sentences. When tested on the MUC-4 task, our learning approach achieves accuracy competitive with the best of the MUC-4 systems, which were all built with manually engineered rules. Our analysis reveals that our use of full parsing and state-of-the-art learning algorithms has contributed to the good performance. To our knowledge, this is the first research to demonstrate that a learning approach to the full-scale information extraction task can achieve performance rivaling that of the knowledge engineering approach.
Closing the gap: learning-based information extraction rivaling knowledge-engineering methods