Abstract
The effective creation of Information Extraction (IE) systems over the last ten years, driven by the DARPA TIPSTER programme and the associated Message Understanding Conference (MUC) evaluations, has brought enormous benefits, but it is now time to ask what research issues the systems we have built face and what we should do next. We suggest that there are two classes of important research issues: those requiring detailed comparative evaluation of alternative approaches to IE subtasks, and those concerning the flexible adaptation of IE systems to new users and domains.
We argue that both classes of issues can be profitably addressed within GATE, the General Architecture for Text Engineering, an architecture for language engineering. We describe GATE, which owes a great deal to the TIPSTER architecture, and the LaSIE IE system, which is set within GATE and with which we have competed in MUC, and we bring out the distinctive features of LaSIE that have led to its good performance in certain areas.
Within GATE we can now reconfigure various Language Engineering (LE) modules to assemble alternative IE systems and then compare their performance with that of LaSIE. In this way, the environment provided by GATE will allow us to make significant strides in assessing alternative LE technologies and in rapidly adapting LE prototype systems for new users and domains.
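The reconfigure-and-compare workflow described above can be sketched in a few lines. This is a minimal Python sketch, not GATE's or LaSIE's actual API: it assumes a TIPSTER-like stand-off annotation model (typed character spans stored alongside the unchanged text), and every class, module name, gazetteer entry and the toy key below are hypothetical. Two alternative named-entity modules are slotted into otherwise identical pipelines, and each pipeline's output is scored against a hand-annotated key by precision, recall and F-measure, a simplified stand-in for MUC-style scoring.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a typed character span over the text."""
    start: int
    end: int
    type: str

@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def spans(self, ann_type: str) -> Set[Tuple[int, int]]:
        """Return the (start, end) spans of all annotations of one type."""
        return {(a.start, a.end) for a in self.annotations if a.type == ann_type}

# Every module shares one interface, so any module can be swapped for another.
Module = Callable[[Document], None]

def tokeniser(doc: Document) -> None:
    # Whitespace tokeniser: one Token annotation per non-space run.
    for m in re.finditer(r"\S+", doc.text):
        doc.annotations.append(Annotation(m.start(), m.end(), "Token"))

def _tokens(doc: Document) -> List[Annotation]:
    # Copy into a new list so modules can append while iterating.
    return [a for a in doc.annotations if a.type == "Token"]

def capitalisation_ne(doc: Document) -> None:
    # Hypothetical NE module A: every capitalised token is a name.
    for tok in _tokens(doc):
        if doc.text[tok.start].isupper():
            doc.annotations.append(Annotation(tok.start, tok.end, "Name"))

def make_gazetteer_ne(gazetteer: Set[str]) -> Module:
    # Hypothetical NE module B: only tokens found in a fixed list are names.
    def gazetteer_ne(doc: Document) -> None:
        for tok in _tokens(doc):
            if doc.text[tok.start:tok.end] in gazetteer:
                doc.annotations.append(Annotation(tok.start, tok.end, "Name"))
    return gazetteer_ne

def run_pipeline(modules: List[Module], text: str) -> Document:
    # Apply modules in order; each adds annotations to the shared document.
    doc = Document(text)
    for module in modules:
        module(doc)
    return doc

def score(response: Set[Tuple[int, int]], key: Set[Tuple[int, int]]) -> Dict[str, float]:
    # Exact-span precision, recall and balanced F-measure.
    correct = len(response & key)
    p = correct / len(response) if response else 0.0
    r = correct / len(key) if key else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f": f}

TEXT = "Wilks and Gaizauskas built LaSIE in Sheffield"
KEY = {(0, 5), (10, 20), (36, 45)}  # toy hand-annotated name spans

configurations = {
    "capitalisation": [tokeniser, capitalisation_ne],
    "gazetteer": [tokeniser, make_gazetteer_ne({"Wilks", "Sheffield"})],
}

results = {name: score(run_pipeline(mods, TEXT).spans("Name"), KEY)
           for name, mods in configurations.items()}

for name, s in results.items():
    print(f"{name}: P={s['precision']:.2f} R={s['recall']:.2f} F={s['f']:.2f}")
```

On this toy sentence the capitalisation module finds every name but also tags the system name LaSIE (P=0.75, R=1.00), while the gazetteer module is precise but misses Gaizauskas (P=1.00, R=0.67). Because every module has the same signature, trying a third configuration means adding one more list entry rather than rewriting the pipeline.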
Copyright information
© 1999 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Wilks, Y., Gaizauskas, R. (1999). LaSIE Jumps the GATE. In: Strzalkowski, T. (ed.) Natural Language Information Retrieval. Text, Speech and Language Technology, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2388-6_8
DOI: https://doi.org/10.1007/978-94-017-2388-6_8
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5209-4
Online ISBN: 978-94-017-2388-6
eBook Packages: Springer Book Archive