Copyright © 2007 Published by Elsevier Ltd.
Automatic summarising: The state of the art
Received 22 February 2007.
Abstract
This paper reviews research on automatic summarising over the last decade. This work has grown, stimulated by technology and by evaluation programmes. The paper uses several frameworks to organise the review: for summarising itself, for the factors affecting summarising, for systems, and for evaluation.
The review examines the evaluation strategies applied to summarising, the issues they raise, and the major programmes. It considers the input, purpose and output factors investigated in recent summarising research, and discusses the classes of strategy, extractive and non-extractive, that have been explored, illustrating the range of systems built.
The conclusions drawn are that automatic summarisation has made valuable progress, with useful applications, better evaluation, and more task understanding. But summarising systems are still poorly motivated in relation to the factors affecting them, and evaluation needs taking much further to engage with the purposes summaries are intended to serve and the contexts in which they are used.
Keywords: Summarization; Evaluation; Sentence extraction; Abstraction; Natural language processing
Article Outline
- 1. Introduction
- 2. Discussion framework
- 3. Summary evaluation
- 3.1. Summary evaluation concepts
- 3.1.1. Text quality
- 3.1.2. Concept capture
- 3.1.3. Gold standards
- 3.1.4. Baselines and benchmarks
- 3.1.5. Recognising purpose
- 3.1.6. Purpose evaluations
- 3.2. Summary evaluation programmes
- 3.2.1. The DUC evaluations
- 3.2.2. Other programmes
- 3.3. Assessment of evaluations
- 4. Factors explored
- 4.1. Input factors
- 4.1.1. Form factors
- 4.1.2. Language
- 4.1.3. Register
- 4.1.4. Medium
- 4.1.5. Structure
- 4.1.6. Genre
- 4.1.7. Length
- 4.1.8. Other input factors
- 4.1.9. Subject
- 4.1.10. Units
- 4.1.11. Authorship
- 4.1.12. Header
- 4.2. Purpose factors
- 4.2.1. Use
- 4.2.2. Audience
- 4.2.3. Envelope factors
- 4.2.4. Time
- 4.2.5. Location
- 4.2.6. Formality
- 4.2.7. Triggering
- 4.2.8. Destination
- 4.3. Output factors
- 4.3.1. Material factors
- 4.3.2. Coverage
- 4.3.3. Reduction
- 4.3.4. Derivation
- 4.3.5. Speciality
- 4.3.6. Style
- 4.3.7. Format factors
- 4.3.8. Language
- 4.3.9. Register
- 4.3.10. Medium
- 4.3.11. Structure
- 4.3.12. Genre
- 4.4. Factor lessons
- 5. Systems: approaches and structures
- 5.1. Extractive strategies
- 5.1.1. Basic statistical approaches
- 5.2. Enriched statistical approaches: lexical units and features
- 5.3. Enriched statistical approaches: structures
- 5.4. Comments on extractive summarising
- 5.5. Machine learning
- 5.6. Non-extractive strategies
- 5.7. Comments on system characteristics
- 5.8. Factor influences on strategy choices
- 5.9. Exemplar systems
- 5.10. MEAD ([Radev et al., 2001b] and [Radev et al., 2004])
- 5.11. Newsblaster (NWBL, McKeown et al., 2002)
- 5.12. GISTexter (Harabagiu & Lacatusu, 2002)
- 5.13. Verbmobil (Reithinger et al., 2000)
- 6. Conclusion
- Further Reading
- References












