Abstract
In this paper we describe an extractive method for creating very short summaries, or gists, that capture the essence of a news story using a linguistic technique called lexical chaining. Recent interest in robust gisting and title generation techniques stems from the need to improve the indexing and browsing capabilities of interactive digital multimedia systems. More specifically, these systems deal with streams of continuous data, such as a news programme, that require further annotation before they can be presented to the user in a meaningful way. We automatically evaluate the performance of our lexical chaining-based gister against four baseline extractive gisting methods on a collection of closed caption material taken from a series of news broadcasts. We also report the results of a human-based evaluation of summary quality. Our results show that our novel lexical chaining approach to this problem outperforms standard extractive gisting methods.
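The abstract does not spell out the chaining algorithm, so the following is only a minimal sketch of the general idea behind lexical-chain-based extractive gisting, not the authors' gister. It assumes a small hand-written relation table (`RELATED`, hypothetical) in place of a thesaurus such as WordNet, treats every non-stopword as a candidate term rather than tagging nouns, and picks the single sentence whose words fall in the strongest chains (those with at least two members). The names `build_chains`, `gist`, `min_chain_len` and the sample `STORY` are illustrative.

```python
import re

# Hypothetical toy relation table standing in for a thesaurus such as WordNet.
RELATED = {
    "storm": {"hurricane", "weather"},
    "hurricane": {"storm", "weather"},
    "weather": {"storm", "hurricane"},
    "coast": {"shore"},
    "shore": {"coast"},
}

# Small illustrative stopword list (assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "on", "in", "to", "and", "is", "was",
             "were", "by", "at", "for", "with", "it", "its"}


def content_words(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def related(w1, w2):
    """Word repetition or a link in the toy table counts as a chain relation."""
    return w1 == w2 or w2 in RELATED.get(w1, set())


def build_chains(sentences):
    """Greedily attach each word to the first chain holding a related word."""
    chains = []  # each chain is a list of word occurrences, in text order
    for sentence in sentences:
        for word in content_words(sentence):
            for chain in chains:
                if any(related(word, member) for member in chain):
                    chain.append(word)
                    break
            else:  # no related chain found: start a new one
                chains.append([word])
    return chains


def gist(sentences, min_chain_len=2):
    """Return the sentence whose words belong to the strongest chains."""
    strength = {}
    for chain in build_chains(sentences):
        if len(chain) < min_chain_len:
            continue  # singleton "chains" carry no cohesion evidence
        for word in chain:
            strength[word] = max(strength.get(word, 0), len(chain))

    def score(sentence):
        # Sum chain strengths over the distinct words in the sentence.
        return sum(strength.get(w, 0) for w in set(content_words(sentence)))

    # max() keeps the earliest sentence on ties, favouring the story's lead.
    return max(sentences, key=score)


STORY = [
    "A hurricane struck the coast on Monday.",
    "The storm flooded homes along the shore.",
    "Officials said weather warnings were issued early.",
    "Schools reopened on Friday.",
]

if __name__ == "__main__":
    print(gist(STORY))
```

Running the module prints `A hurricane struck the coast on Monday.`: the hurricane/storm/weather chain and the coast/shore chain give the first two sentences equal scores, and the tie goes to the earlier one. A practical system would draw relations from a lexical database, restrict candidate terms to nouns, and probably shorten the chosen sentence toward headline length; none of these choices is taken from the paper.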
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stokes, N., Newman, E., Carthy, J., Smeaton, A.F. (2004). Broadcast News Gisting Using Lexical Cohesion Analysis. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_16
DOI: https://doi.org/10.1007/978-3-540-24752-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive