Broadcast News Gisting Using Lexical Cohesion Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2997)

Abstract

In this paper we describe an extractive method for creating very short summaries, or gists, that capture the essence of a news story using a linguistic technique called lexical chaining. Recent interest in robust gisting and title generation techniques stems from the need to improve the indexing and browsing capabilities of interactive digital multimedia systems. More specifically, these systems deal with streams of continuous data, such as a news programme, that require further annotation before they can be presented to the user in a meaningful way. We automatically evaluate the performance of our lexical chaining-based gister against four baseline extractive gisting methods on a collection of closed-caption material taken from a series of news broadcasts. We also report the results of a human-based evaluation of summary quality. Our results show that our novel lexical chaining approach outperforms standard extractive gisting methods.
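
The abstract does not spell out the chaining algorithm or the scoring scheme, so the following is only a minimal sketch of the general idea behind lexical-chain-based extractive gisting, not the authors' method. It assumes a tiny hand-written thesaurus (`RELATED`) in place of a real lexical resource such as WordNet, a greedy chaining rule, and a scoring rule that favours sentences containing members of long chains. All names (`RELATED`, `STOPWORDS`, `build_chains`, `gist`) and the example story are hypothetical.

```python
import re

# Hypothetical toy thesaurus; a real system would consult a lexical
# resource such as WordNet instead of this hand-written table.
RELATED = {
    "storm": {"hurricane", "flood"},
    "hurricane": {"storm"},
    "flood": {"storm"},
}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic, non-stopword tokens."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def related(a, b):
    """Two words cohere if they repeat or are linked in the toy thesaurus."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())


def build_chains(sentences):
    """Greedily add each word to the first chain holding a related word."""
    chains = []  # each chain is a list of (word, sentence_index) pairs
    for idx, sentence in enumerate(sentences):
        for word in tokenize(sentence):
            for chain in chains:
                if any(related(word, member) for member, _ in chain):
                    chain.append((word, idx))
                    break
            else:
                chains.append([(word, idx)])
    return chains


def gist(sentences):
    """Return the sentence whose words belong to the strongest chains."""
    scores = [0] * len(sentences)
    for chain in build_chains(sentences):
        if len(chain) < 2:  # ignore singleton chains (assumed heuristic)
            continue
        for _, idx in chain:
            scores[idx] += len(chain)  # each member adds its chain's length
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best]


story = [
    "A hurricane hit the coast on Monday.",
    "The storm caused a flood in the city.",
    "Officials will meet next week.",
]
print(gist(story))
```

In this toy story, "hurricane", "storm" and "flood" form one chain, and the second sentence is selected because it contains two of the chain's members. Scoring by chain length is only one plausible choice; other weightings (for example by relation type or chain span) would fit the same skeleton.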

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stokes, N., Newman, E., Carthy, J., Smeaton, A.F. (2004). Broadcast News Gisting Using Lexical Cohesion Analysis. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_16

  • DOI: https://doi.org/10.1007/978-3-540-24752-4_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21382-6

  • Online ISBN: 978-3-540-24752-4

  • eBook Packages: Springer Book Archive
