Text Summarization by Sentence Extraction Using Unsupervised Learning

García-Hernández, René Arnulfo; Montiel, Romyna; Ledeneva, Yulia; Rendón, Eréndira; Gelbukh, Alexander; Cruz, Rafael

doi:10.1007/978-3-540-88636-5_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5317)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

The main problem in generating an extractive automatic text summary is detecting the most relevant information in the source document. Although some approaches claim to be domain- and language-independent, they depend heavily on external knowledge such as key phrases or, for machine-learning approaches, gold-standard training samples. In this work, we propose a language- and domain-independent approach to automatic text summarization by sentence extraction using an unsupervised learning algorithm. Our hypothesis is that an unsupervised algorithm can help cluster similar ideas (sentences). The summary is then composed by selecting the most representative sentence from each cluster. Several experiments on the standard DUC-2002 collection show that the proposed method obtains more favorable results than other approaches.
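The abstract describes a two-stage pipeline: cluster the sentences, then take the most representative sentence from each cluster. The sketch below illustrates that idea under assumptions the abstract does not state: sentences are represented as bag-of-words term-frequency vectors, the unsupervised algorithm is plain k-means seeded with the first k sentences, and the "most representative" sentence is taken to be the one nearest its cluster centroid. The names `tokenize`, `to_vectors`, `kmeans`, and `summarize` are hypothetical; this is an illustration of the general technique, not the authors' implementation.

```python
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Lowercase alphanumeric tokens; no stemming or stop-word removal (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def to_vectors(sentences: list[str]) -> tuple[list[str], list[list[float]]]:
    # Term-frequency vector for each sentence over the document's own vocabulary.
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[w]) for w in vocab] for c in counts]


def sq_dist(a: list[float], b: list[float]) -> float:
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, iters: int = 20):
    # Hypothetical deterministic seeding: the first k vectors become the initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize(sentences: list[str], k: int) -> list[str]:
    # At most k sentences: one per non-empty cluster.
    k = min(k, len(sentences))
    if k <= 0:
        return []
    _, vectors = to_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            # Representative = member closest to the centroid (ties go to the earliest sentence).
            chosen.append(min(members, key=lambda i: sq_dist(vectors[i], centroids[j])))
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

In this sketch, k directly sets the maximum summary length in sentences. The deterministic seeding keeps the output reproducible, but, as with any k-means initialization, it can influence which clusters form; a real system might use a different representation, seeding strategy, or clustering algorithm.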

This work was done with partial support from the Mexican Government (CONACyT, SNI, SIP-IPN, PIFI-IPN).

Author information

Authors and Affiliations

Pattern Recognition Laboratory, Toluca Institute of Technology, Mexico
Autonomous University of the State of Mexico, Mexico
Center for Computing Research, National Polytechnic Institute, Mexico
SoNet RC, University of Center Europe in Skalica, Slovakia
René Arnulfo García-Hernández, Romyna Montiel, Yulia Ledeneva, Eréndira Rendón, Alexander Gelbukh & Rafael Cruz

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, México
Alexander Gelbukh
Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1, Sta. María Tonantzintla, 72840, Puebla, México
Eduardo F. Morales

About this paper

Cite this paper

García-Hernández, R.A., Montiel, R., Ledeneva, Y., Rendón, E., Gelbukh, A., Cruz, R. (2008). Text Summarization by Sentence Extraction Using Unsupervised Learning. In: Gelbukh, A., Morales, E.F. (eds) MICAI 2008: Advances in Artificial Intelligence. MICAI 2008. Lecture Notes in Computer Science, vol 5317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88636-5_12

DOI: https://doi.org/10.1007/978-3-540-88636-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88635-8
Online ISBN: 978-3-540-88636-5
eBook Packages: Computer Science, Computer Science (R0)
