An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics

Published: 23 December 2022

Abstract

Long documents such as academic articles and business reports have long been the standard format for detailing important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short, concise texts encapsulating the most important information would therefore be significant in aiding readers’ comprehension. Recently, with the advent of neural architectures, significant research effort has gone into advancing automatic text summarization systems, and numerous studies have emerged on the challenges of extending these systems to the long document domain. In this survey, we provide a comprehensive overview of research on long document summarization and a systematic evaluation of the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study of the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

Published in

ACM Computing Surveys, Volume 55, Issue 8 (August 2023), 789 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3567473

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2022
• Accepted: 14 June 2022
• Online AM: 29 June 2022
• Published: 23 December 2022
