An Empirical Survey on Long Document Summarization: Datasets, Models, and Metrics

Published: 23 December 2022

Abstract

Long documents such as academic articles and business reports have long been the standard format for detailing important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short, concise texts encapsulating the most important information would therefore be significant in aiding readers’ comprehension. Recently, with the advent of neural architectures, significant research effort has gone into advancing automatic text summarization systems, and numerous studies have emerged on the challenges of extending these systems to the long document domain. In this survey, we provide a comprehensive overview of research on long document summarization and a systematic evaluation of the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study of the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.

Published in

ACM Computing Surveys, Volume 55, Issue 8 (August 2023), 789 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3567473

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2022
• Accepted: 14 June 2022
• Online AM: 29 June 2022
• Published: 23 December 2022
