survey

From Detection to Application: Recent Advances in Understanding Scientific Tables and Figures

Authors:
Jiani Huang, Wuhan University, Wuhan, China
Haihua Chen, University of North Texas, Denton, USA
Fengchang Yu, Wuhan University, Wuhan, China
Wei Lu, Wuhan University, Wuhan, China

ACM Computing Surveys. Accepted April 2024. https://doi.org/10.1145/3657285

Online AM: 12 April 2024

Abstract

Tables and figures are widely used in scientific documents to present information in a structured, visual form. Understanding them is important for a range of downstream tasks, such as academic search and scientific knowledge graphs. Existing studies mainly focus on detecting figures and tables in scientific documents, interpreting their semantics, and integrating them into downstream tasks. However, a systematic and comprehensive literature review on the mining and application of tables and figures in academic papers is still missing. In this article, we introduce the research framework and the full pipeline for understanding tables and figures, including detection, structural analysis, interpretation, and application. We provide a thorough analysis of benchmark datasets and recent techniques, along with their pros and cons. We also present a quantitative analysis of the effectiveness of different models on popular benchmarks. We further outline several important applications that exploit the semantics of scientific tables and figures. Finally, we highlight open challenges and potential directions for future research. We believe this is the first comprehensive survey on understanding scientific tables and figures that covers the landscape from detection to application.

Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. 2022. ChartQA: A benchmark for question answering about charts with visual and logical reasoning. arXiv preprint arXiv:2203.10244(2022).Google Scholar
Ahmed Masry and Enamul Hoque Prince. 2021. Integrating Image Data Extraction and Table Parsing Methods for Chart Question Answering. (2021), 5.Google Scholar
Mark E. Mattie, Lawrence Staib, Eric Stratmann, Hemant D. Tagare, James Duncan, and Perry L. Miller. 2000. PathMaster: Content-Based Cell Image Retrieval Using Automated Feature Extraction. Journal of the American Medical Informatics Association 7, 4 (July 2000), 404–415. https://doi.org/10.1136/jamia.2000.0070404Google ScholarCross Ref
Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, and Ping Luo. 2024. ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-Training and Multitask Instruction Tuning. https://doi.org/10.48550/arXiv.2401.02384 arxiv:2401.02384 [cs]Google ScholarCross Ref
Nitesh Methani, Pritha Ganguly, Mitesh M. Khapra, and Pratyush Kumar. 2020. PlotQA: Reasoning over Scientific Plots. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1527–1536.Google ScholarCross Ref
Nikola Milosevic, Cassie Gregson, Robert Hernandez, and Goran Nenadic. 2019. A framework for information extraction from tables in biomedical literature. International Journal on Document Analysis and Recognition (IJDAR) 22 (2019), 55–78.Google ScholarDigital Library
Ales Mishchenko and Natalia Vassilieva. 2011. Chart image understanding and numerical data extraction. In 2011 Sixth International Conference on Digital Information Management. IEEE, 115–120.Google ScholarCross Ref
Prerna Mishra, Santosh Kumar, and Mithilesh Kumar Chaube. 2022. Evaginating Scientific Charts: Recovering Direct and Derived Information Encodings from Chart Images. Journal of Visualization 25, 2 (April 2022), 343–359. https://doi.org/10.1007/s12650-021-00800-zGoogle ScholarDigital Library
Ajoy Mondal, Peter Lipps, and CV Jawahar. 2020. IIIT-AR-13K: A new dataset for graphical object detection in documents. In Document Analysis Systems: 14th IAPR International Workshop, DAS 2020, Wuhan, China, July 26–29, 2020, Proceedings 14. Springer, 216–230.Google ScholarCross Ref
Henning Müller, Nicolas Michoux, Nicolas Michoux, Nicolas Michoux, David Bandon, David Bandon, David Bandon, and Antoine Geissbuhler. 2004. A Review of Content-Based Image Retrieval Systems in Medical Applications—Clinical Benefits and Future Directions. International Journal of Medical Informatics 73, 1 (Feb. 2004), 1–23. https://doi.org/10.1016/j.ijmedinf.2003.11.024Google ScholarCross Ref
Rathin Radhakrishnan Nair, Nishant Sankaran, Ifeoma Nwogu, and Venu Govindaraju. 2016. Understanding Line Plots Using Bayesian Network. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS). IEEE, 108–113.Google Scholar
Marcin Namysł, Alexander M Esser, Sven Behnke, and Joachim Köhler. 2023. Flexible Hybrid Table Recognition and Semantic Interpretation System. SN Computer Science 4, 3 (2023), 246.Google ScholarDigital Library
Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, and Peter Staar. 2022. TableFormer: Table Structure Understanding With Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4614–4623.Google ScholarCross Ref
Danish Nazir, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. 2021. HybridTabNet: Towards Better Table Detection in Scanned Document Images. Applied Sciences 11, 18 (Jan. 2021), 8396. https://doi.org/10.3390/app11188396Google ScholarCross Ref
Allard Oelen, Markus Stocker, and Sören Auer. 2020. Creating a Scholarly Knowledge Graph from Survey Article Tables. In Digital Libraries at Times of Massive Societal Transition, Emi Ishita, Natalie Lee San Pang, and Lihong Zhou (Eds.). Springer International Publishing, Cham, 373–389. https://doi.org/10.1007/978-3-030-64452-9_35Google ScholarDigital Library
Kemal Oksuz, Baris Can Cam, Emre Akbas, and Sinan Kalkan. 2018. Localization Recall Precision (LRP): A New Performance Metric for Object Detection. In Computer Vision – ECCV 2018, Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.). Vol. 11211. Springer International Publishing, Cham, 521–537. https://doi.org/10.1007/978-3-030-01234-2_31Google ScholarDigital Library
Rafael Padilla, Sergio L. Netto, and Eduardo A. B. da Silva. 2020. A Survey on Performance Metrics for Object-Detection Algorithms. In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP). 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130Google ScholarCross Ref
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311–318.Google Scholar
Hai-Hong Phan. 2021. An Integrated Approach for Table Detection and Structure Recognition. Journal of Research and Development on Information and Communication Technology 2021, 1 (May 2021), 41–50. https://doi.org/10.32913/mic-ict-research.v2021.n1.974Google ScholarCross Ref
Ihsin Tsaiyun Phillips. 1996. User’s reference manual for the UW english/technical document image database III. UW-III English/technical document image database manual (1996).Google Scholar
Jorge Poco and Jeffrey Heer. 2017. Reverse-Engineering Visualizations: Recovering Visual Encodings from Chart Images. Computer Graphics Forum 36, 3 (June 2017), 353–363. https://doi.org/10.1111/cgf.13193Google ScholarDigital Library
Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. 2020. CascadeTabNet: An Approach for End to End Table Detection and Structure Recognition From Image-Based Documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 572–573.Google ScholarCross Ref
Jay Pujara, Pedro Szekely, Huan Sun, and Muhao Chen. 2021. From Tables to Knowledge: Recent Advances in Table Understanding. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. ACM, Virtual Event Singapore, 4060–4061. https://doi.org/10.1145/3447548.3470809Google ScholarDigital Library
Shah Rukh Qasim, Jan Kieseler, Yutaro Iiyama, and Maurizio Pierini. 2019. Learning representations of irregular particle-detector geometry with distance-weighted graph networks. The European Physical Journal C 79, 7 (2019), 1–11.Google ScholarCross Ref
Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. 2019. Rethinking Table Recognition Using Graph Neural Networks. https://doi.org/10.48550/arXiv.1905.13391 arxiv:1905.13391 [cs]Google ScholarCross Ref
Liang Qiao, Zaisheng Li, Zhanzhan Cheng, Peng Zhang, Shiliang Pu, Yi Niu, Wenqi Ren, Wenming Tan, and Fei Wu. 2021. LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment. In Document Analysis and Recognition – ICDAR 2021, Josep Lladós, Daniel Lopresti, and Seiichi Uchida (Eds.). Vol. 12821. Springer International Publishing, Cham, 99–114. https://doi.org/10.1007/978-3-030-86549-8_7Google ScholarDigital Library
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.Google Scholar
Sachin Raja, Ajoy Mondal, and C. V. Jawahar. 2020. Table Structure Recognition Using Top-Down and Bottom-Up Cues. In Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Vol. 12373. Springer International Publishing, Cham, 70–86. https://doi.org/10.1007/978-3-030-58604-1_5Google ScholarDigital Library
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015).Google Scholar
Pau Riba, Anjan Dutta, Lutz Goldmann, Alicia Fornés, Oriol Ramos, and Josep Lladós. 2019. Table Detection in Invoice Documents by Graph Neural Networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR). 122–127. https://doi.org/10.1109/ICDAR.2019.00028Google ScholarCross Ref
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer, 234–241.Google Scholar
Ranajit Saha, Ajoy Mondal, and C V Jawahar. 2019. Graphical Object Detection in Document Images. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, Sydney, Australia, 51–58. https://doi.org/10/gngxg6Google Scholar
Naveen Saini, Sriparna Saha, Pushpak Bhattacharyya, and Himanshu Tuteja. 2020. Textual Entailment–Based Figure Summarization for Biomedical Articles. ACM Transactions on Multimedia Computing, Communications, and Applications 16, 1s(April 2020), 35:1–35:24. https://doi.org/10.1145/3357334Google ScholarDigital Library
Robert J. Sandusky, Carol Tenopir, and Margaret M. Casado. 2007. Figure and Table Retrieval from Scholarly Journal Articles: User Needs for Teaching and Research. Proceedings of the American Society for Information Science and Technology 44, 1 (2007), 1–13. https://doi.org/10.1002/meet.1450440390Google ScholarCross Ref
Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. 2017. DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 01. 1162–1167. https://doi.org/10.1109/ICDAR.2017.192Google ScholarCross Ref
KC Shahira and A Lijiya. 2021. Towards assisting the visually impaired: a review on techniques for decoding the visual data from chart images. IEEE Access 9(2021), 52926–52943.Google ScholarCross Ref
Xiangyang Shi, Yue Wu, Yue Wu, Yue Wu, Huaigu Cao, Huaigu Cao, Gully A. P. C. Burns, and Prem Natarajan. 2019. Layout-Aware Subfigure Decomposition for Complex Figures in the Biomedical Literature. (May 2019), 1343–1347. https://doi.org/10.1109/icassp.2019.8683824Google ScholarCross Ref
Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. 2019. DeepTabStR: Deep Learning Based Table Structure Recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, Sydney, Australia, 1403–1409. https://doi.org/10.1109/ICDAR.2019.00226Google ScholarCross Ref
Shoaib Ahmed Siddiqui, Muhammad Imran Malik, Stefan Agne, Andreas Dengel, and Sheraz Ahmed. 2018. DeCNT: Deep Deformable CNN for Table Detection. IEEE Access 6(2018), 74151–74161. https://doi.org/10/gf8qz9Google ScholarCross Ref
Noah Siegel, Zachary Horvitz, Roie Levin, Santosh Divvala, and Ali Farhadi. 2016. FigureSeer: Parsing Result-Figures in Research Papers. In Computer Vision – ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Vol. 9911. Springer International Publishing, Cham, 664–680. https://doi.org/10.1007/978-3-319-46478-7_41Google ScholarCross Ref
Noah Siegel, Nicholas Lourie, Russell Power, and Waleed Ammar. 2018. Extracting Scientific Figures with Distantly Supervised Neural Networks. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (JCDL ’18). Association for Computing Machinery, New York, NY, USA, 223–232. https://doi.org/10.1145/3197026.3197040Google ScholarDigital Library
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556(2014).Google Scholar
Hrituraj Singh and Sumit Shekhar. 2020. STL-CQA: Structure-Based Transformers with Localization and Encoding for Chart Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 3275–3284. https://doi.org/10.18653/v1/2020.emnlp-main.264Google ScholarCross Ref
Brandon Smock, Rohith Pesala, and Robin Abraham. 2021. PubTables-1M: Towards Comprehensive Table Extraction from Unstructured Documents. (Sept. 2021). https://doi.org/10.48550/arXiv.2110.00061Google ScholarCross Ref
Carlos Soto and Shinjae Yoo. 2019. Visual Detection with Context for Document Layout Analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3464–3470. https://doi.org/10.18653/v1/D19-1348Google ScholarCross Ref
Nishant Subramani, Alexandre Matton, Malcolm Greaves, and Adrian Lam. 2021. A Survey of Deep Learning Approaches for OCR and Document Understanding. https://doi.org/10.48550/arXiv.2011.13534 arxiv:2011.13534 [cs]Google ScholarCross Ref
Hemant D. Tagare, C. Carl Jaffe, and James Duncan. 1997. Medical Image Databases: A Content-Based Retrieval Approach. Journal of the American Medical Informatics Association 4, 3 (May 1997), 184–198. https://doi.org/10.1136/jamia.1997.0040184Google ScholarCross Ref
Mario Taschwer and Oge Marques. 2018. Automatic Separation of Compound Figures in Scientific Articles. Multimedia Tools and Applications 77, 1 (Jan. 2018), 519–548. https://doi.org/10.1007/s11042-016-4237-xGoogle ScholarDigital Library
Chris Tensmeyer, Vlad I. Morariu, Brian Price, Scott Cohen, and Tony Martinez. 2019. Deep Splitting and Merging for Table Structure Decomposition. In 2019 International Conference on Document Analysis and Recognition (ICDAR). 114–121. https://doi.org/10.1109/ICDAR.2019.00027Google ScholarCross Ref
Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. 2023. ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language. https://arxiv.org/abs/2311.01920v1.Google Scholar
Dominika Tkaczyk, Pawel Szostek, and Lukasz Bolikowski. 2014. GROTOAP2-the methodology of creating a large ground truth dataset of scientific articles. D-Lib Magazine 20, 11/12 (2014).Google ScholarCross Ref
Satoshi Tsutsui and David J. Crandall. 2017. A Data Driven Approach for Compound Figure Separation Using Convolutional Neural Networks. (Nov. 2017), 533–540. https://doi.org/10.1109/icdar.2017.93Google ScholarCross Ref
Johan Van Benthem. 2008. A brief history of natural logic. (2008).Google Scholar
Honglin Wan, Zongfeng Zhong, Tianping Li, Huaxiang Zhang, and Jiande Sun. 2022. Contextual Transformer Sequence-Based Recognition Network for Medical Examination Reports. Applied Intelligence (Dec. 2022). https://doi.org/10.1007/s10489-022-04420-4Google ScholarDigital Library
Nancy X. R. Wang, Diwakar Mahajan, Marina Danilevsky, and Sara Rosenthal. 2021. SemEval-2021 Task 9: Fact Verification and Evidence Finding for Tabular Data in Scientific Documents (SEM-TAB-FACTS). https://doi.org/10.48550/arXiv.2105.13995 arxiv:2105.13995 [cs]Google ScholarCross Ref
Sheng Wang, Zihao Zhao, Xi Ouyang, Qian Wang, and Dinggang Shen. 2023. Chatcad: Interactive computer-aided diagnosis on medical image using large language models. arXiv preprint arXiv:2302.07257(2023).Google Scholar
Ziao Wang, Yuhang Li, Junda Wu, Jaehyeon Soon, and Xiaofeng Zhang. 2023. FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis. https://arxiv.org/abs/2308.01430v1.Google Scholar
Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, and Tomas Pfister. 2024. Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. arxiv:2401.04398 [cs]Google Scholar
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.Google Scholar
Aoyu Wu, Yun Wang, Xinhuan Shu, Dominik Moritz, Weiwei Cui, Haidong Zhang, Dongmei Zhang, and Huamin Qu. 2021. Ai4vis: Survey on artificial intelligence approaches for data visualization. IEEE Transactions on Visualization and Computer Graphics (2021).Google ScholarDigital Library
Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. 2021. TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1295–1304.Google ScholarCross Ref
Fan Yang, Lei Hu, Xinwu Liu, Shuangping Huang, and Zhenghui Gu. 2023. A large-scale dataset for end-to-end table recognition in the wild. Scientific Data 10, 1 (2023), 110.Google ScholarCross Ref
Liping Yang, Ming Gong, and Vijayan K. Asari. 2020. Diagram Image Retrieval and Analysis: Challenges and Opportunities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 180–181.Google ScholarCross Ref
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, and Fei Huang. 2023. mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding. https://arxiv.org/abs/2307.02499v1.Google Scholar
Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, and Fei Huang. 2023. UReader: Universal OCR-Free Visually-Situated Language Understanding with Multimodal Large Language Model. https://doi.org/10.48550/arXiv.2310.05126 arxiv:2310.05126 [cs]Google ScholarCross Ref
Burcu Yildiz, Katharina Kaiser, and Silvia Miksch. 2005. pdf2table: A method to extract table information from pdf files. In IICAI, Vol. 2005. Citeseer, 1773–1785.Google Scholar
Daekeun You, Emilia Apostolova, Sameer Antani, Dina Demner-Fushman, and George R Thoma. 2009. Figure content analysis for improved biomedical article retrieval. In Document Recognition and Retrieval XVI, Vol. 7247. SPIE, 276–285.Google Scholar
Daekeun You, Emilia Apostolova, Sameer Antani, Dina Demner-Fushman, and George R. Thoma. 2009. Figure Content Analysis for Improved Biomedical Article Retrieval. In Document Recognition and Retrieval XVI, Vol. 7247. SPIE, 276–285. https://doi.org/10.1117/12.805976Google ScholarCross Ref
Fengchang Yu, Jiani Huang, Zhuoran Luo, Li Zhang, and Wei Lu. 2023. An effective method for figures and tables detection in academic literature. Information Processing & Management 60, 3 (2023), 103286.Google ScholarDigital Library
Hong Yu. 2006. Towards Answering Biological Questions with Experimental Evidence: Automatically Identifying Text That Summarize Image Content in Full-Text Articles. AMIA Annual Symposium Proceedings 2006 (2006), 834–838.Google Scholar
Hong Yu and Minsuk Lee. 2006. Accessing Bioscience Images from Abstract Sentences. Bioinformatics 22, 14 (July 2006), e547–e556. https://doi.org/10.1093/bioinformatics/btl261Google ScholarDigital Library
Hong Yu, Feifan Liu, and Balaji Polepalli Ramesh. 2010. Automatic Figure Ranking and User Interfacing for Intelligent Figure Search. PLOS ONE 5, 10 (2010), e12983. https://doi.org/10.1371/journal.pone.0012983Google ScholarCross Ref
Abhay Zala, Han Lin, Jaemin Cho, and Mohit Bansal. 2023. DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning. arxiv:2310.12128 [cs]Google Scholar
Richard Zanibbi, Dorothea Blostein, and JamesR. Cordy. 2004. A Survey of Table Recognition: Models, Observations, Transformations, and Inferences. Document Analysis and Recognition 7, 1 (March 2004). https://doi.org/10.1007/s10032-004-0120-9Google ScholarDigital Library
Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, Shiliang Pu, Yi Niu, and Fei Wu. 2021. VSR: A Unified Framework for Document Layout Analysis Combining Vision, Semantics and Relations. In Document Analysis and Recognition – ICDAR 2021 (Lecture Notes in Computer Science), Josep Lladós, Daniel Lopresti, and Seiichi Uchida (Eds.). Springer International Publishing, Cham, 115–130. https://doi.org/10.1007/978-3-030-86549-8_8Google ScholarDigital Library
Shuo Zhang, Zhuyun Dai, Krisztian Balog, and Jamie Callan. 2020. Summarizing and Exploring Tabular Data in Conversational Search. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 1537–1540.Google ScholarDigital Library
Zhenrong Zhang, Jianshu Zhang, Jun Du, and Fengren Wang. 2022. Split, Embed and Merge: An Accurate Table Structure Recognizer. Pattern Recognition 126(June 2022), 108565. https://doi.org/10.1016/j.patcog.2022.108565Google ScholarDigital Library
Xinyi Zheng, Doug Burdick, Lucian Popa, Xu Zhong, and Nancy Xin Ru Wang. 2020. Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context. https://doi.org/10.48550/arXiv.2005.00589 arxiv:2005.00589 [cs]Google ScholarCross Ref
Xinyi Zheng, Douglas Burdick, Lucian Popa, Xu Zhong, and Nancy Xin Ru Wang. 2021. Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 697–706.Google ScholarCross Ref
Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. 2020. Image-Based Table Recognition: Data, Model, and Evaluation. In Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Vol. 12366. Springer International Publishing, Cham, 564–580. https://doi.org/10.1007/978-3-030-58589-1_34Google ScholarDigital Library
Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. 2019. PubLayNet: Largest Dataset Ever for Document Layout Analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR). 1015–1022. https://doi.org/10.1109/ICDAR.2019.00166Google ScholarCross Ref
Mingyang Zhou, Yi Fung, Long Chen, Christopher Thomas, Heng Ji, and Shih-Fu Chang. 2023. Enhanced Chart Understanding via Visual Language Pre-Training on Plot Table Pairs. In Findings of the Association for Computational Linguistics: ACL 2023, Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 1314–1326. https://doi.org/10.18653/v1/2023.findings-acl.85Google ScholarCross Ref
Junnan Zhu, Haoran Li, Tianshang Liu, Yu Zhou, Jiajun Zhang, and Chengqing Zong. 2018. MSMO: Multimodal Summarization with Multimodal Output. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 4154–4164. https://doi.org/10.18653/v1/D18-1448Google ScholarCross Ref

Index Terms

From Detection to Application: Recent Advances in Understanding Scientific Tables and Figures
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Graphics recognition and interpretation
2. General and reference
  1. Document types
    1. Surveys and overviews

Published in

ACM Computing Surveys Just Accepted
ISSN:0360-0300
EISSN:1557-7341

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Online AM: 12 April 2024
- Accepted: 2 April 2024
- Revised: 26 January 2024
- Received: 21 March 2023
Author Tags
scientific documents
figure understanding
table understanding
Qualifiers
- survey
