Abstract
Tables and figures are widely used in scientific documents to present information in a structured, visual form. Understanding them is important for a range of downstream tasks, such as academic search and scientific knowledge graphs. Existing studies mainly focus on detecting figures and tables in scientific documents, interpreting their semantics, and integrating them into downstream tasks. However, a systematic and comprehensive literature review on the mining and application of tables and figures in academic papers is still missing. In this article, we introduce a research framework that covers the whole pipeline for understanding tables and figures: detection, structural analysis, interpretation, and application. We provide a thorough analysis of benchmark datasets and recent techniques, including their strengths and weaknesses. We also present a quantitative analysis of the effectiveness of different models on popular benchmarks. We further outline several important applications that exploit the semantics of scientific tables and figures. Finally, we highlight open challenges and potential directions for future research. We believe this is the first comprehensive survey on understanding scientific tables and figures that covers the landscape from detection to application.
- Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, and Ping Luo. 2024. ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-Training and Multitask Instruction Tuning. https://doi.org/10.48550/arXiv.2401.02384 arxiv:2401.02384 [cs]Google ScholarCross Ref
- Nitesh Methani, Pritha Ganguly, Mitesh M. Khapra, and Pratyush Kumar. 2020. PlotQA: Reasoning over Scientific Plots. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1527–1536.Google ScholarCross Ref
- Nikola Milosevic, Cassie Gregson, Robert Hernandez, and Goran Nenadic. 2019. A framework for information extraction from tables in biomedical literature. International Journal on Document Analysis and Recognition (IJDAR) 22 (2019), 55–78.Google ScholarDigital Library
- Ales Mishchenko and Natalia Vassilieva. 2011. Chart image understanding and numerical data extraction. In 2011 Sixth International Conference on Digital Information Management. IEEE, 115–120.Google ScholarCross Ref
- Prerna Mishra, Santosh Kumar, and Mithilesh Kumar Chaube. 2022. Evaginating Scientific Charts: Recovering Direct and Derived Information Encodings from Chart Images. Journal of Visualization 25, 2 (April 2022), 343–359. https://doi.org/10.1007/s12650-021-00800-zGoogle ScholarDigital Library
- Ajoy Mondal, Peter Lipps, and CV Jawahar. 2020. IIIT-AR-13K: A new dataset for graphical object detection in documents. In Document Analysis Systems: 14th IAPR International Workshop, DAS 2020, Wuhan, China, July 26–29, 2020, Proceedings 14. Springer, 216–230.Google ScholarCross Ref
- Henning Müller, Nicolas Michoux, Nicolas Michoux, Nicolas Michoux, David Bandon, David Bandon, David Bandon, and Antoine Geissbuhler. 2004. A Review of Content-Based Image Retrieval Systems in Medical Applications—Clinical Benefits and Future Directions. International Journal of Medical Informatics 73, 1 (Feb. 2004), 1–23. https://doi.org/10.1016/j.ijmedinf.2003.11.024Google ScholarCross Ref
- Rathin Radhakrishnan Nair, Nishant Sankaran, Ifeoma Nwogu, and Venu Govindaraju. 2016. Understanding Line Plots Using Bayesian Network. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS). IEEE, 108–113.Google Scholar
- Marcin Namysł, Alexander M Esser, Sven Behnke, and Joachim Köhler. 2023. Flexible Hybrid Table Recognition and Semantic Interpretation System. SN Computer Science 4, 3 (2023), 246.Google ScholarDigital Library
- Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, and Peter Staar. 2022. TableFormer: Table Structure Understanding With Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4614–4623.Google ScholarCross Ref
- Danish Nazir, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. 2021. HybridTabNet: Towards Better Table Detection in Scanned Document Images. Applied Sciences 11, 18 (Jan. 2021), 8396. https://doi.org/10.3390/app11188396Google ScholarCross Ref
- Allard Oelen, Markus Stocker, and Sören Auer. 2020. Creating a Scholarly Knowledge Graph from Survey Article Tables. In Digital Libraries at Times of Massive Societal Transition, Emi Ishita, Natalie Lee San Pang, and Lihong Zhou (Eds.). Springer International Publishing, Cham, 373–389. https://doi.org/10.1007/978-3-030-64452-9_35Google ScholarDigital Library
- Kemal Oksuz, Baris Can Cam, Emre Akbas, and Sinan Kalkan. 2018. Localization Recall Precision (LRP): A New Performance Metric for Object Detection. In Computer Vision – ECCV 2018, Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.). Vol. 11211. Springer International Publishing, Cham, 521–537. https://doi.org/10.1007/978-3-030-01234-2_31Google ScholarDigital Library
- Rafael Padilla, Sergio L. Netto, and Eduardo A. B. da Silva. 2020. A Survey on Performance Metrics for Object-Detection Algorithms. In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP). 237–242. https://doi.org/10.1109/IWSSIP48289.2020.9145130Google ScholarCross Ref
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311–318.Google Scholar
- Hai-Hong Phan. 2021. An Integrated Approach for Table Detection and Structure Recognition. Journal of Research and Development on Information and Communication Technology 2021, 1 (May 2021), 41–50. https://doi.org/10.32913/mic-ict-research.v2021.n1.974Google ScholarCross Ref
- Ihsin Tsaiyun Phillips. 1996. User’s reference manual for the UW english/technical document image database III. UW-III English/technical document image database manual (1996).Google Scholar
- Jorge Poco and Jeffrey Heer. 2017. Reverse-Engineering Visualizations: Recovering Visual Encodings from Chart Images. Computer Graphics Forum 36, 3 (June 2017), 353–363. https://doi.org/10.1111/cgf.13193Google ScholarDigital Library
- Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. 2020. CascadeTabNet: An Approach for End to End Table Detection and Structure Recognition From Image-Based Documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 572–573.Google ScholarCross Ref
- Jay Pujara, Pedro Szekely, Huan Sun, and Muhao Chen. 2021. From Tables to Knowledge: Recent Advances in Table Understanding. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. ACM, Virtual Event Singapore, 4060–4061. https://doi.org/10.1145/3447548.3470809Google ScholarDigital Library
- Shah Rukh Qasim, Jan Kieseler, Yutaro Iiyama, and Maurizio Pierini. 2019. Learning representations of irregular particle-detector geometry with distance-weighted graph networks. The European Physical Journal C 79, 7 (2019), 1–11.Google ScholarCross Ref
- Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. 2019. Rethinking Table Recognition Using Graph Neural Networks. https://doi.org/10.48550/arXiv.1905.13391 arxiv:1905.13391 [cs]Google ScholarCross Ref
- Liang Qiao, Zaisheng Li, Zhanzhan Cheng, Peng Zhang, Shiliang Pu, Yi Niu, Wenqi Ren, Wenming Tan, and Fei Wu. 2021. LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment. In Document Analysis and Recognition – ICDAR 2021, Josep Lladós, Daniel Lopresti, and Seiichi Uchida (Eds.). Vol. 12821. Springer International Publishing, Cham, 99–114. https://doi.org/10.1007/978-3-030-86549-8_7Google ScholarDigital Library
- Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.Google Scholar
- Sachin Raja, Ajoy Mondal, and C. V. Jawahar. 2020. Table Structure Recognition Using Top-Down and Bottom-Up Cues. In Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Vol. 12373. Springer International Publishing, Cham, 70–86. https://doi.org/10.1007/978-3-030-58604-1_5Google ScholarDigital Library
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015).Google Scholar
- Pau Riba, Anjan Dutta, Lutz Goldmann, Alicia Fornés, Oriol Ramos, and Josep Lladós. 2019. Table Detection in Invoice Documents by Graph Neural Networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR). 122–127. https://doi.org/10.1109/ICDAR.2019.00028Google ScholarCross Ref
- Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer, 234–241.Google Scholar
- Ranajit Saha, Ajoy Mondal, and C V Jawahar. 2019. Graphical Object Detection in Document Images. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, Sydney, Australia, 51–58. https://doi.org/10/gngxg6Google Scholar
- Naveen Saini, Sriparna Saha, Pushpak Bhattacharyya, and Himanshu Tuteja. 2020. Textual Entailment–Based Figure Summarization for Biomedical Articles. ACM Transactions on Multimedia Computing, Communications, and Applications 16, 1s(April 2020), 35:1–35:24. https://doi.org/10.1145/3357334Google ScholarDigital Library
- Robert J. Sandusky, Carol Tenopir, and Margaret M. Casado. 2007. Figure and Table Retrieval from Scholarly Journal Articles: User Needs for Teaching and Research. Proceedings of the American Society for Information Science and Technology 44, 1 (2007), 1–13. https://doi.org/10.1002/meet.1450440390Google ScholarCross Ref
- Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. 2017. DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 01. 1162–1167. https://doi.org/10.1109/ICDAR.2017.192Google ScholarCross Ref
- KC Shahira and A Lijiya. 2021. Towards assisting the visually impaired: a review on techniques for decoding the visual data from chart images. IEEE Access 9(2021), 52926–52943.Google ScholarCross Ref
- Xiangyang Shi, Yue Wu, Yue Wu, Yue Wu, Huaigu Cao, Huaigu Cao, Gully A. P. C. Burns, and Prem Natarajan. 2019. Layout-Aware Subfigure Decomposition for Complex Figures in the Biomedical Literature. (May 2019), 1343–1347. https://doi.org/10.1109/icassp.2019.8683824Google ScholarCross Ref
- Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. 2019. DeepTabStR: Deep Learning Based Table Structure Recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, Sydney, Australia, 1403–1409. https://doi.org/10.1109/ICDAR.2019.00226Google ScholarCross Ref
- Shoaib Ahmed Siddiqui, Muhammad Imran Malik, Stefan Agne, Andreas Dengel, and Sheraz Ahmed. 2018. DeCNT: Deep Deformable CNN for Table Detection. IEEE Access 6(2018), 74151–74161. https://doi.org/10/gf8qz9Google ScholarCross Ref
- Noah Siegel, Zachary Horvitz, Roie Levin, Santosh Divvala, and Ali Farhadi. 2016. FigureSeer: Parsing Result-Figures in Research Papers. In Computer Vision – ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Vol. 9911. Springer International Publishing, Cham, 664–680. https://doi.org/10.1007/978-3-319-46478-7_41Google ScholarCross Ref
- Noah Siegel, Nicholas Lourie, Russell Power, and Waleed Ammar. 2018. Extracting Scientific Figures with Distantly Supervised Neural Networks. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries (JCDL ’18). Association for Computing Machinery, New York, NY, USA, 223–232. https://doi.org/10.1145/3197026.3197040Google ScholarDigital Library
- Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556(2014).Google Scholar
- Hrituraj Singh and Sumit Shekhar. 2020. STL-CQA: Structure-Based Transformers with Localization and Encoding for Chart Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 3275–3284. https://doi.org/10.18653/v1/2020.emnlp-main.264Google ScholarCross Ref
- Brandon Smock, Rohith Pesala, and Robin Abraham. 2021. PubTables-1M: Towards Comprehensive Table Extraction from Unstructured Documents. (Sept. 2021). https://doi.org/10.48550/arXiv.2110.00061Google ScholarCross Ref
- Carlos Soto and Shinjae Yoo. 2019. Visual Detection with Context for Document Layout Analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3464–3470. https://doi.org/10.18653/v1/D19-1348Google ScholarCross Ref
- Nishant Subramani, Alexandre Matton, Malcolm Greaves, and Adrian Lam. 2021. A Survey of Deep Learning Approaches for OCR and Document Understanding. https://doi.org/10.48550/arXiv.2011.13534 arxiv:2011.13534 [cs]Google ScholarCross Ref
- Hemant D. Tagare, C. Carl Jaffe, and James Duncan. 1997. Medical Image Databases: A Content-Based Retrieval Approach. Journal of the American Medical Informatics Association 4, 3 (May 1997), 184–198. https://doi.org/10.1136/jamia.1997.0040184Google ScholarCross Ref
- Mario Taschwer and Oge Marques. 2018. Automatic Separation of Compound Figures in Scientific Articles. Multimedia Tools and Applications 77, 1 (Jan. 2018), 519–548. https://doi.org/10.1007/s11042-016-4237-xGoogle ScholarDigital Library
- Chris Tensmeyer, Vlad I. Morariu, Brian Price, Scott Cohen, and Tony Martinez. 2019. Deep Splitting and Merging for Table Structure Decomposition. In 2019 International Conference on Document Analysis and Recognition (ICDAR). 114–121. https://doi.org/10.1109/ICDAR.2019.00027Google ScholarCross Ref
- Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. 2023. ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language. https://arxiv.org/abs/2311.01920v1.Google Scholar
- Dominika Tkaczyk, Pawel Szostek, and Lukasz Bolikowski. 2014. GROTOAP2-the methodology of creating a large ground truth dataset of scientific articles. D-Lib Magazine 20, 11/12 (2014).Google ScholarCross Ref
- Satoshi Tsutsui and David J. Crandall. 2017. A Data Driven Approach for Compound Figure Separation Using Convolutional Neural Networks. (Nov. 2017), 533–540. https://doi.org/10.1109/icdar.2017.93Google ScholarCross Ref
- Johan Van Benthem. 2008. A brief history of natural logic. (2008).Google Scholar
- Honglin Wan, Zongfeng Zhong, Tianping Li, Huaxiang Zhang, and Jiande Sun. 2022. Contextual Transformer Sequence-Based Recognition Network for Medical Examination Reports. Applied Intelligence (Dec. 2022). https://doi.org/10.1007/s10489-022-04420-4Google ScholarDigital Library
- Nancy X. R. Wang, Diwakar Mahajan, Marina Danilevsky, and Sara Rosenthal. 2021. SemEval-2021 Task 9: Fact Verification and Evidence Finding for Tabular Data in Scientific Documents (SEM-TAB-FACTS). https://doi.org/10.48550/arXiv.2105.13995 arxiv:2105.13995 [cs]Google ScholarCross Ref
- Sheng Wang, Zihao Zhao, Xi Ouyang, Qian Wang, and Dinggang Shen. 2023. Chatcad: Interactive computer-aided diagnosis on medical image using large language models. arXiv preprint arXiv:2302.07257(2023).Google Scholar
- Ziao Wang, Yuhang Li, Junda Wu, Jaehyeon Soon, and Xiaofeng Zhang. 2023. FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis. https://arxiv.org/abs/2308.01430v1.Google Scholar
- Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, and Tomas Pfister. 2024. Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. arxiv:2401.04398 [cs]Google Scholar
- Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.Google Scholar
- Aoyu Wu, Yun Wang, Xinhuan Shu, Dominik Moritz, Weiwei Cui, Haidong Zhang, Dongmei Zhang, and Huamin Qu. 2021. Ai4vis: Survey on artificial intelligence approaches for data visualization. IEEE Transactions on Visualization and Computer Graphics (2021).Google ScholarDigital Library
- Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. 2021. TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1295–1304.Google ScholarCross Ref
- Fan Yang, Lei Hu, Xinwu Liu, Shuangping Huang, and Zhenghui Gu. 2023. A large-scale dataset for end-to-end table recognition in the wild. Scientific Data 10, 1 (2023), 110.Google ScholarCross Ref
- Liping Yang, Ming Gong, and Vijayan K. Asari. 2020. Diagram Image Retrieval and Analysis: Challenges and Opportunities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 180–181.Google ScholarCross Ref
- Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, and Fei Huang. 2023. mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding. https://arxiv.org/abs/2307.02499v1.Google Scholar
- Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, and Fei Huang. 2023. UReader: Universal OCR-Free Visually-Situated Language Understanding with Multimodal Large Language Model. https://doi.org/10.48550/arXiv.2310.05126 arxiv:2310.05126 [cs]Google ScholarCross Ref
- Burcu Yildiz, Katharina Kaiser, and Silvia Miksch. 2005. pdf2table: A method to extract table information from pdf files. In IICAI, Vol. 2005. Citeseer, 1773–1785.Google Scholar
- Daekeun You, Emilia Apostolova, Sameer Antani, Dina Demner-Fushman, and George R Thoma. 2009. Figure content analysis for improved biomedical article retrieval. In Document Recognition and Retrieval XVI, Vol. 7247. SPIE, 276–285.Google Scholar
- Daekeun You, Emilia Apostolova, Sameer Antani, Dina Demner-Fushman, and George R. Thoma. 2009. Figure Content Analysis for Improved Biomedical Article Retrieval. In Document Recognition and Retrieval XVI, Vol. 7247. SPIE, 276–285. https://doi.org/10.1117/12.805976Google ScholarCross Ref
- Fengchang Yu, Jiani Huang, Zhuoran Luo, Li Zhang, and Wei Lu. 2023. An effective method for figures and tables detection in academic literature. Information Processing & Management 60, 3 (2023), 103286.Google ScholarDigital Library
- Hong Yu. 2006. Towards Answering Biological Questions with Experimental Evidence: Automatically Identifying Text That Summarize Image Content in Full-Text Articles. AMIA Annual Symposium Proceedings 2006 (2006), 834–838.Google Scholar
- Hong Yu and Minsuk Lee. 2006. Accessing Bioscience Images from Abstract Sentences. Bioinformatics 22, 14 (July 2006), e547–e556. https://doi.org/10.1093/bioinformatics/btl261Google ScholarDigital Library
- Hong Yu, Feifan Liu, and Balaji Polepalli Ramesh. 2010. Automatic Figure Ranking and User Interfacing for Intelligent Figure Search. PLOS ONE 5, 10 (2010), e12983. https://doi.org/10.1371/journal.pone.0012983Google ScholarCross Ref
- Abhay Zala, Han Lin, Jaemin Cho, and Mohit Bansal. 2023. DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning. arxiv:2310.12128 [cs]Google Scholar
- Richard Zanibbi, Dorothea Blostein, and JamesR. Cordy. 2004. A Survey of Table Recognition: Models, Observations, Transformations, and Inferences. Document Analysis and Recognition 7, 1 (March 2004). https://doi.org/10.1007/s10032-004-0120-9Google ScholarDigital Library
- Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, Shiliang Pu, Yi Niu, and Fei Wu. 2021. VSR: A Unified Framework for Document Layout Analysis Combining Vision, Semantics and Relations. In Document Analysis and Recognition – ICDAR 2021 (Lecture Notes in Computer Science), Josep Lladós, Daniel Lopresti, and Seiichi Uchida (Eds.). Springer International Publishing, Cham, 115–130. https://doi.org/10.1007/978-3-030-86549-8_8Google ScholarDigital Library
- Shuo Zhang, Zhuyun Dai, Krisztian Balog, and Jamie Callan. 2020. Summarizing and Exploring Tabular Data in Conversational Search. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, 1537–1540.Google ScholarDigital Library
- Zhenrong Zhang, Jianshu Zhang, Jun Du, and Fengren Wang. 2022. Split, Embed and Merge: An Accurate Table Structure Recognizer. Pattern Recognition 126(June 2022), 108565. https://doi.org/10.1016/j.patcog.2022.108565Google ScholarDigital Library
- Xinyi Zheng, Doug Burdick, Lucian Popa, Xu Zhong, and Nancy Xin Ru Wang. 2020. Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context. https://doi.org/10.48550/arXiv.2005.00589 arxiv:2005.00589 [cs]Google ScholarCross Ref
- Xinyi Zheng, Douglas Burdick, Lucian Popa, Xu Zhong, and Nancy Xin Ru Wang. 2021. Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 697–706.Google ScholarCross Ref
- Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. 2020. Image-Based Table Recognition: Data, Model, and Evaluation. In Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Vol. 12366. Springer International Publishing, Cham, 564–580. https://doi.org/10.1007/978-3-030-58589-1_34Google ScholarDigital Library
- Xu Zhong, Jianbin Tang, and Antonio Jimeno Yepes. 2019. PubLayNet: Largest Dataset Ever for Document Layout Analysis. In 2019 International Conference on Document Analysis and Recognition (ICDAR). 1015–1022. https://doi.org/10.1109/ICDAR.2019.00166Google ScholarCross Ref
- Mingyang Zhou, Yi Fung, Long Chen, Christopher Thomas, Heng Ji, and Shih-Fu Chang. 2023. Enhanced Chart Understanding via Visual Language Pre-Training on Plot Table Pairs. In Findings of the Association for Computational Linguistics: ACL 2023, Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 1314–1326. https://doi.org/10.18653/v1/2023.findings-acl.85Google ScholarCross Ref
- Junnan Zhu, Haoran Li, Tianshang Liu, Yu Zhou, Jiajun Zhang, and Chengqing Zong. 2018. MSMO: Multimodal Summarization with Multimodal Output. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 4154–4164. https://doi.org/10.18653/v1/D18-1448Google ScholarCross Ref