Abstract
Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer. VQA systems in medical imaging have recently gained popularity thanks to potential advantages such as patient engagement and second opinions for clinicians. While most research effort has focused on improving architectures and overcoming data-related limitations, answer consistency has been largely overlooked, even though it plays a critical role in establishing trustworthy models. In this work, we propose a novel loss function and corresponding training procedure that allow relations between questions to be included in the training process. Specifically, we consider the case where implications between perception and reasoning questions are known a priori. To show the benefits of our approach, we evaluate it on the clinically relevant task of Diabetic Macular Edema (DME) staging from fundus imaging. Our experiments show that our method outperforms state-of-the-art baselines, not only by improving model consistency but also in terms of overall accuracy. Our code and data are available at https://github.com/sergiotasconmorales/consistency_vqa.
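The abstract describes the consistency loss only at a high level. Below is a minimal PyTorch sketch of what an implication-aware training objective of this kind could look like; the function name `consistency_loss`, the hinge form of the penalty, and the weight `lam` are illustrative assumptions, not the authors' exact formulation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def consistency_loss(reason_logits, reason_labels,
                     percep_logits, percep_labels,
                     lam=0.5):
    """Cross-entropy plus a hinge-style consistency penalty (sketch).

    Each row i pairs a reasoning question (e.g. "What is the DME grade?")
    with a perception question it implies (e.g. "Are there hard exudates?").
    The penalty fires when the implied perception question is answered
    with higher loss than the reasoning question that entails it.
    """
    ce_reason = F.cross_entropy(reason_logits, reason_labels, reduction="none")
    ce_percep = F.cross_entropy(percep_logits, percep_labels, reduction="none")
    # Hinge: penalize pairs where the implied (perception) answer is
    # worse than the implying (reasoning) answer; lam is an assumed weight.
    penalty = torch.clamp(ce_percep - ce_reason, min=0.0)
    return (ce_reason + ce_percep).mean() + lam * penalty.mean()

# Toy usage: batch of 4 question pairs, 3 grading classes, binary perception.
reason_logits = torch.randn(4, 3, requires_grad=True)
percep_logits = torch.randn(4, 2, requires_grad=True)
loss = consistency_loss(reason_logits, torch.tensor([0, 2, 1, 0]),
                        percep_logits, torch.tensor([1, 0, 1, 1]))
loss.backward()
```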
Acknowledgments
This work was partially funded by the Swiss National Science Foundation through grant #191983.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tascon-Morales, S., Márquez-Neila, P., Sznitman, R. (2022). Consistency-Preserving Visual Question Answering in Medical Imaging. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13438. Springer, Cham. https://doi.org/10.1007/978-3-031-16452-1_37
DOI: https://doi.org/10.1007/978-3-031-16452-1_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16451-4
Online ISBN: 978-3-031-16452-1
eBook Packages: Computer Science, Computer Science (R0)