Consistency-Preserving Visual Question Answering in Medical Imaging

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)

Abstract

Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question. Recently, VQA systems in medical imaging have gained popularity thanks to potential advantages such as patient engagement and second opinions for clinicians. While most research efforts have focused on improving architectures and overcoming data-related limitations, answer consistency has been overlooked even though it plays a critical role in establishing trustworthy models. In this work, we propose a novel loss function and a corresponding training procedure that allow relations between questions to be included in the training process. Specifically, we consider the case where implications between perception and reasoning questions are known a priori. To show the benefits of our approach, we evaluate it on the clinically relevant task of Diabetic Macular Edema (DME) staging from fundus imaging. Our experiments show that our method outperforms state-of-the-art baselines, not only by improving model consistency, but also in terms of overall model accuracy. Our code and data are available at https://github.com/sergiotasconmorales/consistency_vqa.
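
To make the idea of a consistency-aware loss concrete, the following is a minimal sketch and not the loss proposed in the paper (the abstract does not state its form; the linked repository is the authoritative source). It assumes a reasoning ("main") question whose correct answer implies a known answer to a perception ("sub") question, and adds a penalty that grows when the sub-question is answered poorly while the main question is answered well. The function names, the exponential weighting, and the parameters `lam` and `gamma` are all hypothetical choices made for illustration.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class for one probability vector."""
    return -math.log(max(probs[target], 1e-12))


def consistency_penalty(loss_main, loss_sub, gamma=1.0):
    """Hypothetical penalty (an assumption, not the paper's formula).

    It is largest when the sub-question loss is high while the main-question
    loss is low, i.e. when the model gets the reasoning question right but
    contradicts it on the implied perception question.
    """
    return loss_sub * math.exp(-gamma * loss_main)


def total_loss(main_probs, main_target, sub_probs, sub_target,
               lam=0.5, gamma=1.0):
    """Sum of both cross-entropies plus the weighted consistency penalty."""
    l_main = cross_entropy(main_probs, main_target)
    l_sub = cross_entropy(sub_probs, sub_target)
    return l_main + l_sub + lam * consistency_penalty(l_main, l_sub, gamma)


# Illustrative binary answers: index 1 is the correct answer in both questions.
consistent = total_loss([0.1, 0.9], 1, [0.1, 0.9], 1)    # main right, sub right
inconsistent = total_loss([0.1, 0.9], 1, [0.9, 0.1], 1)  # main right, sub wrong
print(consistent < inconsistent)
```

In this sketch an inconsistent pair is penalised more than a pair where both answers are wrong, because the exponential weight shrinks as the main-question loss grows. That is one way to encode an implication known in advance; other weightings would serve the same purpose.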

Acknowledgments

This work was partially funded by the Swiss National Science Foundation through grant #191983.

Author information

Corresponding author

Correspondence to Sergio Tascon-Morales.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Tascon-Morales, S., Márquez-Neila, P., Sznitman, R. (2022). Consistency-Preserving Visual Question Answering in Medical Imaging. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13438. Springer, Cham. https://doi.org/10.1007/978-3-031-16452-1_37

  • DOI: https://doi.org/10.1007/978-3-031-16452-1_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16451-4

  • Online ISBN: 978-3-031-16452-1

  • eBook Packages: Computer Science (R0)
