Zero-Shot Visual Question Answering Using Knowledge Graph

Chen, Zhuo; Chen, Jiaoyan; Geng, Yuxia; Pan, Jeff Z.; Yuan, Zonggang; Chen, Huajun

doi:10.1007/978-3-030-88361-4_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12922)

Included in the following conference series: International Semantic Web Conference

Abstract

Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc. However, such pipeline approaches suffer when some component does not perform well, which leads to error cascading and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using a knowledge graph and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers, while also dramatically augmenting existing end-to-end models on the normal F-VQA task.
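
To make the notion of an answer-based zero-shot split concrete, the following is a minimal sketch, not the authors' actual splitting procedure. It assumes each sample is a dictionary with hypothetical `question` and `answer` fields, and it assumes an arbitrary `unseen_ratio`; the only property it illustrates is the one stated in the abstract, namely that answers in the test set never appear during training.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record layout (an assumption, not the F-VQA schema):
# {"question": "...", "answer": "..."}
Sample = Dict[str, str]


def answer_based_split(
    samples: List[Sample],
    unseen_ratio: float = 0.5,  # assumed fraction of answers held out
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Partition samples so that no test answer occurs in the training set."""
    # Sort before shuffling so the result depends only on the seed.
    answers = sorted({s["answer"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(answers)

    # Hold out a fraction of the distinct answers as "unseen".
    n_unseen = int(len(answers) * unseen_ratio)
    unseen = set(answers[:n_unseen])

    train = [s for s in samples if s["answer"] not in unseen]
    test = [s for s in samples if s["answer"] in unseen]
    return train, test
```

Under this sketch, a model trained on `train` is evaluated only on answers it has never been supervised on, which is the setting the abstract refers to as Zero-shot VQA with unseen answers.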

Notes

1. Our code and data are available at https://github.com/China-UK-ZSL/ZS-F-VQA.
2. https://www.w3.org/TR/2014/REC-rdf11-mt-20140225/.

Acknowledgments

This work is funded by 2018YFB1402800/NSFCU19B2027/NSFC91846204. Jiaoyan Chen is funded by the SIRIUS Centre for Scalable Data Access (Research Council of Norway) and Samsung Research UK.

Author information

Authors and Affiliations

College of Computer Science and Hangzhou Innovation Center, Zhejiang University, Hangzhou, China
Zhuo Chen, Yuxia Geng & Huajun Chen
AZFT Joint Lab for Knowledge Engine, Hangzhou, China
Zhuo Chen, Yuxia Geng & Huajun Chen
Department of Computer Science, University of Oxford, Oxford, UK
Jiaoyan Chen
School of Informatics, The University of Edinburgh, Edinburgh, UK
Jeff Z. Pan
NAIE CTO Office, Huawei Technologies Co., Ltd., Shenzhen, China
Zonggang Yuan

Corresponding author

Correspondence to Huajun Chen.

Editor information

Editors and Affiliations

University of Würzburg, Würzburg, Germany
Andreas Hotho
Linköping University, Linköping, Sweden
Eva Blomqvist
University of Düsseldorf, Düsseldorf, Germany
Stefan Dietze
IBM Research - Thomas J. Watson Research, Hawthorne, CA, USA
Achille Fokoue
University of Texas, Austin, TX, USA
Ying Ding
Imperial College, London, UK
Payam Barnaghi
Australian National University, Canberra, ACT, Australia
Armin Haller
Fondazione Bruno Kessler, Povo, Trento, Italy
Mauro Dragoni
The Open University Walton Hall, Milton Keynes, UK
Harith Alani

About this paper

Cite this paper

Chen, Z., Chen, J., Geng, Y., Pan, J.Z., Yuan, Z., Chen, H. (2021). Zero-Shot Visual Question Answering Using Knowledge Graph. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_9

DOI: https://doi.org/10.1007/978-3-030-88361-4_9
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88360-7
Online ISBN: 978-3-030-88361-4
eBook Packages: Computer Science, Computer Science (R0)
