Abstract
Salient region detection in Visual Question Answering (VQA) attempts to simulate the human ability to perceive a scene quickly by selectively attending to image fragments instead of processing the whole scene. The conventional approach relies on neural networks. However, Convolutional Neural Networks (CNNs) have a number of disadvantages compared with traditional methods for salient region detection. We modified the basic salient region detection algorithm for the VQA task by selecting image fragments that have a high probability of being included in a questionnaire. Experiments were conducted on images from the MS-COCO dataset and yielded good segmentation results.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.