A Novel Attention-based Aggregation Function to Combine Vision and Language | IEEE Conference Publication | IEEE Xplore