Is Multimodal Vision Supervision Beneficial to Language? | IEEE Conference Publication | IEEE Xplore