Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks

Parcalabescu, L; Frank, Anette; Calixto, I

(2021) Proceedings of Beyond Language: Multimodal Semantic Representations (MMSR'21), pp. 32 - 44

(Part of book)

Abstract

We investigate the reasoning ability of pretrained vision and language (V&L) models in two tasks that require multimodal integration: (1) discriminating a correct image-sentence pair from an incorrect one, and (2) counting entities in an image. We evaluate three pretrained V&L models on these tasks: ViLBERT, ViLBERT 12-in-1 and LXMERT, ...

Open Access version via Utrecht University Repository

Publisher: Association for Computational Linguistics

(Peer reviewed)
