Interpreting Vision and Language Generative Models with Semantic Visual Priors

Cafagna, M; Rojas-Barahona, LM; van Deemter, K; Gatt, A

DOI: https://doi.org/10.3389/frai.2023.1220476

(2023) Frontiers in Artificial Intelligence, volume 6, pp. 1 - 18

(Article)

Abstract

When applied to image-to-text models, explainability methods face two challenges. First, they often provide token-by-token explanations; namely, they compute a visual explanation for each token of the generated sequence. This makes explanations expensive to compute and unable to comprehensively explain the model's output. Second, for models with visual inputs, explainability ...

Keywords: vision and language, multimodality, explainability, image captioning, visual question answering, natural language generation

ISSN: 2624-8212

Publisher: Frontiers Media

(Peer reviewed)
