What Vision-Language Models 'See' when they See Scenes


 
 