Gabriel Roccabruna retweeted

🤖#VLMs shine on benchmarks📈! But do they truly understand scenes? CIVET takes full control of stimuli to systematically evaluate VLMs on object properties, position, & relations. Spoiler⚠️: There’s still a gap to bridge.
C U @emnlpmeeting
arxiv.org/pdf/2506.05146
#EMNLP2025













