
#EMNLP2024 Best Paper 1/5: An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
Simran Khanuja
532 posts

@simi_97k
NLP | PhD Student @LTIatCMU | Predoctoral Researcher @Google | Microsoft Research | BITS Pilani

📢 Submissions are now OPEN for our @CVPR Workshop: Multimodal Alignment for a Pluralistic Society (MAPS)! Help us build AI that reflects global diversity, culture, and human values. 🌍🌏🌎 📅 Mar 3 – Apr 10, 2026 📝 Short papers (4 pgs) 🔗 openreview.net/group?id=thecv… #CVPR2026

Introducing the Machine Translation for Vision (MTV) Challenge at #CVPR2026! Can your model localize (culturally adapt) images — not just translate text, but reimagine visuals for different cultures? 🌍

📢 Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research and inference optimisation done in-house, these models punch above their weight in most global benchmarks plus excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day 0 support, vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - sarvam.ai/blogs/sarvam-3…

Today we introduce humans&, a human-centric frontier AI lab. We believe AI can be reimagined, centering around people and their relationships with each other. At its best, AI should serve as a deeper connective tissue that strengthens organizations and communities

Studying multicultural text-to-image systems requires costly, subjective human evaluation. Automated, quantified systems for "visual cultural attribution" (VCA) would be very valuable. @arnav_y1, @_siddharth_y, @simi_97k, @gneubig and I introduce CAIRe, an open-vocabulary metric for VCA!