MohammadHossein Rezaei

37 posts

@mhrezaeics

Post-training Research @ScaleAILabs | Ex Research Intern @StanfordNLP | CS @UArizona

New York, NY · Joined November 2022
771 Following · 126 Followers
Pinned Tweet
MohammadHossein Rezaei @mhrezaeics
Claude code running✅ Insane office view✅ => Life is good.
1
1
14
201
MohammadHossein Rezaei retweeted
Scale Labs @ScaleAILabs
Welcome to the home of all things @scale_AI research — focused on data, evaluation, safety, and post-training that moves frontier models forward. We’ll share benchmarks, insights, and work intended to be useful to the broader research community. labs.scale.com/?utm_source=hu…
0
12
46
22.8K
MohammadHossein Rezaei retweeted
Bing Liu @vbingliu
🧠 Can your model think with images? Today we’re releasing VisualToolBench, a new benchmark for multimodal reasoning with tool use, that tests whether multimodal LLMs can think-with-images, not just think about them.
1
4
28
1.2K
MohammadHossein Rezaei retweeted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Online Rubrics Elicitation from Pairwise Comparisons

"We introduce Online Rubrics Elicitation (OnlineRubrics), a method that dynamically curates evaluation criteria in an online manner through pairwise comparisons of responses from current and reference policies. This online process enables continuous identification and mitigation of errors as training proceeds. Empirically, this approach yields consistent improvements of up to 8% over training exclusively with static rubrics across AlpacaEval, GPQA, ArenaHard as well as the validation sets of expert questions and rubrics."
1
1
13
2.6K
MohammadHossein Rezaei retweeted
Afra Feyza Akyürek @afeyzaakyurek
✨ We just published a new @scale_AI paper! ✨ Rubrics are great for quantifying quality in open-ended tasks but they struggle to capture emergent behaviors as RL training evolves. We propose a framework for online construction of rubrics via pairwise comparisons of model responses!
Bing Liu @vbingliu

🔄 RLHF → RLVR → Rubrics → OnlineRubrics
👤 Human feedback = noisy & coarse
🧮 Verifiable rewards = too narrow
📋 Static rubrics = rigid, easy to hack, miss emergent behaviors
💡 We introduce OnlineRubrics: elicited rubrics that evolve as models train. arxiv.org/abs/2510.07284

2
2
22
4.1K
MohammadHossein Rezaei @mhrezaeics
Diyi Yang @Diyi_Yang

Check out 🔥 EgoNormia: a benchmark for physical social norm understanding egonormia.org Can we really trust VLMs to make decisions that align with human norms? 👩‍⚖️ With EgoNormia, a 1800 ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging 🤖 🌐 arxiv.org/abs/2502.20490 Our amazing team: MohammadHossein Rezaei* (U of A), Yicheng Fu* , Phil Cuvin* (U of T), @cjziems , @StevenyzZhang , @_Hao_Zhu

0
0
1
165
MohammadHossein Rezaei @mhrezaeics
Check out EgoNormia.org at #acl2025!
Stanford NLP Group @stanfordnlp

.@stanfordnlp papers at @aclmeeting in Vienna next week:
• HumT DumT: Measuring and controlling human-like language in LLMs @chengmyra1 @sunnyyuych @jurafsky
• Controllable and Reliable Knowledge-Intensive Task Agents with Declarative GenieWorksheets @harshitj__ @ShichengGLiu @MonicaSLam
• Distilling an End-to-End Voice Assistant Without Instruction Training Data @WilliamBarrHeld @StevenyzZhang @Diyi_Yang
• SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs @michaelryan207 @oshaikh13 @Diyi_Yang
• Attacking Vision-Language Computer Agents via Pop-ups @StevenyzZhang @Diyi_Yang
• Mind the Gap: Static and Interactive Evaluations of Large Audio Models @EllaMinzhiLi @WilliamBarrHeld … @Diyi_Yang
• Drop Dropout on Single Epoch Language Model Pretraining @houjun_liu @AngledLuffa @chrmanning
• ACLED-DS: A Large Multilingual Expert-Annotated Event Dataset for the Real World @sina_semnani @MonicaSLam
• SPHERE: An Evaluation Card for Human-AI Systems @dorazhao9 @Diyi_Yang
• EgoNormia: Benchmarking Physical Social Norm Understanding @mhrezaeics @Diyi_Yang
• Tell, Don’t Show: Leveraging Language Models’ Abstractive Retellings to Model Literary Themes @lucy3_li @camgriffi @ddemszky

2
2
4
2.9K
Abhilasha Ravichander @lasha_nlp
Life update: I’m excited to share that I’ll be starting as faculty at the Max Planck Institute for Software Systems(@mpi_sws_) this Fall!🎉 I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
83
46
594
61.9K
MohammadHossein Rezaei @mhrezaeics
If you’re at NAACL today, I’ll be presenting this poster in Hall 3 from 2:00 – 3:30 PM. Paper link: aclanthology.org/2025.naacl-lon…
MohammadHossein Rezaei @mhrezaeics

1/🚨 Thrilled to share that our paper (w/ @eduardo_nlp), "Making Language Models Robust Against Negation," has been accepted to the #NAACL2025 main conference! 🎉 #Negation has always been a challenge for language models. Here's our self-supervised method to tackle this issue:

0
1
7
525
MohammadHossein Rezaei @mhrezaeics
I'm excited to share that I’ll be joining @Scale_AI as a Research Intern this summer (2025)! Many thanks to everyone who supported me throughout the process.
2
0
9
541
MohammadHossein Rezaei retweeted
Yijia Shao @EchoShao8899
Physical AI is getting a lot of attention after Jensen Huang's keynote. But can AI make decisions that align with human norms in physical scenarios? Unfortunately, not really. Check out EgoNormia project to learn more!
Diyi Yang @Diyi_Yang

Check out 🔥 EgoNormia: a benchmark for physical social norm understanding egonormia.org Can we really trust VLMs to make decisions that align with human norms? 👩‍⚖️ With EgoNormia, a 1800 ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging 🤖 🌐 arxiv.org/abs/2502.20490 Our amazing team: MohammadHossein Rezaei* (U of A), Yicheng Fu* , Phil Cuvin* (U of T), @cjziems , @StevenyzZhang , @_Hao_Zhu

1
10
60
11.3K
MohammadHossein Rezaei @mhrezaeics
🔥 Excited to share EgoNormia! A benchmark for physical social norm understanding. Can we really trust VLMs to make decisions that align with human norms? 🌐 Check out our website for the answer: egonormia.org Proud to be part of this amazing team! 🚀
Diyi Yang @Diyi_Yang

Check out 🔥 EgoNormia: a benchmark for physical social norm understanding egonormia.org Can we really trust VLMs to make decisions that align with human norms? 👩‍⚖️ With EgoNormia, a 1800 ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging 🤖 🌐 arxiv.org/abs/2502.20490 Our amazing team: MohammadHossein Rezaei* (U of A), Yicheng Fu* , Phil Cuvin* (U of T), @cjziems , @StevenyzZhang , @_Hao_Zhu

0
1
4
569