MohammadHossein Rezaei

5

15

2.3K

MohammadHossein Rezaei@mhrezaeics·20 Mar

@Radii2323 Did you evaluate on ProteinGYM?

0

3

282

Parsa Idehpour@Radii2323·20 Mar

today we launched bioreason-pro, try using it: app.bioreason.net

11

40

192

52.2K

MohammadHossein Rezaei@mhrezaeics·18 Mar

Claude code running✅ Insane office view✅ => Life is good.

14

201

MohammadHossein Rezaei retweeted

Scale Labs@ScaleAILabs·9 Mar

Welcome to the home of all things @scale_AI research — focused on data, evaluation, safety, and post-training that moves frontier models forward. We’ll share benchmarks, insights, and work intended to be useful to the broader research community. labs.scale.com/?utm_source=hu…

12

46

22.8K

MohammadHossein Rezaei@mhrezaeics·21 Jan

Updates: 📌 Graduated with a B.S. in Computer Science from @uarizona 📌 Honored as Overall Outstanding Senior by @UAZScience and delivered the convocation keynote: youtu.be/zUaTPknUfPA?t=… 📌 Moved to NYC and joined @scale_AI as a Machine Learning Research Engineer, Post-training

15

621

MohammadHossein Rezaei@mhrezaeics·7 Dec

@Mersad_Abbasi @jimmybajimmyba What are you building?

37

Mersad Abbasi@Mersad_Abbasi·7 Dec

casual 3 am fireside chat. “It will be a shame to hit the energy wall before superintelligence hits the exponential curve” -@jimmybajimmyba

Chris Park@chrisparkX

live at xAI hackathon!📍 over 500 handpicked devs. only a few hours in & many great projects built already. 1st place winners for each track include an all expensed trip to starship launch. 🚀🚀🚀

0

3

599

MohammadHossein Rezaei@mhrezaeics·2 Dec

I'll be at #NeurIPS this week. Would love to catch up and meet new friends!

5

143

MohammadHossein Rezaei retweeted

Bing Liu@vbingliu·16 Oct

🧠 Can your model think with images? Today we’re releasing VisualToolBench, a new benchmark for multimodal reasoning with tool use, that tests whether multimodal LLMs can think-with-images, not just think about them.

4

28

1.2K

MohammadHossein Rezaei retweeted

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·9 Oct

Online Rubrics Elicitation from Pairwise Comparisons "We introduce Online Rubrics Elicitation (OnlineRubrics), a method that dynamically curates evaluation criteria in an online manner through pairwise comparisons of responses from current and reference policies. This online process enables continuous identification and mitigation of errors as training proceeds. Empirically, this approach yields consistent improvements of up to 8% over training exclusively with static rubrics across AlpacaEval, GPQA, ArenaHard as well as the validation sets of expert questions and rubrics."

13

2.6K

MohammadHossein Rezaei@mhrezaeics·9 Oct

@afeyzaakyurek @scale_AI I was lucky to have you as my mentor! Truly grateful for all your guidance and support throughout my internship😃

🔄RLHF → RLVR → Rubrics → OnlineRubrics 👤 Human feedback = noisy & coarse 🧮 Verifiable rewards = too narrow 📋 Static rubrics = rigid, easy to hack, miss emergent behaviors 💡We introduce OnlineRubrics: elicited rubrics that evolve as models train. arxiv.org/abs/2510.07284

1

57

MohammadHossein Rezaei retweeted

Afra Feyza Akyürek@afeyzaakyurek·9 Oct

✨ We just published a new @scale_AI paper! ✨ Rubrics are great for quantifying quality in open-ended tasks but they struggle to capture emergent behaviors as RL training evolves. We propose a framework for online construction of rubrics via pairwise comparisons of model responses!

Bing Liu@vbingliu

22

4.1K

MohammadHossein Rezaei@mhrezaeics·9 Oct

@akyurekekin Thanks for sharing, @akyurekekin!

🔄RLHF → RLVR → Rubrics → OnlineRubrics 👤 Human feedback = noisy & coarse 🧮 Verifiable rewards = too narrow 📋 Static rubrics = rigid, easy to hack, miss emergent behaviors 💡We introduce OnlineRubrics: elicited rubrics that evolve as models train. arxiv.org/abs/2510.07284

64

Ekin Akyürek@akyurekekin·9 Oct

❤️

Bing Liu@vbingliu

0

19

2.6K

MohammadHossein Rezaei@mhrezaeics·29 Aug

@neilkale @scale_AI @_zifan_wang Congrats, Neil!!

1

47

Neil Kale@neilkale·29 Aug

Excited to share my summer research at @scale_AI , advised by @_zifan_wang . Go check it out!

Scale AI@scale_AI

New Scale research: Can smaller models reliably oversee stronger LLM agents? We red team monitoring systems to detect covert sabotage, like agents secretly downloading sensitive information.

4

1

17

1.9K

MohammadHossein Rezaei@mhrezaeics·22 Jul

x.com/diyi_yang/stat…

Diyi Yang@Diyi_Yang

Check out 🔥 EgoNormia: a benchmark for physical social norm understanding egonormia.org Can we really trust VLMs to make decisions that align with human norms? 👩‍⚖️ With EgoNormia, a 1800 ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging 🤖 🌐 arxiv.org/abs/2502.20490 Our amazing team: MohammadHossein Rezaei* (U of A), Yicheng Fu* , Phil Cuvin* (U of T), @cjziems , @StevenyzZhang , @_Hao_Zhu

Stanford NLP Group@stanfordnlp

1

165

MohammadHossein Rezaei@mhrezaeics·22 Jul

Check out EgoNormia.org at #acl2025!

.@stanfordnlp papers at @aclmeeting in Vienna next week: • HumT DumT: Measuring and controlling human-like language in LLMs @chengmyra1 @sunnyyuych @jurafsky • Controllable and Reliable Knowledge-Intensive Task Agents with Declarative GenieWorksheets @harshitj__ @ShichengGLiu … @MonicaSLam • Distilling an End-to-End Voice Assistant Without Instruction Training Data @WilliamBarrHeld @StevenyzZhang … @Diyi_Yang • SynthesizeMe! Inducing Persona-Guided Prompts for Personalized Reward Models in LLMs @michaelryan207 @oshaikh13 … @Diyi_Yang • Attacking Vision-Language Computer Agents via Pop-ups @StevenyzZhang … @Diyi_Yang • Mind the Gap: Static and Interactive Evaluations of Large Audio Models @EllaMinzhiLi @WilliamBarrHeld … @Diyi_Yang • Drop Dropout on Single Epoch Language Model Pretraining @houjun_liu @AngledLuffa @chrmanning • ACLED-DS: A Large Multilingual Expert-Annotated Event Dataset for the Real World @sina_semnani … @MonicaSLam • SPHERE: An Evaluation Card for Human-AI Systems @dorazhao9 … @Diyi_Yang • EgoNormia: Benchmarking Physical Social Norm Understanding @mhrezaeics … @Diyi_Yang • Tell, Don’t Show: Leveraging Language Models’ Abstractive Retellings to Model Literary Themes @lucy3_li @camgriffi … @ddemszky

4

2.9K

MohammadHossein Rezaei@mhrezaeics·22 Jul

@lasha_nlp @mpi_sws_ Congratulations!!

MohammadHossein Rezaei@mhrezaeics

0

1

247

Abhilasha Ravichander@lasha_nlp·22 Jul

Life update: I’m excited to share that I’ll be starting as faculty at the Max Planck Institute for Software Systems(@mpi_sws_) this Fall!🎉 I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html

83

46

594

61.9K

MohammadHossein Rezaei@mhrezaeics·30 Apr

If you’re at NAACL today, I’ll be presenting this poster in Hall 3 from 2:00 – 3:30 PM. Paper link: aclanthology.org/2025.naacl-lon…

1/🚨 Thrilled to share that our paper (w/ @eduardo_nlp), "Making Language Models Robust Against Negation," has been accepted to the #NAACL2025 main conference! 🎉 #Negation has always been a challenge for language models. Here's our self-supervised method to tackle this issue:

1

7

525

MohammadHossein Rezaei@mhrezaeics·1 Apr

I'm excited to share that I’ll be joining @Scale_AI as a Research Intern this summer (2025)! Many thanks to everyone who supported me throughout the process.

0

9

541

MohammadHossein Rezaei retweeted

Yijia Shao@EchoShao8899·4 Mar

Physical AI is getting a lot of attention after Jensen Huang's keynote. But can AI make decisions that align with human norms in physical scenarios? Unfortunately, not really. Check out EgoNormia project to learn more!

Diyi Yang@Diyi_Yang

Check out 🔥 EgoNormia: a benchmark for physical social norm understanding egonormia.org Can we really trust VLMs to make decisions that align with human norms? 👩‍⚖️ With EgoNormia, a 1800 ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging 🤖 🌐 arxiv.org/abs/2502.20490 Our amazing team: MohammadHossein Rezaei* (U of A), Yicheng Fu* , Phil Cuvin* (U of T), @cjziems , @StevenyzZhang , @_Hao_Zhu

10

60

11.3K

MohammadHossein Rezaei@mhrezaeics·4 Mar

🔥 Excited to share EgoNormia! A benchmark for physical social norm understanding. Can we really trust VLMs to make decisions that align with human norms? 🌐 Check out our website for the answer: egonormia.org Proud to be part of this amazing team! 🚀

Diyi Yang@Diyi_Yang

Check out 🔥 EgoNormia: a benchmark for physical social norm understanding egonormia.org Can we really trust VLMs to make decisions that align with human norms? 👩‍⚖️ With EgoNormia, a 1800 ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging 🤖 🌐 arxiv.org/abs/2502.20490 Our amazing team: MohammadHossein Rezaei* (U of A), Yicheng Fu* , Phil Cuvin* (U of T), @cjziems , @StevenyzZhang , @_Hao_Zhu
