Emily Cheng @ ICLR
@sparse_emcheng
66 posts

PhD @colt_upf in computational linguistics. What is the happiest state? Maryland 💅🏼 Before: Apple MLR, MIT CSAIL, ENS

Joined April 2022
171 Following · 169 Followers
Emily Cheng @ ICLR retweeted
Marine Biological Laboratory (MBL)
DEADLINE APPROACHING! Apply to Brains, Minds, and Machines by March 23. This three-week course is designed for graduate students, postdocs, and faculty in computer science or neuroscience. 🔗More info here: go.mbl.edu/BMM
0 replies · 13 reposts · 64 likes · 6.1K views
Emily Cheng @ ICLR @sparse_emcheng
@zxlzr Really cool work! We found similar results in our ICLR 2026 paper, where more granular control goals hurt controllability and increase the sample complexity needed to estimate controllable sets. Might be relevant: openreview.net/forum?id=HJTFg…
1 reply · 1 repost · 1 like · 133 views
Ningyu Zhang@ZJU @zxlzr
How controllable is a Large Language Model, really? 🧐 We often prompt LLMs to "be polite" or "act like a pirate," but the gap between intent and instantiation remains a black box. Introducing our latest work SteerEval: "How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities"
📄 Paper: huggingface.co/papers/2603.02…
💻 Code: github.com/zjunlp/EasyEdi…
📊 Datasets: huggingface.co/datasets/zjunl…

🛠️ What is SteerEval? It's not just a dataset: it's a domain-extensible, hierarchical conceptual framework for automatic benchmark synthesis. By combining this automated pipeline with rigorous human verification, we introduce a principled benchmark to audit LLM controllability across four domains: Language Features (Form), Reasoning Patterns (Thought), Sentiment (Emotion), and Personality (Soul). 📊✨

🧠 Grounded in Marr's Three-Level Theory. To bridge the "intent-realization gap," we borrow from Marr's three levels of analysis, reframing LLM behavior as a Triple-Level Specification (L1-L3):
🎯 L1: Computational Level (what to express): the behavioral goal/intent (e.g., "Be Enthusiastic").
⚙️ L2: Algorithmic Level (how to express it): the behavioral strategy and patterns (e.g., "Use active voice and energized praise").
✍️ L3: Implementational Level (how to instantiate it): the physical textual realization (e.g., "Must include 'hooray' twice").

🔍 Key Findings: The "Granularity Gap" 📉 Our evaluation of many steering methods reveals a striking "Granularity Gap": steered LLMs may follow high-level commands (L1) while failing to maintain the underlying behavioral DNA at the implementational level (L3). Surface-level obedience ≠ deep-level control. 💡

🚀 Why does it matter?
Structured Auditing: provides a "mechanistic map" for behavioral safety. 🗺️
Scalable Synthesis: the framework lets researchers easily extend SteerEval to new behavioral domains. 🏗️
Beyond Prompting: shifts the focus from "black-box prompting" to "fine-grained behavioral engineering." 🧬

We hope SteerEval serves as a foundation for building LLMs that are not just powerful, but truly predictable and faithful to human intent. 🤝 Would you like to see how your model performs on the L1-L3 hierarchy? Let's chat! 💬
#Steering #SteerEval #KnowledgeEditing #AI #NLP #LLMs
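To make the L1-L3 hierarchy concrete, here is a minimal sketch of how a triple-level behavioral spec could be represented and checked at the surface (L3) level. The class, fields, and checker are illustrative stand-ins, not SteerEval's actual schema:

```python
# Hypothetical sketch of a Triple-Level Specification (L1-L3).
# Names and fields are illustrative, not the paper's actual schema.
from dataclasses import dataclass, field

@dataclass
class BehaviorSpec:
    l1_goal: str      # computational level: what to express
    l2_strategy: str  # algorithmic level: how to express it
    l3_constraints: list = field(default_factory=list)  # implementational level

def check_l3(response: str, spec: BehaviorSpec) -> bool:
    """Crude surface check: does the text satisfy every L3 constraint?"""
    return all(constraint(response) for constraint in spec.l3_constraints)

spec = BehaviorSpec(
    l1_goal="be enthusiastic",
    l2_strategy="use active voice and energized praise",
    l3_constraints=[lambda r: r.lower().count("hooray") >= 2],
)

print(check_l3("Hooray! We did it. Hooray again!", spec))  # True
print(check_l3("That is quite nice.", spec))               # False
```

The "Granularity Gap" finding then corresponds to responses that a human would rate as satisfying `l1_goal` while `check_l3` fails.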
2 replies · 9 reposts · 47 likes · 3.3K views
Emily Cheng @ ICLR retweeted
Jason Ramapuram @jramapuram
Autoregressive models dominate, but what if we treat multimodal generation as discrete, order-agnostic iterative refinement? Excited to share our systematic study of the design space of Tri-Modal Masked Diffusion Models (MDMs). We pre-trained the first Tri-Modal MDM from scratch on (text), (image, text), and (audio, text). The same model can do ASR, TTS, T2I, captioning, and native text generation. What I'm most proud of in this work is the scientific rigor: over 3,500 training runs, principled hyperparameter transfer, honest results, and carefully controlled ablations across multiple axes of entanglement. A thread on our empirical findings (arXiv: arxiv.org/abs/2602.21472)
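For intuition, a minimal sketch of the order-agnostic iterative refinement loop masked diffusion models use at sampling time, assuming a hypothetical `model` that returns per-position token logits (the paper's actual sampler and unmasking schedule will differ):

```python
# Sketch of order-agnostic iterative refinement with a masked diffusion
# model; `model` and its interface are hypothetical stand-ins.
import torch

MASK_ID = 0

def mdm_sample(model, seq_len: int, num_steps: int = 8) -> torch.Tensor:
    tokens = torch.full((seq_len,), MASK_ID)      # start fully masked
    for step in range(num_steps):
        logits = model(tokens)                    # (seq_len, vocab)
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        # Reveal the most confident still-masked positions this step.
        masked = tokens == MASK_ID
        k = max(1, int(masked.sum()) // (num_steps - step))
        conf[~masked] = -1.0                      # ignore revealed slots
        idx = conf.topk(k).indices
        tokens[idx] = pred[idx]
    return tokens
```

Unlike autoregressive decoding, no left-to-right order is imposed: the model chooses which positions to commit to at each step.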
6 replies · 43 reposts · 235 likes · 39.2K views
Emily Cheng @ ICLR retweeted
Joséphine Raugel @JRaugel
We’re so glad to be sharing this new project today at @NeuripsConf: “Scaling and Context Steer LLMs along the Same Computational Path as the Human Brain.” 📍 Come chat at Hall C–E, Poster #2006, from 4 PM 💫🧠
Jean-Rémi King @JeanRemiKing

🎉 Our paper has been selected for a Neurips Spotlight: “Scaling and Context Steer LLMs along the Same Computational Path as the Human Brain” 👥led by @JRaugel w/ @stephanedascoli, Jérémy Rapin, @valentinwyart 📄openreview.net/pdf?id=4YKlo58… 📍 Hall C-E Poster #2006 🧵thread 👇

0 replies · 2 reposts · 22 likes · 2.5K views
Emily Cheng @ ICLR retweeted
Geeling Chau @GeelingC
I'll be at #NeurIPS2025 presenting w/: 1️⃣ Eshani Patel and @yisongyue on generalizing models to input preprocessing 2️⃣ Chris Sandino et al., on new temporal reps of EEG 3️⃣ @shivashriganesh et al., on new iEEG training frameworks 📍Brain & Body FM on Sat 😍 🧠? Let's chat!! 🔗⬇️
1 reply · 3 reposts · 20 likes · 2K views
Emily Cheng @ ICLR retweeted
Federico Danieli @FedericoDa40495
𝗣𝗮𝗿𝗮𝗥𝗡𝗡: 𝗨𝗻𝗹𝗼𝗰𝗸𝗶𝗻𝗴 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗼𝗳 𝗡𝗼𝗻𝗹𝗶𝗻𝗲𝗮𝗿 𝗥𝗡𝗡𝘀 𝗳𝗼𝗿 𝗟𝗟𝗠𝘀 For years, we've written RNNs off as doomed and looked at the Transformer as 𝘁𝗵𝗲 LLM architecture, but we just needed better math. 📄arxiv.org/abs/2510.21450 💻github.com/apple/ml-parar…
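The core idea can be sketched as solving the nonlinear recurrence h_t = f(h_{t-1}, x_t) for all timesteps at once with an iterative solver instead of a sequential loop. The Jacobi-style fixed-point sweep below only illustrates that parallel structure; it is not ParaRNN's actual Newton-type solver:

```python
# Sketch: solve h_t = f(h_{t-1}, x_t) for all t simultaneously.
# Each sweep applies f in one batched call (parallel over t); after at
# most T sweeps the fixed point matches the sequential recurrence.
import torch

def parallel_rnn(f, x, h0, n_iters: int = 20):
    T, d = x.shape
    h = torch.zeros(T, d)
    for _ in range(n_iters):
        h_prev = torch.cat([h0[None], h[:-1]], dim=0)  # shift states by one
        h = f(h_prev, x)                               # no time loop here
    return h

f = lambda h, x: torch.tanh(0.5 * h + x)  # toy elementwise recurrence
x = torch.randn(16, 4)
h = parallel_rnn(f, x, torch.zeros(4))
```

With a contractive f, far fewer than T sweeps suffice, which is what makes the parallel formulation pay off on long sequences.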
1 reply · 2 reposts · 8 likes · 1.5K views
Emily Cheng @ ICLR retweeted
Pau Rodríguez @prlz77
🚀 Excited to share LinEAS, our new activation steering method accepted at NeurIPS 2025! It approximates optimal transport maps e2e to precisely guide 🧭 activations, achieving finer control 🎚️ with ✨ fewer than 32 ✨ prompts! 💻github.com/apple/ml-lineas 📄arxiv.org/abs/2503.10679
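As a rough illustration of transport-based steering: per-unit 1-D optimal transport between two Gaussian activation distributions reduces to an affine map that can be estimated from a handful of prompts. This sketch is not LinEAS itself, which learns its maps end-to-end:

```python
# Illustration only: closed-form 1-D OT map between (approximately)
# Gaussian source and target activation distributions, per unit.
import torch

def fit_affine_ot(src: torch.Tensor, tgt: torch.Tensor):
    """src, tgt: (n_prompts, d) activations. Returns per-unit scale, shift."""
    scale = tgt.std(dim=0) / (src.std(dim=0) + 1e-6)
    shift = tgt.mean(dim=0) - scale * src.mean(dim=0)
    return scale, shift

src = torch.randn(32, 8)            # activations on neutral prompts
tgt = torch.randn(32, 8) * 2 + 1.0  # activations on target-style prompts
scale, shift = fit_affine_ot(src, tgt)
steered = scale * src + shift       # apply at inference time
```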
1 reply · 16 reposts · 56 likes · 5.4K views
Emily Cheng @ ICLR retweeted
Mario Giulianelli @glnmario
I am hiring a PhD student to start my lab at @ucl! Get in touch if you have any questions; the deadline to apply through ELLIS is 31 October. More details 🧵
10 replies · 153 reposts · 634 likes · 101.2K views
Emily Cheng @ ICLR retweeted
Julian Coda-Forno @juliancodaforno
New paper from my Meta internship! 🚀 We explored dual-architecture communication for latent reasoning in LLMs ☯️, accepted at the #NeurIPS2025 Foundations of Reasoning in LLMs workshop. Paper: arxiv.org/pdf/2510.00494 1/9 🧵
1 reply · 6 reposts · 33 likes · 3K views
Emily Cheng @ ICLR retweeted
Tankred Saanum @TankredSaanum
Induction heads are surprisingly powerful. In a new preprint, we find that they can learn what to attend to in-context! We study this in a hierarchical prediction task and uncover a possible mechanism giving rise to in-context learning in induction heads. See thread for details!
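The classic induction-head pattern the thread builds on can be caricatured in a few lines: find the previous occurrence of the current token and copy whatever followed it. Real heads implement this softly via attention; this hard-coded version is just for intuition:

```python
# Toy induction-head rule: to predict the next token, locate the most
# recent earlier occurrence of the current token and copy its successor.
def induction_predict(tokens):
    cur = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):  # scan backwards
        if tokens[i] == cur:
            return tokens[i + 1]              # copy the token that followed
    return None

print(induction_predict(list("abcab")))       # 'c': completes the abc motif
```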
1 reply · 3 reposts · 14 likes · 965 views
Emily Cheng @ ICLR retweeted
Tal Linzen @tallinzen
LLMs as a synthesis between symbolic and distributed approaches to language 💯 arxiv.org/abs/2502.11856
1 reply · 3 reposts · 53 likes · 3.2K views
Emily Cheng @ ICLR retweeted
CBMM @MIT_CBMM
The Center for Brains, Minds & Machines [2013-2025]
8 replies · 24 reposts · 176 likes · 38.8K views
Emily Cheng @ ICLR retweeted
Richard Antonello @NeuroRJ
In our new paper, we explore how we can build encoding models that are both powerful and understandable. Our model uses an LLM to answer 35 questions about a sentence's content. The answers linearly contribute to our prediction of how the brain will respond to that sentence. 1/6
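A minimal sketch of this question-based encoding idea, with a hypothetical `ask_llm` stub standing in for the LLM and synthetic data in place of brain recordings:

```python
# Sketch: an LLM answers a fixed set of yes/no questions about each
# sentence; a linear model maps those answers to a predicted brain
# response, so each learned weight is an interpretable contribution.
import numpy as np
from sklearn.linear_model import Ridge

QUESTIONS = ["Does it mention a person?", "Is it about motion?"]  # ... 35 total

def featurize(sentence: str, ask_llm) -> np.ndarray:
    """ask_llm(question, sentence) -> bool; returns a binary feature vector."""
    return np.array([float(ask_llm(q, sentence)) for q in QUESTIONS])

# Synthetic stand-ins: X = (n_sentences, n_questions) binary answers,
# y = (n_sentences,) response of one voxel.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, len(QUESTIONS))).astype(float)
y = X @ np.array([0.8, -0.3]) + rng.normal(0, 0.1, 100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)  # per-question contribution to the predicted response
```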
5 replies · 20 reposts · 114 likes · 10.1K views
Emily Cheng @ ICLR retweeted
Zhijing Jin @ZhijingJin
Our "Competitions of Mechanisms" paper proposes an interesting way to interpret LLM behaviors thru how it handles multiple conflicting mechanisms. E.G., in-context knowledge vs. in-weights knowledge🧐This is an elegant philophical way of thinking --
6 replies · 36 reposts · 265 likes · 26.9K views
Emily Cheng @ ICLR retweeted
Mario Giulianelli @glnmario
I will be a SPAR mentor this Fall 🤖 Check out the programme and apply by 20 August to work with me on formalising and/or measuring and/or intervening on goal-directed behaviour in AI agents. More info on potential projects here 🧵
1 reply · 3 reposts · 12 likes · 2.4K views
Emily Cheng @ ICLR retweeted
Andrea Santilli @teelinsan
Uncertainty quantification (UQ) is key for safe, reliable LLMs... but are we evaluating it correctly? 🚨 Our ACL 2025 paper finds a hidden flaw: if both UQ methods and correctness metrics are biased by the same factor (e.g., response length), evaluations get systematically skewed.
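The flaw is easy to reproduce in a toy simulation: when both the uncertainty score and the correctness labels lean on response length, a UQ method with no real signal about answer quality still looks predictive. All numbers below are purely illustrative:

```python
# Toy simulation of the shared-confounder flaw: uq_score only tracks
# response length, correctness is length-biased, and the two end up
# correlated despite uq_score carrying no information about quality.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
length = rng.normal(size=n)                  # standardized response length
uq_score = length + rng.normal(size=n)       # "uncertainty" tracking length
correct = (length + rng.normal(size=n)) > 0  # length-biased correctness

r = np.corrcoef(uq_score, correct.astype(float))[0, 1]
print(f"spurious correlation: {r:.2f}")      # clearly nonzero
```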
1 reply · 17 reposts · 47 likes · 4.1K views