MIT NLP

99 posts


@nlp_mit

NLP Group at @MIT_CSAIL! PIs: @yoonrkim @jacobandreas @lateinteraction @pliang279 @david_sontag, Jim Glass, @roger_p_levy

Cambridge, MA · Joined March 2025
63 Following · 4.2K Followers
Pinned Tweet
MIT NLP @nlp_mit
Hello everyone! We are quite a bit late to the Twitter party, but welcome to the MIT NLP Group account! Follow along for the latest research from our labs as we dive deep into language, learning, and logic 🤖📚🧠
26 replies · 54 reposts · 551 likes · 105.5K views
MIT NLP retweeted
Isha Puri @ishapuri101
Ask ChatGPT several times where's best to go for spring break? It recommends Barcelona almost every time. This isn't a fluke. RL training rewards one best answer, so the model learns to commit to one mode and repeat it. Meet Multi-Answer RL: a simple RL method that trains LMs to reason through and output a distribution of answers in a single generation. [1/N]
22 replies · 73 reposts · 443 likes · 94.8K views
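The thread above attributes mode collapse to rewarding a single best answer. As a toy illustration (all names hypothetical, not the paper's actual method), here is a reward that scores a predicted *distribution* of answers against a reference distribution, so a model that commits to one mode scores worse than one that spreads its mass appropriately:

```python
from collections import Counter

def distribution_reward(predicted, reference):
    """Reward a weighted set of answers against a reference answer
    distribution, instead of rewarding a single committed answer.

    predicted: dict mapping answer -> probability output by the model
    reference: list of acceptable answers with their empirical frequencies
    """
    ref = Counter(reference)
    total = sum(ref.values())
    ref_dist = {a: c / total for a, c in ref.items()}
    # Overlap = 1 - total variation distance: 1.0 means the model's
    # answer distribution matches the reference exactly.
    support = set(predicted) | set(ref_dist)
    tv = 0.5 * sum(abs(predicted.get(a, 0.0) - ref_dist.get(a, 0.0))
                   for a in support)
    return 1.0 - tv

# A mode-collapsed model that always says "Barcelona" scores lower than
# one that spreads mass over the plausible answers.
collapsed = {"Barcelona": 1.0}
spread = {"Barcelona": 0.4, "Lisbon": 0.3, "Rome": 0.3}
reference = ["Barcelona", "Barcelona", "Lisbon", "Rome", "Lisbon"]
```

Under this kind of objective, repeating the single most likely answer is no longer the optimal policy.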
MIT NLP retweeted
Ao Qu @ao_qu18465
(1/n)🚀 We’re excited to introduce CORAL, an extensible infrastructure for autonomous multi-agent evolution. You can think of CORAL as a system for running something close to @karpathy’s AutoResearch on arbitrary tasks, but more robustly and safely, with multi-agent communication and persistent knowledge accumulation. Even the first results are already striking:
🏆 4 agents pushed Anthropic’s kernel engineering take-home score from 1363 clock cycles (the previous best public score) down to 1103
⚡ With the same base model (Opus 4.6), single-agent CORAL achieves a 2.5× higher improvement rate and 10× faster evolution than OpenEvolve on Erdős Minimum Overlap, reaching 0.3808878 and surpassing the best score reported in AlphaEvolve (0.380924)
👥 When agents evolve together, we observe emergent organizational behaviors: independent research, cross-referencing, and spontaneous consensus-building
We now believe we are at a critical intersection: between increasingly capable self-evolving agents and a still-unclear science of how they should collaborate, organize, and co-evolve with humans. We wrote this blog (human-agent-society.github.io/CORAL/) to document the early signals, surface the open questions, and invite the community to help shape this emerging frontier. The code for our infra is fully open-source: github.com/Human-Agent-So…
#AI #Agents #SelfEvolvingAgents #MultiAgentSystems #LLM #OpenSource #AlphaEvolve #AutoResearch
11 replies · 19 reposts · 86 likes · 14K views
MIT NLP retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
47 replies · 261 reposts · 1.7K likes · 246.7K views
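The tweet above describes pre-pre-training on cellular-automata data rather than language. As a rough sketch of how fully synthetic, zero-language sequences with rich local structure can be generated (using an elementary cellular automaton, rule 110, as a simple stand-in; the actual neural-CA setup in the blog is more involved):

```python
# Transition table for elementary CA rule 110: each cell's next state is
# determined by its (left, self, right) neighborhood.
RULE_110 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def ca_step(row):
    # One synchronous update with wrap-around boundaries.
    n = len(row)
    return [RULE_110[(row[(i - 1) % n], row[i], row[(i + 1) % n])]
            for i in range(n)]

def ca_corpus(seed, steps):
    # Unroll the automaton into a sequence of rows: purely synthetic
    # "text" a transformer could be pre-pre-trained to predict.
    rows, row = [seed], seed
    for _ in range(steps):
        row = ca_step(row)
        rows.append(row)
    return rows
```

The appeal of such data is that next-row prediction requires learning compositional local rules, with no natural language involved at any point.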
MIT NLP retweeted
jenny huang @JennyHuang99
🧵1/ 🤔New paper: Do LLMs Benefit from Their Own Words? In multi-turn chats, models are typically given their own past responses as context. But do their own words always help… or can they sometimes be a distraction?
6 replies · 32 reposts · 170 likes · 17.6K views
MIT NLP retweeted
Leshem (Legend) Choshen 🤖🤗
Don't complain. Do it yourself. When the @evaluatingevals coalition started studying together what is broken in evaluation, I knew what we needed to do: digitize evals. How come every evaluation is reported differently? In a separate place? Every Eval Ever:
EvalEval Coalition @evaluatingevals

🚀 Launching Every Eval Ever: Toward a Common Language for AI Eval Reporting 🚀 A shared schema + crowdsourced repository so we can finally compare evals across frameworks and stop rerunning everything from scratch 🔧 A tale of broken AI evals 🧵👇 evalevalai.com/projects/every…

0 replies · 2 reposts · 14 likes · 1.7K views
MIT NLP retweeted
Leshem (Legend) Choshen 🤖🤗
Agents should be general. Why are we building code agents, CLI agents, and browser agents separately? Why does adapting to a new benchmark take a month? Our collaboration brings diverse views: the pros here, the cons in the paper, and your pushback if I'm wrong. Argument + paper link 👇🧵
1 reply · 3 reposts · 18 likes · 1.7K views
MIT NLP retweeted
dvd@dvd.chat @ddvd233
Any-to-Any Multimodal Learning Workshop @CVPR 2026 We are organizing an AnyToAny Multimodal Learning workshop, exploring unified learning across vision, language, audio, 3D, video, and beyond. Call for papers: a2a-mml-2026.vercel.app
0 replies · 9 reposts · 37 likes · 14.3K views
MIT NLP retweeted
alex zhang @a1zhang
I was considering waiting a while to polish this first, but decided it'd be better to just release an initial version to get better community feedback and squash bugs! This is the official RLM repo, with native support for cloud-based and local REPLs. github.com/alexzhang13/rlm
46 replies · 131 reposts · 1.2K likes · 120.1K views
MIT NLP retweeted
alex zhang @a1zhang
Much like the switch in 2025 from language models to reasoning models, we think 2026 will be all about the switch to Recursive Language Models (RLMs). It turns out that models can be far more powerful if you allow them to treat *their own prompts* as an object in an external environment, which they understand and manipulate by writing code that invokes LLMs! Our full paper on RLMs is now available—with much more expansive experiments compared to our initial blogpost from October 2025! arxiv.org/pdf/2512.24601
253 replies · 1.1K reposts · 7.4K likes · 2M views
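The core move described above, treating the model's own prompt as a manipulable object that it queries via sub-LM calls from code, can be sketched in a few lines (function names are hypothetical; `sub_lm` is a stub standing in for a real model call, not the actual RLM implementation):

```python
def sub_lm(prompt: str) -> str:
    # Stand-in for a real LLM call; here it just "summarizes" by
    # returning the first sentence of its input.
    return prompt.split(".")[0] + "."

def rlm_answer(context: str, question: str, chunk_size: int = 200) -> str:
    # The long context lives as a plain Python object in a REPL.
    # Instead of attending over all of it at once, the root model
    # chunks it, queries each piece with a sub-LM, and synthesizes
    # over the (much shorter) intermediate notes.
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]
    notes = [sub_lm(c) for c in chunks]
    return sub_lm(" ".join(notes) + " Question: " + question)
```

The point of the pattern is that context length stops being a hard limit: the root model only ever sees slices and summaries it chose programmatically.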
MIT NLP retweeted
Philip Schroeder @_pschro
Excited to share our NeurIPS 2025 paper introducing our video reasoning framework, ROVER (Reasoning Over VidEo Recursively), which improves visual understanding of VLMs in embodied settings.
ROVER is a recursive framework that lets the model maintain a compact attention window at each timestep of the video without losing global context across the full video. It works by decomposing the video into segments corresponding to each subtask within the full task trajectory, then generating a separate line of reasoning for each subtask instead of attempting to reason across the full trajectory.
We evaluate on simulated and real-world robotic manipulation tasks from RoboCasa and Open X-Embodiment. Overall, ROVER significantly improves the ability of VLMs to reason about what is happening at each moment during a robot task attempt. rover-vlm.github.io
1 reply · 4 reposts · 8 likes · 2.2K views
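The recursive decomposition ROVER describes can be sketched minimally as follows (names hypothetical; the real framework operates on video frames with a VLM, not arbitrary callables):

```python
def rover_style_reason(frames, subtask_boundaries, reason_fn, summarize_fn):
    # Decompose the full trajectory into per-subtask segments and reason
    # over each with a compact window, then synthesize the per-subtask
    # conclusions, instead of attending to the whole video at once.
    segments = [frames[start:end] for start, end in subtask_boundaries]
    per_subtask = [reason_fn(segment) for segment in segments]
    return summarize_fn(per_subtask)
```

A usage example with trivial stand-ins for the reasoning and summarization steps:

```python
frames = list(range(10))                       # dummy "video"
boundaries = [(0, 5), (5, 10)]                 # two subtasks
result = rover_style_reason(frames, boundaries, sum, sum)
```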
MIT NLP retweeted
dvd@dvd.chat @ddvd233
Picked up a "stable" capybara at the Beijing Wildlife Park. Hoping it blesses my model training with stable convergence too (
5 replies · 2 reposts · 76 likes · 5K views
MIT NLP retweeted
Chanakya Ekbote @thecekbote
Ever wondered how LLMs generalize to entirely new patterns? In our Spotlight paper at #neurips2025, we study this in a fully controlled setting and show the minimal transformer architecture needed to learn induction heads. Paper Link: arxiv.org/pdf/2508.07208 🧵👇
1 reply · 16 reposts · 45 likes · 12.1K views
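For readers unfamiliar with the term, an induction head implements a simple in-context copy rule. A sketch of the pattern itself (the behavior the minimal transformer learns, not the attention mechanism that implements it):

```python
def induction_predict(tokens):
    # The induction-head rule: to predict the next token, find the most
    # recent earlier occurrence of the current token and copy whatever
    # followed it. E.g. in [A, B, C, A] the prediction is B.
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # current token never appeared before
```

This pattern is interesting because it generalizes to token sequences never seen in training, which is exactly the kind of out-of-pattern generalization the paper studies in a controlled setting.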
MIT NLP retweeted
Pratyusha Sharma ✈️ NeurIPS @pratyusha_PS
📢 Some big (& slightly belated) life updates!
1. I defended my PhD at MIT this summer! 🎓
2. I'm joining NYU as an Assistant Professor starting Fall 2026, with a joint appointment in Courant CS and the Center for Data Science. 🎉
🔬 My lab will focus on empirically studying the science of deep learning and applying deep learning to accelerate the natural sciences. We're very broadly interested in questions at the intersection of language, reasoning, and sequential decision making. (Plus any other fun problems that catch our eye along the way!)
🚀 I am recruiting 2 PhD students for this cycle! If you're interested in joining, please apply here: cs.nyu.edu/dynamic/phd/ad… cds.nyu.edu/phd-admissions…
101 replies · 94 reposts · 1.8K likes · 244.1K views
MIT NLP retweeted
Chanakya Ekbote @thecekbote
How do we teach LLMs not just to reason, but to reflect, debug, and improve themselves? We at AWS AI Labs introduce MURPHY 🤖, a multi-turn RL framework that brings self-correction into #RLVR (#GRPO). 🧵👇 Link: arxiv.org/abs/2511.07833
2 replies · 21 reposts · 32 likes · 5.5K views
MIT NLP retweeted
Shannon Shen @shannonzshen
Today's AI agents are optimized to complete tasks in one shot. But real-world tasks are iterative, with evolving goals that need collaboration with users. We introduce collaborative effort scaling to evaluate how well agents work with people—not just complete tasks 🧵
7 replies · 53 reposts · 284 likes · 104.8K views
MIT NLP retweeted
Zhaofeng Wu @zhaofeng_wu
Just arrived in Suzhou to present reWordBench at #EMNLP2025. Come to our talk to hear how SOTA reward models can easily break under minor input transformations, and how to fix it! 🗓️ Wed 11/5 🕒 3:00 PM 📍 Safety & Alignment session
Zhaofeng Wu @zhaofeng_wu

Robust reward models are critical for alignment/inference-time algos, auto eval, etc. (e.g. to prevent reward hacking which could render alignment ineffective). ⚠️ But we found that SOTA RMs are brittle 🫧 and easily flip predictions when the inputs are slightly transformed 🍃 🧵

2 replies · 8 reposts · 53 likes · 8.3K views
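The brittleness reWordBench probes, reward-model rankings flipping under meaning-preserving rewrites, can be quantified with a simple metric. A sketch with hypothetical names (not the paper's exact protocol):

```python
def reward_flip_rate(reward_fn, pairs, transform):
    # Fraction of preference pairs (chosen, rejected) whose ranking
    # flips when both responses undergo the same meaning-preserving
    # transformation. A robust reward model should score near 0.
    flips = 0
    for chosen, rejected in pairs:
        before = reward_fn(chosen) > reward_fn(rejected)
        after = reward_fn(transform(chosen)) > reward_fn(transform(rejected))
        flips += before != after
    return flips / len(pairs)
```

As a toy example, a "reward" that merely counts exclamation marks flips completely when punctuation is stripped, while a length-based one is unaffected by uppercasing; real reward models sit somewhere in between, which is what the benchmark measures.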