

Daniel Fried
@dan_fried
Assistant prof. @LTIatCMU @SCSatCMU. Working on NLP: LLM agents, language-to-code, applied pragmatics, grounding.


AI agents are tackling more and more "human work." But are they benchmarked on the work people actually do? tl;dr: Not really. Most benchmarks focus on math & coding, while most human labor and capital lie elsewhere. 📒 We built a database linking agent benchmarks & real-world work. Submit new tasks + agent trajectories today. 🧵

Training on issue-solving alone does NOT guarantee transfer to other tasks. 🎨 Introducing Hybrid-Gym: synthetic training tasks for generalization (hybrid-gym.github.io). +25.4% on SWE-Bench / +7.9% on SWT-Bench / +5.1% on Commit-0, with NO issue-solving / test-gen / ... training

‼️Position: AI coding agent research needs recalibration. We've heavily optimized for solo autonomy, and far less for designing agents that empower the humans using them. It’s time to build human-centered coding agents. 🧵


Are you interested in learning more about the world of language technology? Are you a pre-doctoral student – even one from a non-CS background? Our Language Technology for All Internship could be right for you! Apply by Feb. 20 for Summer 2026: lti.cmu.edu/news-and-event…

There's been a lot of excitement about diffusion LMs, but we don't have a good understanding of what they might offer over autoregressive models in areas like reasoning and planning. We show that diffusion models can leverage what we call latent tokens to gain an advantage on certain kinds of reasoning problems.



Software agents can self-improve via self-play RL. Introducing Self-play SWE-RL (SSR): training a single LLM agent to self-play between bug-injection and bug-repair, grounded in real-world repositories, with no human-labeled issues or tests. 🧵
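The self-play setup described above can be illustrated with a toy loop. This is a minimal sketch, not the paper's method: the repository, test suite, and both roles are simplified stand-ins, and every name here (`inject_bug`, `repair`, `self_play_round`) is a hypothetical illustration. The key structure it shows is the reward pairing: the injector is rewarded only if its mutation breaks the tests, and the repairer only if its patch makes them pass again.

```python
import random

def run_tests(add_impl):
    """Toy 'test suite': passes iff add_impl behaves like addition."""
    return all(add_impl(a, b) == a + b for a, b in [(1, 2), (0, 0), (-3, 5)])

def correct_add(a, b):
    return a + b

def inject_bug(rng):
    """Injector role: propose a mutated implementation.
    Its reward later requires that the mutation actually fails the tests."""
    off = rng.choice([-1, 1, 2])  # nonzero offset, so the bug is real
    return lambda a, b: a + b + off

def repair(buggy_impl):
    """Repairer role: here, trivially restore the reference implementation.
    In the real setting the same LLM would act in both roles."""
    return correct_add

def self_play_round(rng):
    buggy = inject_bug(rng)
    injector_reward = 0.0 if run_tests(buggy) else 1.0  # bug must fail tests
    fixed = repair(buggy)
    repairer_reward = 1.0 if run_tests(fixed) else 0.0  # fix must pass tests
    return injector_reward, repairer_reward

rng = random.Random(0)
rewards = [self_play_round(rng) for _ in range(5)]
```

Because the injected offset is always nonzero and the repairer restores the reference implementation, every round here yields reward 1.0 for both roles; in the real system both roles are the same trained policy and neither outcome is guaranteed.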


The best poster awards go to:
1. Go-Browse: Training Web Agents with Structured Exploration (Apurva Gandhi, Graham Neubig)
2. Scaling Open-Ended Reasoning to Predict the Future (Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping)
🎉 Congrats!


