

Kelly Buchanan
1K posts

@ekellbuch
Postdoctoral Fellow @Stanford with @HazyResearch and @Scott_linderman. Working on 🤖🧠. PhD @Columbia @ZuckermanBrain @GoogleAI






We improved Composer by scaling training, generating more complex RL environments, and introducing new learning methods. For example, we use text feedback during RL to learn faster by assigning credit in rollouts spanning hundreds of thousands of tokens.

Introducing Gemma-4-31B-it-Pearl on Together AI, Pearl Research Labs’ instruction-tuned checkpoint of Gemma 4 31B, powered by the @prlnet Proof of Useful Work protocol. AI natives can now use this Pearl model as a serverless inference endpoint on Together AI at a 25%+ discount.




Very excited to release Terminal-Bench 2.1! Coding agents are among the most economically consequential deployments of LLMs to date, and as agents improve, benchmark reliability matters more. We audited TB2.0 and found and corrected issues in 28 of 89 tasks, roughly 31% of the benchmark! The rankings held up, but absolute scores moved by up to 12pp!


Last week was a fun week for benchmarks, which advanced the key axes for measuring frontier AI:
- Legal Agent Benchmark (LAB) (from @harvey) → environment complexity: 1200+ tasks covering realistic instructions and work products, with expert rubrics
- Continual Learning Bench (from @BerkeleySky & @SnorkelAI) → autonomy horizon: the first benchmark to capture the ability of AI systems to learn from experience
- ProgramBench (from @Meta & @StanfordAILab) → output complexity: expanding the scope of tasks from patches to entire programs, with a 0% pass rate
- Bonus (from @ekellbuch & @terminalbench): Terminal-Bench 2.1 released with 28/89 tasks audited and fixed, showing that continuous quality control and task-level rigor are critical for enduring benchmarks!

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…





The Sam Altman and @miramurati texts from the day he got fired from @OpenAI in 2023 just became evidence in the @elonmusk v. @sama trial. It felt like a meaningful moment in AI history, so I turned it into a musical. The lyrics are the texts.





Reproducing all of Schmidhuber’s papers (1990-2025) using an AI coding assistant. Cool project by @yaroslavvb! It even reproduced the “World Models” paper by me and @SchmidhuberAI with a toy env, with a full VAE + RNN world model implementation. Project: github.com/cybertronai/sc…