Kelly Buchanan

944 posts

@ekellbuch

Postdoctoral Fellow @Stanford with @HazyResearch and @Scott_linderman. Working on 🤖🧠 PhD @Columbia @ZuckermanBrain @GoogleAI

Palo Alto, CA · Joined July 2011
2.2K Following · 1.3K Followers
Pinned Tweet
Kelly Buchanan
Kelly Buchanan@ekellbuch·
LLMs can generate 100 answers, but which one is right? Check out our latest work closing the generation-verification gap by aggregating weak verifiers and distilling them into a compact 400M model. If this direction is exciting to you, we’d love to connect.
Jon Saad-Falcon@JonSaadFalcon

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning models like Llama 3.3 70B Instruct! 🧵(1 / N)

2 replies · 15 reposts · 62 likes · 11.6K views
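The aggregation idea behind the thread can be sketched in a few lines. This is a minimal illustration of weak-verifier ensembling, not the paper's actual Weaver algorithm; the verifier functions and weights below are invented for the example:

```python
# Sketch: score each of N candidate answers with several weak verifiers,
# aggregate the scores with per-verifier weights, and pick the top candidate.

def select_answer(candidates, verifiers, weights):
    """candidates: list of answer strings
    verifiers: functions mapping an answer -> score in [0, 1]
    weights: per-verifier weights (e.g., estimated from held-out accuracy)"""
    def aggregate(answer):
        return sum(w * v(answer) for v, w in zip(verifiers, weights))
    return max(candidates, key=aggregate)

# Toy usage with stand-in verifiers (real ones would be reward models
# or LM judges):
verifiers = [lambda a: 1.0 if "4" in a else 0.0,
             lambda a: 1.0 if a.endswith("4") else 0.2]
best = select_answer(["2 + 2 = 5", "2 + 2 = 4"], verifiers, weights=[0.7, 0.3])
```

Distillation would then train a small model (the 400M one in the tweet) to imitate the aggregate score directly.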
Kelly Buchanan retweeted
Ziran Yang
Ziran Yang@__zrrr__·
Introducing Goedel-Code-Prover 🌲 LLMs write code, but can they prove it correct? Not just pass tests, but construct machine-checkable proofs that a program works for ALL possible inputs. We built a system that does exactly this. Given a program and its specification in Lean 4, Goedel-Code-Prover automatically synthesizes formal proofs of correctness. Our 8B model achieves a 62% overall success rate across three benchmarks (Verina, Clever & AlgoVeri), a 2.6x improvement over the strongest baseline, surpassing both frontier LLMs (GPT/Gemini/Claude) and open-source theorem provers up to 84x larger (DeepSeek-Prover/Goedel-Prover/Kimina-Prover/BFS-Prover).
Ziran Yang tweet media
19 replies · 73 reposts · 545 likes · 64.8K views
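For intuition, here is a toy instance of the task (my own illustrative example, not taken from the paper's benchmarks): a small program, its specification, and a machine-checkable Lean 4 proof of the kind such a system would be asked to synthesize.

```lean
-- A program and its specification in Lean 4. The theorem holds for ALL
-- inputs, unlike a finite test suite.
def max2 (a b : Nat) : Nat := if a ≥ b then a else b

-- Specification: the result is an upper bound on both inputs.
theorem max2_spec (a b : Nat) : a ≤ max2 a b ∧ b ≤ max2 a b := by
  unfold max2
  split <;> omega
```

The prover's job is to produce the `by …` proof term automatically given only `max2` and the statement of `max2_spec`.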
Kelly Buchanan retweeted
Chroma
Chroma@trychroma·
Introducing Chroma Context-1, a 20B parameter search agent.
> pushes the Pareto frontier of agentic search
> order of magnitude faster
> order of magnitude cheaper
> Apache 2.0, open-source
137 replies · 397 reposts · 4.1K likes · 1M views
Kelly Buchanan retweeted
Stuart Sul
Stuart Sul@stuart_sul·
Happy to share new ThunderKittens attention kernels for B300 GPUs -- faster than FA4! Check it out:
Nash Brown@nash_c_brown

Excited to share new ThunderKittens attention kernels that match or outperform Flash Attention 4 on Blackwell GPUs! Currently only supports QK192/V128 shapes, but more coming soon. Check out the code here: github.com/HazyResearch/T… Shoutout to the FA4 team for the algorithmic innovations and to @stuart_sul for the helpful discussions.

2 replies · 13 reposts · 152 likes · 13.2K views
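For reference, the math these kernels implement at the shapes mentioned: a plain NumPy sketch of scaled dot-product attention with the QK192/V128 head dimensions. The fused, tiled on-GPU version is what ThunderKittens and FA4 actually provide; this is only the reference computation:

```python
import numpy as np

def attention(q, k, v):
    """q, k: (seq, 192); v: (seq, 128). Returns (seq, 128)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (seq, seq) logits
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)            # softmax over keys
    return p @ v                                  # weighted sum of values

rng = np.random.default_rng(0)
out = attention(rng.normal(size=(8, 192)),   # queries, head dim 192
                rng.normal(size=(8, 192)),   # keys, head dim 192
                rng.normal(size=(8, 128)))   # values, head dim 128
```

Note the asymmetric head dims: Q and K share dim 192 so their dot product is defined, while V (and thus the output) uses dim 128.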
Kelly Buchanan retweeted
Phillip Isola
Phillip Isola@phillip_isola·
@GuanyaShi I mostly see "algorithmic novelty" as a cost. That cost needs to be justified by a sufficiently surprising result (e.g., a new capability or insight). All the better if you get the same with zero change in method. My rough heuristic: value = log P(results) - log P(methods)
2 replies · 5 reposts · 105 likes · 5.9K views
Kelly Buchanan retweeted
Fireworks AI
Fireworks AI@FireworksAI_HQ·
We’re seeing lots of interest in how Cursor delivered Composer 2. One less obvious insight: you don't need to spend billions on a giant cluster to do reinforcement learning. With disaggregated sampling, we ran @Cursor_ai Composer 2 training across 3-4 clusters worldwide, unified into a single capacity pool by Fireworks Virtual Cloud. Check how we optimize cross-region 1TB+ model updates by 98%+ while keeping staleness under a few minutes: fireworks.ai/blog/frontier-…
Cursor@cursor_ai

We're releasing a technical report describing how Composer 2 was trained.

5 replies · 27 reposts · 330 likes · 77.9K views
Kelly Buchanan retweeted
Xavier Gonzalez
Xavier Gonzalez@xavierjgonzalez·
Parallelizing nonlinear RNNs is gaining traction! More efficient than transformers; more expressive than linear RNNs. My PhD thesis provides an intro guide to the math (Newton's method) behind the parallelization. Great as a quick-start if you want to explore this new field!
Xavier Gonzalez tweet media
6 replies · 48 reposts · 359 likes · 31.3K views
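The core trick can be shown with a toy sketch. This is my simplification: the thesis develops Newton's method for the fixed-point problem, whereas the sketch below uses plain fixed-point (Picard) iteration, which converges more slowly but shows the key idea — treating the whole state sequence as one system so every timestep of a nonlinear RNN updates in parallel:

```python
import numpy as np

def f(h_prev, x):
    """Toy nonlinear RNN cell: h_t = tanh(0.5 * h_{t-1} + x_t)."""
    return np.tanh(0.5 * h_prev + x)

def parallel_fixed_point(x, h0, iters=50):
    """Solve H = F(H) for all states at once; each sweep applies f to
    every timestep in parallel (vectorized over t)."""
    T, d = x.shape
    H = np.zeros((T, d))                          # initial guess
    for _ in range(iters):
        H_prev = np.vstack([h0[None, :], H[:-1]]) # shift states by one step
        H = f(H_prev, x)                          # all timesteps at once
    return H

x = np.random.default_rng(1).normal(size=(16, 4))
h0 = np.zeros(4)
H_par = parallel_fixed_point(x, h0)

# Sequential reference: the ordinary one-step-at-a-time RNN rollout.
H_seq, h = [], h0
for t in range(16):
    h = f(h, x[t])
    H_seq.append(h)
H_seq = np.array(H_seq)
```

Each sweep makes at least one more timestep exact, so the iteration recovers the sequential rollout; Newton's method (the thesis's subject) reaches the same fixed point in far fewer sweeps.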
Kelly Buchanan retweeted
Stephen Roller
Stephen Roller@stephenroller·
@srush_nlp I find people unfamiliar with scaling are shocked by this:
Stephen Roller tweet media
17 replies · 26 reposts · 279 likes
Yusan Lin
Yusan Lin@yusan_lin·
Today @mirrormirror_ai is launching the marketplace where fashion models license their likeness and brands get stunning AI-generated imagery featuring real people. Commercially licensed, model-approved. Try our platform: mirrormirrorai.com

As a fashion model I used to spend hours on fashion photoshoot sets. I later did my PhD in CS and became a Research Scientist on AI for fashion. I can see clearly that AI image generation is replacing a large portion of my old job. But brands that use AI recklessly have already paid the price. It damages reputations and hurts the bottom line. Putting real people at the core of AI-generated imagery isn't just about avoiding backlash. It's better business. That's what Mirror Mirror AI is built for.

Right now, Mirror Mirror AI houses agency-signed models who have graced the covers of Vogue and Harper's Bazaar. You can digitally book them using our fashion-centric AI software, get your campaign done in hours instead of weeks, and never have to fly anyone in. You purchase a license for commercial use upon approval, and the models get paid.

Mirror Mirror AI is also opening a global call for independent models from anywhere in the world to apply to be featured on the platform. Work with fashion brands internationally, choose the projects you take on, and earn from your own likeness on your own terms. Selected models will be announced at an exclusive event in New York during @Techweek_ this June. Apply for the open call: mirrormirrorai.com/open-call

A huge thank you to our incredible team for pouring their hearts into this launch, and to a16z @speedrun for believing in our vision from the start. We're just getting started.
112 replies · 64 reposts · 832 likes · 201.9K views
Kelly Buchanan retweeted
Christina Baek
Christina Baek@_christinabaek·
Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵
Christina Baek tweet media
18 replies · 81 reposts · 614 likes · 90.4K views
Kelly Buchanan retweeted
Jon Saad-Falcon
Jon Saad-Falcon@JonSaadFalcon·
Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.
Jon Saad-Falcon tweet media
36 replies · 92 reposts · 319 likes · 98.9K views
Kelly Buchanan retweeted
Zitong Yang
Zitong Yang@ZitongYang0·
This is only possible with @tyler_griggs_'s tool use library github.com/thinking-machi…

I am unfortunately late to the party, but I only recently realized how much of a paradigm shift multi-turn + tool use is. I even wonder if it makes sense to rewrite the entire pretraining corpus into an agentic trajectory? This solves two problems: (1) removing the gap between pretraining and test distribution; (2) agentic turn changes can function as a natural "glue" that puts related internet documents together in context -- an agent browsing one document at turn 7 influences its action/generation at turn 107 -- encoding the internet in a natural long-context format.

Also, a great time to share that I have joined @thinkymachines. Thanks @miramurati for teaching me the value of focus, @lilianweng for instilling in me the power of responsibility, and @johnschulman2 for showing me by example the free spirit of scientific exploration! We are hiring job-boards.greenhouse.io/thinkingmachin…
clare ❤️‍🔥@clarejtbirch

kind of a big deal but actual legend @ZitongYang0 has integrated @tinkerapi with @harborframework, so you can use Harbor on Tinker w ~no code change now 🤠🧡

5 replies · 7 reposts · 116 likes · 30.2K views
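To make the "rewrite a document into an agentic trajectory" idea concrete, here is a purely hypothetical data format I am sketching for illustration; none of these field names come from any released tool or dataset:

```python
# One hypothetical trajectory: the agent's browsing actions become the
# "glue" that places related documents into a single long context.
trajectory = [
    {"turn": 1, "role": "assistant",
     "tool_call": {"name": "browse", "args": {"url": "https://example.com/doc-a"}}},
    {"turn": 2, "role": "tool", "content": "<contents of document A>"},
    {"turn": 3, "role": "assistant",
     "content": "Notes on A; the reference to document B looks relevant, fetching it."},
    {"turn": 4, "role": "assistant",
     "tool_call": {"name": "browse", "args": {"url": "https://example.com/doc-b"}}},
    {"turn": 5, "role": "tool", "content": "<contents of document B>"},
]

# Pretraining on such trajectories would expose the model to multi-turn
# tool use from the start, shrinking the pretrain/test distribution gap.
tool_turns = [t for t in trajectory if t["role"] == "tool"]
```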
Kelly Buchanan retweeted
Omar Shaikh
Omar Shaikh@oshaikh13·
What’s the point of a “helpful assistant” if you have to always tell it what to do next? In a new paper, we introduce a reasoning model that predicts what you’ll do next over long contexts (LongNAP 💤). We trained it on 1,800 hours of computer use from 20 users. 🧵
16 replies · 81 reposts · 291 likes · 98.3K views
Kelly Buchanan retweeted
Sam Buchanan
Sam Buchanan@_sdbuchanan·
We've released an updated "v2.0" of our book on deep representation learning! We've reorganized and improved many sections for better pedagogical clarity, and added many new examples and applications throughout the book. Massive thanks are due to folks in the community who submitted feedback and corrections on the first version, including @sirbayes :-) 📕Read: ma-lab-berkeley.github.io/deep-represent… 🛠️Contribute: github.com/Ma-Lab-Berkele…
Kevin Patrick Murphy@sirbayes

I am delighted to see a new version of the book by @_sdbuchanan, @druv_pai , @pengwang2003 and @YiMaTweets . This is the best book on the foundations of deep representation learning! In this era of coding agents, the math is all you need to learn :) ma-lab-berkeley.github.io/deep-represent…

1 reply · 4 reposts · 40 likes · 6.1K views
Kelly Buchanan retweeted
Ken Liu
Ken Liu@kenziyuliu·
Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which users, like a “VPN for AI inference”? Yes! Blog post below + we built it into an open-source infra/chat app and served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it’s surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, whistleblow your immigration status, or work with health insurers to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they’re just too good and convenient! (This is aka the "privacy paradox".)

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to the remote models controlled by someone else. A “privacy wrapper” or “VPN for AI inference”, so to speak. Concretely, it’s a blind inference middle layer that: (1) consists of decentralized proxies that anyone can operate; (2) blindly authenticates requests (via blind signatures / RFC 9474, 9578) so requests are provably sandboxed from each other and from user identity; (3) relays prompts over randomly chosen proxies that don’t see or log traffic (via client-side ephemeral keys or hosting in TEEs); and (4) the provider simply sees a mixed pool of anonymous prompts from the proxies. No state, pseudonyms, or linkable metadata.

If you squint, an unlinkable inference layer is essentially a vendor for per-request, anonymous, ephemeral AI access credentials (for users or agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn’t a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It aims to combat *longitudinal tracking* as a major threat to user privacy, and its statistical power increases quickly by mixing more users and requests. Unlinkability can be applied at any granularity. For an AI chat app, you can unlinkably request a fresh ephemeral key for every session so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not by who you are or what you do with it. We think unlinkable inference is a first step towards this “intelligence neutrality”.

# Try it out!

It’s quite practical:
- Chat app “oa-chat”: chat.openanonymity.ai (<20 seconds to get going)
- Blog post that should be a fun read: openanonymity.ai/blog/unlinkabl…
- Project page: openanonymity.ai
- GitHub: github.com/OpenAnonymity
Ken Liu tweet media
62 replies · 157 reposts · 828 likes · 374.5K views
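The blind-authentication step (2) can be sketched with textbook RSA. Toy parameters only: RFC 9474 specifies the production variant with RSA-PSS, full-size keys, and proper message encoding, so treat this as intuition, not an implementation. The issuer signs a blinded token, and the unblinded credential it sees later cannot be linked back to the issuing request:

```python
import math
import secrets

# Tiny RSA keypair for the credential issuer. Never use key sizes like this
# in practice.
p, q = 1009, 1013
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

m = 424242 % n                      # token to be signed (normally a hash)

# Client: blind the token with a random factor r coprime to n.
while True:
    r = secrets.randbelow(n - 2) + 2
    if math.gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n

# Issuer: signs the blinded value without ever seeing m.
blind_sig = pow(blinded, d, n)

# Client: unblind -> an ordinary RSA signature on m, unlinkable to `blinded`.
sig = (blind_sig * pow(r, -1, n)) % n

assert pow(sig, e, n) == m          # verifies like any RSA signature
```

The unblinding works because (m · r^e)^d = m^d · r (mod n), so multiplying by r⁻¹ leaves m^d, a valid signature the issuer never observed.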
Kelly Buchanan retweeted
Together AI
Together AI@togethercompute·
Introducing the official Together MCP server! Use it in your favorite coding agent to build AI apps, fine-tune models, or spin up clusters faster.
Together AI tweet media
3 replies · 3 reposts · 13 likes · 1.5K views