Kelly Buchanan

944 posts

@ekellbuch

Postdoctoral Fellow @Stanford with @HazyResearch and @Scott_linderman. Working on 🤖🧠 PhD @Columbia @ZuckermanBrain @GoogleAI

Palo Alto, CA · Joined July 2011
2.2K Following · 1.3K Followers
Pinned Tweet
Kelly Buchanan
Kelly Buchanan@ekellbuch·
LLMs can generate 100 answers, but which one is right? Check out our latest work closing the generation-verification gap by aggregating weak verifiers and distilling them into a compact 400M model. If this direction is exciting to you, we’d love to connect.
Jon Saad-Falcon@JonSaadFalcon

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning models like Llama 3.3 70B Instruct! 🧵(1 / N)

2 replies · 15 reposts · 62 likes · 11.6K views
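The aggregation idea behind the thread can be sketched in a few lines. This is a minimal illustration of weak-verifier ensembling, not the paper's actual Weaver algorithm; the verifier functions and weights below are invented for the example:

```python
# Sketch: score each of N candidate answers with several weak verifiers,
# aggregate the scores with per-verifier weights, and pick the top candidate.

def select_answer(candidates, verifiers, weights):
    """candidates: list of answer strings
    verifiers: functions mapping an answer -> score in [0, 1]
    weights: per-verifier weights (e.g., estimated from held-out accuracy)"""
    def aggregate(answer):
        return sum(w * v(answer) for v, w in zip(verifiers, weights))
    return max(candidates, key=aggregate)

# Toy usage with stand-in verifiers (real ones would be reward models
# or LM judges):
verifiers = [lambda a: 1.0 if "4" in a else 0.0,
             lambda a: 1.0 if a.endswith("4") else 0.2]
best = select_answer(["2 + 2 = 5", "2 + 2 = 4"], verifiers, weights=[0.7, 0.3])
```

Distillation would then train a small model (the 400M one in the tweet) to imitate the aggregate score directly.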
Kelly Buchanan retweeted
Ziran Yang
Ziran Yang@__zrrr__·
Introducing Goedel-Code-Prover 🌲 LLMs write code, but can they prove it correct? Not just pass tests, but construct machine-checkable proofs that a program works for ALL possible inputs. We built a system that does exactly this. Given a program and its specification in Lean 4, Goedel-Code-Prover automatically synthesizes formal proofs of correctness. Our 8B model achieves a 62% overall success rate across three benchmarks (Verina, Clever & AlgoVeri), a 2.6x improvement over the strongest baseline, surpassing both frontier LLMs (GPT/Gemini/Claude) and open-source theorem provers up to 84x larger (DeepSeek-Prover/Goedel-Prover/Kimina-Prover/BFS-Prover).
Ziran Yang tweet media
19 replies · 73 reposts · 545 likes · 64.8K views
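For intuition, here is a toy instance of the task (my own illustrative example, not taken from the paper's benchmarks): a small program, its specification, and a machine-checkable Lean 4 proof of the kind such a system would be asked to synthesize.

```lean
-- A program and its specification in Lean 4. The theorem holds for ALL
-- inputs, unlike a finite test suite.
def max2 (a b : Nat) : Nat := if a ≥ b then a else b

-- Specification: the result is an upper bound on both inputs.
theorem max2_spec (a b : Nat) : a ≤ max2 a b ∧ b ≤ max2 a b := by
  unfold max2
  split <;> omega
```

The prover's job is to produce the `by …` proof term automatically given only `max2` and the statement of `max2_spec`.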
Kelly Buchanan retweeted
Chroma
Chroma@trychroma·
Introducing Chroma Context-1, a 20B parameter search agent.
> pushes the Pareto frontier of agentic search
> order of magnitude faster
> order of magnitude cheaper
> Apache 2.0, open-source
137 replies · 397 reposts · 4.1K likes · 1M views
Kelly Buchanan retweeted
Stuart Sul
Stuart Sul@stuart_sul·
Happy to share new ThunderKittens attention kernels for B300 GPUs -- faster than FA4! Check it out:
Nash Brown@nash_c_brown

Excited to share new ThunderKittens attention kernels that match or outperform Flash Attention 4 on Blackwell GPUs! Currently only supports QK192/V128 shapes, but more coming soon. Check out the code here: github.com/HazyResearch/T… Shoutout to the FA4 team for the algorithmic innovations and to @stuart_sul for the helpful discussions.

2 replies · 13 reposts · 152 likes · 13.2K views
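For reference, the math these kernels implement at the shapes mentioned: a plain NumPy sketch of scaled dot-product attention with the QK192/V128 head dimensions. The fused, tiled on-GPU version is what ThunderKittens and FA4 actually provide; this is only the reference computation:

```python
import numpy as np

def attention(q, k, v):
    """q, k: (seq, 192); v: (seq, 128). Returns (seq, 128)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (seq, seq) logits
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)            # softmax over keys
    return p @ v                                  # weighted sum of values

rng = np.random.default_rng(0)
out = attention(rng.normal(size=(8, 192)),   # queries, head dim 192
                rng.normal(size=(8, 192)),   # keys, head dim 192
                rng.normal(size=(8, 128)))   # values, head dim 128
```

Note the asymmetric head dims: Q and K share dim 192 so their dot product is defined, while V (and thus the output) uses dim 128.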
Kelly Buchanan retweeted
Phillip Isola
Phillip Isola@phillip_isola·
@GuanyaShi I mostly see "algorithmic novelty" as a cost. That cost needs to be justified by a sufficiently surprising result (e.g., a new capability or insight). All the better if you get the same with zero change in method. My rough heuristic: value = log P(results) - log P(methods)
2 replies · 5 reposts · 105 likes · 5.9K views
Kelly Buchanan retweeted
Fireworks AI
Fireworks AI@FireworksAI_HQ·
We’re seeing lots of interest in how Cursor delivered Composer 2. One less obvious insight: you don't need to spend billions on a giant cluster to do reinforcement learning. With disaggregated sampling, we ran @Cursor_ai Composer 2 training across 3-4 clusters worldwide, unified into a single capacity pool by Fireworks Virtual Cloud. Check how we optimize cross-region 1TB+ model updates by 98%+ while keeping staleness under a few minutes: fireworks.ai/blog/frontier-…
Cursor@cursor_ai

We're releasing a technical report describing how Composer 2 was trained.

5 replies · 27 reposts · 330 likes · 77.9K views
Kelly Buchanan retweeted
Xavier Gonzalez
Xavier Gonzalez@xavierjgonzalez·
Parallelizing nonlinear RNNs is gaining traction! More efficient than transformers; more expressive than linear RNNs. My PhD thesis provides an intro guide to the math (Newton's method) behind the parallelization. Great as a quick-start if you want to explore this new field!
Xavier Gonzalez tweet media
6 replies · 48 reposts · 359 likes · 31.3K views
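The core trick can be shown with a toy sketch. This is my simplification: the thesis develops Newton's method for the fixed-point problem, whereas the sketch below uses plain fixed-point (Picard) iteration, which converges more slowly but shows the key idea — treating the whole state sequence as one system so every timestep of a nonlinear RNN updates in parallel:

```python
import numpy as np

def f(h_prev, x):
    """Toy nonlinear RNN cell: h_t = tanh(0.5 * h_{t-1} + x_t)."""
    return np.tanh(0.5 * h_prev + x)

def parallel_fixed_point(x, h0, iters=50):
    """Solve H = F(H) for all states at once; each sweep applies f to
    every timestep in parallel (vectorized over t)."""
    T, d = x.shape
    H = np.zeros((T, d))                          # initial guess
    for _ in range(iters):
        H_prev = np.vstack([h0[None, :], H[:-1]]) # shift states by one step
        H = f(H_prev, x)                          # all timesteps at once
    return H

x = np.random.default_rng(1).normal(size=(16, 4))
h0 = np.zeros(4)
H_par = parallel_fixed_point(x, h0)

# Sequential reference: the ordinary one-step-at-a-time RNN rollout.
H_seq, h = [], h0
for t in range(16):
    h = f(h, x[t])
    H_seq.append(h)
H_seq = np.array(H_seq)
```

Each sweep makes at least one more timestep exact, so the iteration recovers the sequential rollout; Newton's method (the thesis's subject) reaches the same fixed point in far fewer sweeps.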
Kelly Buchanan retweeted
Stephen Roller
Stephen Roller@stephenroller·
@srush_nlp I find people unfamiliar with scaling are shocked by this:
Stephen Roller tweet media
17 replies · 26 reposts · 279 likes
Yusan Lin
Yusan Lin@yusan_lin·
Today @mirrormirror_ai is launching the marketplace where fashion models license their likeness and brands get stunning AI-generated imagery featuring real people. Commercially licensed, model-approved. Try our platform: mirrormirrorai.com

As a fashion model I used to spend hours on fashion photoshoot sets. I later did my PhD in CS and became a Research Scientist on AI for fashion. I can see clearly that AI image generation is replacing a large portion of my old job. But brands that use AI recklessly have already paid the price. It damages reputations and hurts the bottom line. Putting real people at the core of AI-generated imagery isn't just about avoiding backlash. It's better business. That's what Mirror Mirror AI is built for.

Right now, Mirror Mirror AI houses agency-signed models who have graced the covers of Vogue and Harper's Bazaar. You can digitally book them using our fashion-centric AI software, get your campaign done in hours instead of weeks, and never have to fly anyone in. You purchase a license for commercial use upon approval, and the models get paid.

Mirror Mirror AI is also opening a global call for independent models from anywhere in the world to apply to be featured on the platform. Work with fashion brands internationally, choose the projects you take on, and earn from your own likeness on your own terms. Selected models will be announced at an exclusive event in New York during @Techweek_ this June. Apply for the open call: mirrormirrorai.com/open-call

A huge thank you to our incredible team for pouring their hearts into this launch, and to a16z @speedrun for believing in our vision from the start. We're just getting started.
112 replies · 64 reposts · 832 likes · 201.9K views
Kelly Buchanan retweeted
Christina Baek
Christina Baek@_christinabaek·
Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵
Christina Baek tweet media
18 replies · 81 reposts · 614 likes · 90.4K views
Kelly Buchanan retweeted
Jon Saad-Falcon
Jon Saad-Falcon@JonSaadFalcon·
Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.
Jon Saad-Falcon tweet media
36 replies · 92 reposts · 319 likes · 98.9K views
Kelly Buchanan retweeted
Zitong Yang
Zitong Yang@ZitongYang0·
This is only possible with @tyler_griggs_'s tool use library github.com/thinking-machi…

I am unfortunately late to the party, but I only recently realized how much of a paradigm shift multi-turn + tool use is. I even wonder if it makes sense to rewrite the entire pretraining corpus into an agentic trajectory? This solves two problems: (1) removing the gap between pretraining and test distribution; (2) agentic turn changes can function as a natural "glue" that puts related internet documents together in context -- an agent browsing one document at turn 7 influences its action/generation at turn 107 -- encoding the internet in a natural long-context format.

Also, a great time to share that I have joined @thinkymachines. Thanks @miramurati for teaching me the value of focus, @lilianweng for instilling in me the power of responsibility, and @johnschulman2 for showing me by example the free spirit of scientific exploration! We are hiring job-boards.greenhouse.io/thinkingmachin…
clare ❤️‍🔥@clarejtbirch

kind of a big deal but actual legend @ZitongYang0 has integrated @tinkerapi with @harborframework, so you can use Harbor on Tinker w ~no code change now 🤠🧡

5 replies · 7 reposts · 116 likes · 30.2K views
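To make the "rewrite a document into an agentic trajectory" idea concrete, here is a purely hypothetical data format I am sketching for illustration; none of these field names come from any released tool or dataset:

```python
# One hypothetical trajectory: the agent's browsing actions become the
# "glue" that places related documents into a single long context.
trajectory = [
    {"turn": 1, "role": "assistant",
     "tool_call": {"name": "browse", "args": {"url": "https://example.com/doc-a"}}},
    {"turn": 2, "role": "tool", "content": "<contents of document A>"},
    {"turn": 3, "role": "assistant",
     "content": "Notes on A; the reference to document B looks relevant, fetching it."},
    {"turn": 4, "role": "assistant",
     "tool_call": {"name": "browse", "args": {"url": "https://example.com/doc-b"}}},
    {"turn": 5, "role": "tool", "content": "<contents of document B>"},
]

# Pretraining on such trajectories would expose the model to multi-turn
# tool use from the start, shrinking the pretrain/test distribution gap.
tool_turns = [t for t in trajectory if t["role"] == "tool"]
```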
Kelly Buchanan retweeted
Omar Shaikh
Omar Shaikh@oshaikh13·
What’s the point of a “helpful assistant” if you have to always tell it what to do next? In a new paper, we introduce a reasoning model that predicts what you’ll do next over long contexts (LongNAP 💤). We trained it on 1,800 hours of computer use from 20 users. 🧵
16 replies · 81 reposts · 291 likes · 98.3K views
Kelly Buchanan retweeted
Sam Buchanan
Sam Buchanan@_sdbuchanan·
We've released an updated "v2.0" of our book on deep representation learning! We've reorganized and improved many sections for better pedagogical clarity, and added many new examples and applications throughout the book. Massive thanks are due to folks in the community who submitted feedback and corrections on the first version, including @sirbayes :-) 📕Read: ma-lab-berkeley.github.io/deep-represent… 🛠️Contribute: github.com/Ma-Lab-Berkele…
Kevin Patrick Murphy@sirbayes

I am delighted to see a new version of the book by @_sdbuchanan, @druv_pai , @pengwang2003 and @YiMaTweets . This is the best book on the foundations of deep representation learning! In this era of coding agents, the math is all you need to learn :) ma-lab-berkeley.github.io/deep-represent…

1 reply · 4 reposts · 40 likes · 6.1K views
Kelly Buchanan retweeted
Ken Liu
Ken Liu@kenziyuliu·
Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which users, like a “VPN for AI inference”? Yes! Blog post below + we built it into an open-source infra/chat app and served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it’s surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, whistleblow your immigration status, or work with health insurers to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they’re just too good and convenient! (This is aka the "privacy paradox".)

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to the remote models controlled by someone else. A “privacy wrapper” or “VPN for AI inference”, so to speak. Concretely, it’s a blind inference middle layer that: (1) consists of decentralized proxies that anyone can operate; (2) blindly authenticates requests (via blind signatures / RFC 9474, 9578) so requests are provably sandboxed from each other and from user identity; (3) relays prompts over randomly chosen proxies that don’t see or log traffic (via client-side ephemeral keys or hosting in TEEs); and (4) the provider simply sees a mixed pool of anonymous prompts from the proxies. No state, pseudonyms, or linkable metadata.

If you squint, an unlinkable inference layer is essentially a vendor for per-request, anonymous, ephemeral AI access credentials (for users or agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn’t a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It aims to combat *longitudinal tracking* as a major threat to user privacy, and its statistical power increases quickly by mixing more users and requests. Unlinkability can be applied at any granularity. For an AI chat app, you can unlinkably request a fresh ephemeral key for every session so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not by who you are or what you do with it. We think unlinkable inference is a first step towards this “intelligence neutrality”.

# Try it out!

It’s quite practical:
- Chat app “oa-chat”: chat.openanonymity.ai (<20 seconds to get going)
- Blog post that should be a fun read: openanonymity.ai/blog/unlinkabl…
- Project page: openanonymity.ai
- GitHub: github.com/OpenAnonymity
Ken Liu tweet media
62 replies · 157 reposts · 828 likes · 374.5K views
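The blind-authentication step (2) can be sketched with textbook RSA. Toy parameters only: RFC 9474 specifies the production variant with RSA-PSS, full-size keys, and proper message encoding, so treat this as intuition, not an implementation. The issuer signs a blinded token, and the unblinded credential it sees later cannot be linked back to the issuing request:

```python
import math
import secrets

# Tiny RSA keypair for the credential issuer. Never use key sizes like this
# in practice.
p, q = 1009, 1013
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

m = 424242 % n                      # token to be signed (normally a hash)

# Client: blind the token with a random factor r coprime to n.
while True:
    r = secrets.randbelow(n - 2) + 2
    if math.gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n

# Issuer: signs the blinded value without ever seeing m.
blind_sig = pow(blinded, d, n)

# Client: unblind -> an ordinary RSA signature on m, unlinkable to `blinded`.
sig = (blind_sig * pow(r, -1, n)) % n

assert pow(sig, e, n) == m          # verifies like any RSA signature
```

The unblinding works because (m · r^e)^d = m^d · r (mod n), so multiplying by r⁻¹ leaves m^d, a valid signature the issuer never observed.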
Kelly Buchanan retweeted
Together AI
Together AI@togethercompute·
Introducing the official Together MCP server! Use it in your favorite coding agent to build AI apps, fine-tune models, or spin up clusters faster.
Together AI tweet media
3 replies · 3 reposts · 13 likes · 1.5K views