Percy Liang

1.3K posts

@percyliang

professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of https://t.co/7R5THVogW2, co-founder of @simile_ai, pianist

Stanford, CA · Joined October 2009
424 Following · 100.3K Followers
Pinned Tweet
Percy Liang@percyliang·
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
Percy Liang retweeted
Kanjun 🐙@kanjun·
Twitter’s algorithm is optimized for addiction, not for us. We deserve better. We’re releasing Bouncer today so you can take back control of your feed. Describe what you don't want, and Bouncer removes it. It’s free, doesn’t collect your data, and will be open source soon.
Percy Liang@percyliang·
Our 1e23 Delphi run finished last night. Its loss was within 0.005 of the projected (preregistered) loss. Note that these projections were based only on training models over 100x smaller (3e20)! Still more work to do. We still had loss spikes, and if you look closely, our scaling laws are bending. We have some ideas for fixing both...
Will Held@WilliamBarrHeld

How far do Marin's scaling laws extrapolate? At least 100x, apparently! Despite spooky spikes, our 1e23 Delphi finished on forecast. The compute-optimal ladder costs ~1e21 FLOPs to train. Good scaling science lets you “run” this (not tiny) experiment at 1/100th the cost.

Percy Liang@percyliang·
Academic titles are funny. After 14 years, I finally have the official title that people might have always assumed I had.
Percy Liang retweeted
Together AI@togethercompute·
New from Together Research: Aurora. Speculative decoding that adapts to shifting traffic in real time — and keeps improving the longer it runs. Open-source, RL-based, 1.25x faster vs. a well-trained static speculator with no offline retraining pipeline. Thread 🧵
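Aurora's RL-trained, traffic-adaptive speculator isn't shown in the thread, but the speculative decoding loop it accelerates can be sketched. Below is a minimal greedy version; `draft_model` and `target_model` are toy stand-ins (a fixed counting rule), not Together's actual models:

```python
def draft_model(prefix, k):
    """Cheap speculator: proposes k tokens (toy rule: count up mod 10)."""
    out = []
    for _ in range(k):
        nxt = (prefix[-1] + 1) % 10 if prefix else 0
        out.append(nxt)
        prefix = prefix + [nxt]
    return out

def target_model(prefix):
    """Expensive model: returns the single next token (same toy rule)."""
    return (prefix[-1] + 1) % 10 if prefix else 0

def speculative_decode(prompt, n_tokens, k=4):
    """Greedy speculative decoding: accept draft tokens while they match
    the target model's choice; on a mismatch, keep the target's token."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        for tok in draft_model(seq, k):
            correct = target_model(seq)
            if tok == correct:
                seq.append(tok)        # accepted: target call amortized
            else:
                seq.append(correct)    # rejected: fall back to target
                break
        else:
            seq.append(target_model(seq))  # bonus token after full accept
    return seq[len(prompt):][:n_tokens]

print(speculative_decode([3], 6))  # → [4, 5, 6, 7, 8, 9]
```

The speedup comes from the draft's accepted tokens costing far less than target-model calls; an adaptive speculator like Aurora keeps the acceptance rate high as traffic shifts.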
Percy Liang@percyliang·
@_aidan_clark_ Thanks. We thought about killing the run, but we already preregistered the loss, so wanted to see where it landed. We're definitely going to keep on iterating before scaling up though.
Aidan Clark@_aidan_clark_·
@percyliang Some legit free advice: focus on fixing those spikes not training the model!
Percy Liang retweeted
Hanna Hajishirzi@HannaHajishirzi·
Life update here: Last week marked the end of my time at Ai2. Proud to have built releases like Olmo, Tülu, FlexOlmo, DRTulu, OLMoTrace, OlmoE, and datasets including Dolma and Dolci—and of how strongly we pushed for open models and open science. Our artifacts reached 33M+ downloads, including ~4M for Olmo 3. I believe Olmo has empowered researchers to push the boundaries of AI. I’ll always be cheering on Ai2 and will continue to strongly support open-source, open-science AI. I’m deeply grateful for this chapter and excited for what comes next.
Percy Liang@percyliang·
In our last episode, careful tuning, scaling, and ensembles led to a 5x gain in data efficiency (requires 5x less data to get the same loss). Now, with a rephraser model, we can get an additional 1.8x gain in data efficiency. I know, everyone's compute constrained, but we're preparing for a data-constrained future.
Konwoo Kim@konwookim

for data-constrained pre-training, synth data isn’t just benchmaxxing, it lowers loss on the real data distribution as we generate more tokens. for even better scaling, treat synth gens as forming one long *megadoc*: 1.8x data efficiency with larger gains under more compute

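The quoted "megadoc" idea can be sketched roughly: concatenate a document and its synthetic rephrases into one long training document, so later rephrases are trained in-context with earlier ones. `rephrase` here is a hypothetical stand-in for a real rephraser model, and chunking is character-level in place of tokens:

```python
def rephrase(doc: str, i: int) -> str:
    """Hypothetical stand-in for a rephraser LM: returns the i-th
    synthetic restatement of `doc`."""
    return f"[rephrase {i}] {doc}"

def build_megadoc(doc: str, n_rephrases: int, sep: str = "\n\n") -> str:
    """Treat the original doc plus its synthetic rephrases as ONE long
    document rather than independent training documents."""
    parts = [doc] + [rephrase(doc, i) for i in range(n_rephrases)]
    return sep.join(parts)

def chunk(text: str, seq_len: int) -> list[str]:
    """Split the megadoc into fixed-length training sequences
    (character-level here, standing in for tokens)."""
    return [text[i:i + seq_len] for i in range(0, len(text), seq_len)]

mega = build_megadoc("the cat sat on the mat", n_rephrases=3)
print(len(mega), len(chunk(mega, seq_len=32)))  # → 133 5
```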
Percy Liang@percyliang·
Here's the GitHub issue with all the details: github.com/marin-communit… This is part of our Delphi suite, a "modernized" version of Pythia: x.com/percyliang/sta…
Percy Liang@percyliang

For trying to understand LMs deeply, @AiEleuther’s Pythia has been an invaluable resource: 16 LMs (70M to 12B parameters) trained on the same data (The Pile) in the same order, with intermediate checkpoints. It’s been two years and it’s time for a refresh.

Percy Liang@percyliang·
In Marin, we are trying to get really good at scaling laws. We have trained models up to 1e22 FLOPs and have made a prediction of the loss at 1e23 FLOPs, which @WilliamBarrHeld is running. This prediction is preregistered on GitHub, so we'll see in a few days how accurate our prediction was. What we want is not just a single model but a training recipe that scales reliably.
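The mechanics of such a preregistered extrapolation can be sketched: fit a saturating power law L(C) = c + a·C^(−b) to a ladder of smaller runs, then predict the loss 100x beyond. The runs below are synthetic stand-ins generated from a known law, not Marin's actual ladder or its preregistered prediction:

```python
import math

def fit_power_law(runs):
    """Fit L(C) = c + a * C**(-b): grid-search the irreducible loss c,
    then least-squares fit log(L - c) against log(C)."""
    def line_fit(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        return slope, my - slope * mx

    xs = [math.log(C) for C, _ in runs]
    best = None
    for ci in range(int(min(L for _, L in runs) * 100)):
        c = ci / 100
        ys = [math.log(L - c) for _, L in runs]
        slope, intercept = line_fit(xs, ys)
        resid = sum((y - (slope * x + intercept)) ** 2
                    for x, y in zip(xs, ys))
        if best is None or resid < best[0]:
            best = (resid, c, -slope, math.exp(intercept))
    _, c, b, a = best
    return a, b, c

def true_law(C):  # synthetic ground truth, NOT Marin's actual law
    return 1.7 + 25 * C ** -0.07

runs = [(C, true_law(C)) for C in (1e19, 3e19, 1e20, 3e20, 1e21)]
a, b, c = fit_power_law(runs)
pred = c + a * 1e23 ** -b      # extrapolate 100x past the largest run
print(round(pred, 4), round(true_law(1e23), 4))
```

Preregistering `pred` before the big run (as Marin did on GitHub) is what makes the extrapolation a falsifiable scientific claim rather than a post-hoc fit.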
Percy Liang@percyliang·
I think it’s pretty clear that simulation is the next frontier for AI. The most impressive feats of AI to date are when we have a clear environment + reward, whether it be beating Lee Sedol at Go, winning an IMO gold medal, or writing entire apps from scratch. In these cases, the RL algorithm can try different actions and observe the well-defined consequences in the safety of a docker container.

But what about messy real-world situations involving people? The rewards are unclear, the stakes are high, and you can’t experiment in the real world. But these situations are precisely where the next big opportunity in AI is.

To crack this, we need to *simulate* society (“put society into a docker container”). Concretely, this means building a model that can predict what will happen in any given situation (real or hypothetical). If we can do this, we are only limited by our imagination: predict the future, optimize for better outcomes, answer hypothetical (“what if”) questions.

Ultimately, this goes beyond making better decisions; it’s about giving us a better understanding of ourselves and the world. Simulation is the whole enchilada. And this is exactly the research that @simile_ai is working on. Read more here: simile.ai/blog/simulatio…
Percy Liang retweeted
elie@eliebakouch·
today is my last day at hugging face. feeling really grateful to have worked with such an amazing team and learned so much along the way.

i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding. i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward.

i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people. things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do. but first, taking a few weeks break :)
Percy Liang@percyliang·
Normally, replaying old data reduces forgetting, but it actually helps you learn on new data too! We finally put this paper out on arXiv, but had it up as a Marin GitHub issue ~1 year ago: github.com/marin-communit…
Suhas Kotha@kothasuhas

to improve fine-tuning data efficiency, replay generic pre-training data. not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! especially when fine-tuning data is scarce in pre-training (w/ @percyliang)

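A minimal sketch of the replay recipe described above: mix a fraction of generic pre-training data into every fine-tuning batch. The knobs here (`replay_frac`, `batch_size`) are hypothetical illustrations, not the paper's actual mixing ratios:

```python
import random

def replay_batches(finetune_data, pretrain_data, replay_frac=0.25,
                   batch_size=8, n_batches=100, seed=0):
    """Yield fine-tuning batches in which a fraction of each batch is
    replayed generic pre-training data; per the result above, this can
    improve the fine-tuning domain itself, not just reduce forgetting."""
    rng = random.Random(seed)
    n_replay = round(batch_size * replay_frac)
    for _ in range(n_batches):
        batch = rng.sample(finetune_data, batch_size - n_replay)
        batch += rng.sample(pretrain_data, n_replay)
        rng.shuffle(batch)
        yield batch

ft = [f"ft-{i}" for i in range(100)]    # scarce fine-tuning domain
pt = [f"pt-{i}" for i in range(1000)]   # generic pre-training pool
first = next(replay_batches(ft, pt))
print(len(first), sum(x.startswith("pt-") for x in first))  # → 8 2
```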
Percy Liang retweeted
Joon Sung Park@joon_s_pk·
Exciting to see @WSJ cover what we’re building at @simile_ai. It’s great to see the technology we developed in the lab making real world impact alongside foundational institutions like CVS and Gallup. Nothing like frontier research meeting real PMF! wsj.com/cio-journal/ca…
Yaroslav Bulatov@yaroslavvb·
Random photo from ICML 2008. Fedor Zhdanov, Gustavo Lacerda, Percy Liang, some guy in striped shirt
Percy Liang@percyliang·
I stopped using ChatGPT a few months ago. Since then, I have been only using oa-chat. All chat history is stored locally. Each query is sent to OpenAI under a temporary key which is unlinkable to any other query. I’m not a privacy nut, but oa-chat is such a convenient drop-in replacement for your favorite AI assistant that there’s no reason not to try it out.
Ken Liu@kenziyuliu

Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which users, like a “VPN for AI inference”? Yes! Blog post below + we built it into open source infra/chat app and served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it’s surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, whistleblow your immigration status, or work with health insurance to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they’re just too good and convenient! (This is aka the "privacy paradox".)

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to the remote models controlled by someone else. A “privacy wrapper” or “VPN for AI inference”, so to speak. Concretely, it’s a blind inference middle layer that: (1) consists of decentralized proxies that anyone can operate; (2) blindly authenticates requests (via blind signatures / RFC 9474, 9578) so requests are provably sandboxed from each other and from user identity; (3) relays prompts over randomly chosen proxies that don’t see or log traffic (via client-side ephemeral keys or hosting in TEEs); and (4) the provider simply sees a mixed pool of anonymous prompts from the proxies. No state, pseudonyms, or linkable metadata.

If you squint, an unlinkable inference layer is essentially a vendor for per-request, anonymous, ephemeral AI access credentials (for users or agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn’t a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It aims to combat *longitudinal tracking* as a major threat to user privacy, and its statistical power increases quickly by mixing more users and requests. Unlinkability can be applied at any granularity. For an AI chat app, you can unlinkably request a fresh ephemeral key for every session so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not who you are or what you do with it. We think unlinkable inference is a first step towards this “intelligence neutrality”.

# Try it out! It’s quite practical

- Chat app “oa-chat”: chat.openanonymity.ai (<20 seconds to get going)
- Blog post that should be a fun read: openanonymity.ai/blog/unlinkabl…
- Project page: openanonymity.ai
- GitHub: github.com/OpenAnonymity

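The blind-authentication step in the post above can be sketched with textbook RSA blinding: the signer issues a valid credential without ever seeing the message it signs. This uses toy parameters for illustration only; real deployments use RFC 9474-style RSA-PSS blind signatures with large keys, and this is not OpenAnonymity's actual implementation:

```python
from math import gcd

# Toy RSA keypair (tiny primes for illustration only; real blind
# signatures use RFC 9474 RSA-PSS with 2048+ bit keys)
p, q = 61, 53
n = p * q                                # public modulus
e = 17                                   # public exponent
d = pow(e, -1, (p - 1) * (q - 1))        # private exponent

def blind(msg, r):
    """Client: hide msg from the signer by multiplying in r**e."""
    assert gcd(r, n) == 1
    return (msg * pow(r, e, n)) % n

def sign_blinded(blinded):
    """Signer: ordinary RSA signing, without ever seeing msg."""
    return pow(blinded, d, n)

def unblind(blind_sig, r):
    """Client: divide out the blinding factor, leaving a signature on msg."""
    return (blind_sig * pow(r, -1, n)) % n

def verify(msg, sig):
    return pow(sig, e, n) == msg % n

msg, r = 42, 99                          # r is the client's secret nonce
sig = unblind(sign_blinded(blind(msg, r)), r)
print(verify(msg, sig))  # → True
```

Because the signer only ever sees `msg * r**e mod n`, it cannot link the final credential back to the signing request, which is what sandboxes each inference request from user identity.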
Percy Liang retweeted
Stefano Ermon@StefanoErmon·
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.
Percy Liang@percyliang·
These days, I'm much more excited about dataset releases than model releases. Models come and go and don't compose, whereas good datasets are more enduring and can be studied, used, and revised to create better models more broadly. Excited about these 155K coding agent trajectories...just SFT'ing on this data improves SWE-bench Verified massively (23% -> 59.4%).
Together AI@togethercompute

We’re open-sourcing CoderForge-Preview — 258K test-verified coding-agent trajectories (155K pass | 103K fail). Fine-tuning Qwen3-32B on the passing subset boosts SWE-bench Verified: 23.0% → 59.4% pass@1, and it ranks #1 among open-data models ≤32B parameters. Thread on the data generation pipeline 🧵

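A sketch of how such a trajectory dataset gets turned into SFT data: keep only the test-verified passing trajectories and flatten each one into a (prompt, completion) pair. The field names below are hypothetical; CoderForge's actual schema may differ:

```python
def sft_examples(trajectories):
    """Keep only test-verified passing trajectories, flattening each into
    a (prompt, completion) pair for supervised fine-tuning."""
    for t in trajectories:
        if not t["passed"]:
            continue                     # failing runs are held out of SFT
        prompt = t["issue"]
        completion = "\n".join(step["action"] for step in t["steps"])
        yield prompt, completion

trajs = [  # two mock trajectories with hypothetical field names
    {"passed": True, "issue": "fix off-by-one in parser",
     "steps": [{"action": "edit parser.py"}, {"action": "run tests"}]},
    {"passed": False, "issue": "add retry logic",
     "steps": [{"action": "edit client.py"}]},
]
print(list(sft_examples(trajs)))  # only the passing trajectory survives
```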