Percy Liang

1.3K posts

@percyliang

professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of https://t.co/7R5THVogW2, co-founder of @simile_ai, pianist

Stanford, CA Joined October 2009

424 Following 100.3K Followers

Pinned Tweet

Percy Liang@percyliang·19 May

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:

221

1.2K

190.2K

Percy Liang retweeted

Kanjun 🐙@kanjun·6d

Twitter’s algorithm is optimized for addiction, not for us. We deserve better. We’re releasing Bouncer today so you can take back control of your feed. Describe what you don't want, and Bouncer removes it. It’s free, doesn’t collect your data, and will be open source soon.

210

294

3.1K

568.7K

Percy Liang@percyliang·1 Apr

Our 1e23 Delphi run finished last night. Its loss was within 0.005 of the projected (preregistered) loss. Note that these projections were based only on training models over 100x smaller (3e20)! Still more work to do. We still had loss spikes, and if you look closely, our scaling laws are bending. We have some ideas for fixing both...

Will Held@WilliamBarrHeld

How far do Marin's scaling laws extrapolate? At least 100x, apparently! Despite spooky spikes, our 1e23 Delphi finished on forecast. The compute-optimal ladder costs ~1e21 FLOPs to train. Good scaling science lets you “run” this (not tiny) experiment at 1/100th the cost.

191

32.5K

Percy Liang@percyliang·1 Apr

Academic titles are funny. After 14 years, I finally have the official title that people might have always assumed I had.

1.3K

114.3K

Percy Liang retweeted

Together AI@togethercompute·1 Apr

New from Together Research: Aurora. Speculative decoding that adapts to shifting traffic in real time — and keeps improving the longer it runs. Open-source, RL-based, 1.25x faster vs. a well-trained static speculator with no offline retraining pipeline. Thread 🧵

263

32.8K

Percy Liang@percyliang·29 Mar

@_aidan_clark_ Thanks. We thought about killing the run, but we already preregistered the loss, so wanted to see where it landed. We're definitely going to keep on iterating before scaling up though.

630

Aidan Clark@_aidan_clark_·28 Mar

@percyliang Some legit free advice: focus on fixing those spikes not training the model!

1.3K

Percy Liang@percyliang·28 Mar

Last time Marin 32B got burned by loss spikes, so this makes me very nervous...but maybe Delphi 1e23 will be different🤞

Will Held@WilliamBarrHeld

Our 1e23 "Delphi" (~25B param model trained for ~600B tokens) run for Marin has entered its learning rate decay phase. Lots of spikes at this scale, very scary! Despite that, the run is looking on track to be close to our pre-registered scaling laws predictions. Stay tuned...

23K

Percy Liang@percyliang·26 Mar

Why doing scaling laws right can save you $$$

Will Held@WilliamBarrHeld

Scaling laws are "just" regressions. But a biased fitting method can quietly misallocate millions of $ of compute at frontier scales. My coworker Eric Czech dug into a bias in parabolic IsoFLOP fits used by Meta, DeepSeek, Microsoft, Waymo, et al. for their scaling laws🧵

19.4K
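As a rough illustration of the parabolic IsoFLOP fit mentioned in the quoted thread: at one fixed compute budget, losses from models of several sizes are fit with a parabola in log10(parameters), and the vertex is read off as the estimated compute-optimal size. This is a minimal sketch under stated assumptions. The data points are made up, the function names `fit_parabola` and `isoflop_optimum` are hypothetical, and it does not reproduce the bias analysis from the thread or the procedure of any lab named there.

```python
def fit_parabola(xs, ys):
    """Least-squares fit y = c2*x**2 + c1*x + c0, returned as (c2, c1, c0)."""
    rows = [[x * x, x, 1.0] for x in xs]
    # Augmented normal equations [A^T A | A^T y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def isoflop_optimum(log10_params, losses):
    """Vertex of the fitted parabola: the estimated loss-minimizing log10(N)."""
    # Centering the x values keeps the normal equations well conditioned.
    center = sum(log10_params) / len(log10_params)
    c2, c1, _ = fit_parabola([x - center for x in log10_params], losses)
    if c2 <= 0:
        raise ValueError("fit is not convex; no interior minimum")
    return center - c1 / (2 * c2)


# Made-up losses for one hypothetical compute budget (not real measurements).
# They are exactly quadratic in log10(N) with the minimum placed at 9.2.
log10_params = [8.0, 8.5, 9.0, 9.5, 10.0]
losses = [2.5 + 0.1 * (x - 9.2) ** 2 for x in log10_params]
best_log10_n = isoflop_optimum(log10_params, losses)
```

On this synthetic curve the vertex recovers the planted minimum. Real loss curves are noisy and not exactly quadratic in log10(N), so the vertex estimate can shift with the choice of sampled model sizes.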

Percy Liang retweeted

Hanna Hajishirzi@HannaHajishirzi·24 Mar

Life update here: Last week marked the end of my time at Ai2. Proud to have built releases like Olmo, Tülu, FlexOlmo, DRTulu, OLMoTrace, OlmoE, and datasets including Dolma and Dolci, and of how strongly we pushed for open models and open science. Our artifacts reached 33M+ downloads, including ~4M for Olmo 3. I believe Olmo has empowered researchers to push the boundaries of AI. I’ll always be cheering on Ai2 and will continue to strongly support open-source, open-science AI. I’m deeply grateful for this chapter and excited for what comes next.

548

56.9K

Percy Liang@percyliang·21 Mar

In our last episode, careful tuning, scaling, and ensembles led to a 5x gain in data efficiency (requires 5x less data to get the same loss). Now, with a rephraser model, we can get an additional 1.8x gain in data efficiency. I know, everyone's compute constrained, but we're preparing for a data-constrained future.

Konwoo Kim@konwookim

for data-constrained pre-training, synth data isn’t just benchmaxxing: it lowers loss on the real data distribution as we generate more tokens. for even better scaling, treat synth gens as forming one long 𝗺𝗲𝗴𝗮𝗱𝗼𝗰: 1.8x data efficiency, with larger gains under more compute

254

36K

Percy Liang@percyliang·18 Mar

Here's the GitHub issue with all the details: github.com/marin-communit… This is part of our Delphi suite, a "modernized" version of Pythia: x.com/percyliang/sta…

Percy Liang@percyliang

For trying to understand LMs deeply, @AiEleuther’s Pythia has been an invaluable resource: 16 LMs (70M to 12B parameters) trained on the same data (The Pile) in the same order, with intermediate checkpoints. It’s been two years and it’s time for a refresh.

6.8K

Percy Liang@percyliang·18 Mar

In Marin, we are trying to get really good at scaling laws. We have trained models up to 1e22 FLOPs and have made a prediction of the loss at 1e23 FLOPs, which @WilliamBarrHeld is running. This prediction is preregistered on GitHub, so we'll see in a few days how accurate our prediction was. What we want is not just a single model but a training recipe that scales reliably.

469

78.5K
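To make the "predict the loss at a larger scale, then check it" idea concrete, here is a minimal sketch that fits a plain power law, loss = a * compute^(-b), in log-log space on small runs and extrapolates it. Everything in it is an assumption for illustration. The data points are synthetic, the function names are hypothetical, and the simple two-parameter form is not claimed to be the fit Marin preregistered (practical scaling-law fits often add an irreducible-loss term).

```python
import math


def fit_power_law(compute, loss):
    """Fit log(loss) = log(a) - b * log(compute) by ordinary least squares."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget (FLOPs)."""
    return a * compute ** (-b)


def within_tolerance(observed, projected, tol=0.005):
    """Check an observed loss against a projection made in advance."""
    return abs(observed - projected) <= tol


# Synthetic points generated from a = 30, b = 0.05 (made up, not Marin data).
small_runs = [1e18, 1e19, 1e20, 3e20]
small_losses = [30 * c ** -0.05 for c in small_runs]

a, b = fit_power_law(small_runs, small_losses)
projected = predict_loss(a, b, 1e23)  # extrapolate to a much larger budget
```

Recording `projected` before the large run finishes is what makes the later `within_tolerance` check a real test of the recipe and not a post-hoc fit.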

Percy Liang@percyliang·10 Mar

I think it’s pretty clear that simulation is the next frontier for AI. The most impressive feats of AI to date are when we have a clear environment + reward, whether it be beating Lee Sedol at Go, winning an IMO gold medal, or writing entire apps from scratch. In these cases, the RL algorithm can try different actions, and observe the well-defined consequences in the safety of a docker container. But what about messy real-world situations involving people? The rewards are unclear, the stakes are high, and you can’t experiment in the real world. But these situations are precisely where the next big opportunity in AI is. To crack this, we need to *simulate* society (“put society into a docker container”). Concretely, this means building a model that can predict what will happen in any given situation (real or hypothetical). If we can do this, we are only limited by our imagination: predict the future, optimize for better outcomes, answer hypothetical (“what if”) questions. Ultimately, this goes beyond making better decisions; it’s about giving us a better understanding of ourselves and the world. Simulation is the whole enchilada. And this is exactly the research that @simile_ai is working on. Read more here: simile.ai/blog/simulatio…

111

1.1K

112.8K

Percy Liang retweeted

elie@eliebakouch·9 Mar

today is my last day at hugging face. feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding. i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people. things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do. but first, taking a few weeks break :)

116

745

33K

Percy Liang@percyliang·7 Mar

Normally, replaying old data reduces forgetting, but it actually helps you learn on new data too! We finally put this paper out on arxiv, but had it up as a Marin GitHub issue ~1 year ago: github.com/marin-communit…

Suhas Kotha@kothasuhas

to improve fine-tuning data efficiency, replay generic pre-training data. not only does this reduce forgetting, it actually improves performance on the fine-tuning domain! especially when fine-tuning data is scarce in pre-training (w/ @percyliang)

248

36.1K

Percy Liang retweeted

Joon Sung Park@joon_s_pk·6 Mar

Exciting to see @WSJ cover what we’re building at @simile_ai. It’s great to see the technology we developed in the lab making real world impact alongside foundational institutions like CVS and Gallup. Nothing like frontier research meeting real PMF! wsj.com/cio-journal/ca…

115

18.9K

Percy Liang@percyliang·5 Mar

@yaroslavvb What a young and innocent time...

3.3K

Yaroslav Bulatov@yaroslavvb·5 Mar

Random photo from ICML 2008. Fedor Zhdanov, Gustavo Lacerda, Percy Liang, some guy in striped shirt

349

35.6K

Percy Liang@percyliang·27 Feb

I stopped using ChatGPT a few months ago. Since then, I have been only using oa-chat. All chat history is stored locally. Each query is sent to OpenAI under a temporary key which is unlinkable to any other query. I’m not a privacy nut, but oa-chat is such a convenient drop-in replacement for your favorite AI assistant that there’s no reason not to try it out.

Ken Liu@kenziyuliu

Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which users, like a “VPN for AI inference”? Yes! Blog post below + we built it into open source infra/chat app and served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it’s surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, whistleblow your immigration status, or work with health insurance to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they’re just too good and convenient! (this is aka the "privacy paradox".)

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to the remote models controlled by someone else. A “privacy wrapper” or “VPN for AI inference”, so to speak. Concretely, it’s a blind inference middle layer that: (1) consists of decentralized proxies that anyone can operate; (2) blindly authenticates requests (via blind signatures / RFC9474,9578) so requests are provably sandboxed from each other and from user identity; (3) relays prompts over randomly chosen proxies that don’t see or log traffic (via client-side ephemeral keys or hosting in TEEs); and (4) the provider simply sees a mixed pool of anonymous prompts from the proxies. No state, pseudonyms, or linkable metadata.

If you squint, an unlinkable inference layer is essentially a vendor for per-request, anonymous, ephemeral AI access credentials (for users or agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn’t a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It aims to combat *longitudinal tracking* as a major threat to user privacy, and its statistical power increases quickly by mixing more users and requests. Unlinkability can be applied at any granularity. For an AI chat app, you can unlinkably request a fresh ephemeral key for every session so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not who you are or what you do with it. We think unlinkable inference is a first step towards this “intelligence neutrality”.

# Try it out! It’s quite practical

- Chat app “oa-chat”: chat.openanonymity.ai (<20 seconds to get going)
- Blog post that should be a fun read: openanonymity.ai/blog/unlinkabl…
- Project page: openanonymity.ai
- GitHub: github.com/OpenAnonymity

900

147K
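The blind-authentication step in the quoted thread can be illustrated with a toy textbook RSA blind signature: the signer authorizes a token without ever seeing it, so the signed token it later receives cannot be linked back to the signing request. This is a sketch under loud assumptions. It uses raw RSA with small fixed primes and no padding, so it is insecure and is not an implementation of RFC 9474 (which specifies a PSS-based encoding). The function names and the token value are hypothetical, and nothing here is taken from the Open Anonymity codebase.

```python
import hashlib
import secrets
from math import gcd

# Toy signer key built from two Mersenne primes (far too small for real use).
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def hash_to_int(token: bytes) -> int:
    """Map a token to an integer modulo N (toy encoding, no padding)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(message: int):
    """Client side: hide the message with a random blinding factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            break
    return (message * pow(r, E, N)) % N, r


def sign_blinded(blinded: int) -> int:
    """Signer side: sign without learning the underlying message."""
    return pow(blinded, D, N)


def unblind(blinded_signature: int, r: int) -> int:
    """Client side: strip the blinding factor to get a normal signature."""
    return (blinded_signature * pow(r, -1, N)) % N


def verify(message: int, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == message


message = hash_to_int(b"hypothetical-session-token")
blinded, r = blind(message)
signature = unblind(sign_blinded(blinded), r)
```

The signer only ever handles `blinded`, while the verifier later sees `message` and `signature`; the random factor `r`, known only to the client, is what breaks the link between the two.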

Percy Liang retweeted

Stefano Ermon@StefanoErmon·24 Feb

Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.

322

577

4.2K

Percy Liang@percyliang·26 Feb

These days, I'm much more excited about dataset releases than model releases. Models come and go and don't compose, whereas good datasets are more enduring and can be studied, used, revised to create better models more broadly. Excited about these 155K coding agent trajectories...just SFT'ing on this data improves SWE-bench Verified massively (23% -> 59.4%).

Together AI@togethercompute

We’re open-sourcing CoderForge-Preview — 258K test-verified coding-agent trajectories (155K pass | 103K fail). Fine-tuning Qwen3-32B on the passing subset boosts SWE-bench Verified: 23.0% → 59.4% pass @1, and it ranks #1 among open-data models ≤32B parameters. Thread on the data generation pipeline 🧵

447

49.3K
