Yangjun Ruan

@YangjunR
Creating @thinkymachines | @UofT @stanfordAILab @VectorInst

Every conversation I have had IRL with @WilliamBarrHeld has been him obsessing over scaling laws. The man is on a mission. Can't wait to see 1e23!



How to get AI to make discoveries on open scientific problems? Most methods just improve the prompt with more attempts. But the AI itself doesn't improve. With test-time training, AI can continue to learn on the problem it’s trying to solve: test-time-training.github.io/discover.pdf
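
For intuition, here is a minimal sketch of what "the AI itself improves" can mean: sample several attempts at the problem, keep the best one, and take gradient steps on it before trying again. This is an illustrative pattern only, not the paper's method; the gpt2 model, the toy problem, and the score function are all placeholders.

```python
# Illustrative only: a generic test-time-training loop (generate attempts,
# keep the best, train on it, repeat). Not the paper's actual method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

problem = "Find a 3x3 magic square using the integers 1..9.\n"

def score(attempt: str) -> float:
    # Placeholder verifier; a real setup would check the attempt against
    # the problem (unit tests, a symbolic checker, etc.).
    return -len(attempt)

for round_ in range(3):
    # 1) Sample several attempts at the open problem.
    inputs = tok(problem, return_tensors="pt")
    outs = model.generate(**inputs, do_sample=True, num_return_sequences=4,
                          max_new_tokens=64, pad_token_id=tok.eos_token_id)
    attempts = [tok.decode(o, skip_special_tokens=True) for o in outs]

    # 2) Keep the highest-scoring attempt as training data.
    best = max(attempts, key=score)

    # 3) Test-time training: update the weights on that attempt, so the
    #    model itself improves on this problem, not just the prompt.
    batch = tok(best, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"round {round_}: loss {loss.item():.3f}")
```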

LLM memory is considered one of the hardest problems in AI. All we have today are endless hacks and workarounds. But the root solution has always been right in front of us: next-token prediction is already an effective compressor. We don't need a radical new architecture. The missing piece is to continue training the model at test time, using the context as training data.

Our full release of End-to-End Test-Time Training (TTT-E2E) with @NVIDIAAI, @AsteraInstitute, and @StanfordAILab is now available.

Blog: nvda.ws/4syfyMN
arXiv: arxiv.org/abs/2512.23675

This has been over a year in the making with @arnuvtandon and an incredible team.
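
A minimal sketch of that missing piece, assuming nothing about the actual TTT-E2E implementation: treat the context itself as next-token-prediction training data, fold it into the weights with a few gradient steps, then answer from a short prompt. The model, learning rate, chunk size, and toy context below are all placeholders.

```python
# Sketch of "context as training data": compress a long context into the
# weights via next-token prediction, then query with a short prompt.
# Illustrative only; not the TTT-E2E implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.SGD(model.parameters(), lr=1e-4)

long_context = "Alice keeps her spare key under the blue flowerpot. " * 20
ids = tok(long_context, return_tensors="pt")["input_ids"][0]

# Next-token prediction is the training objective; the context itself is
# the training data. Process it in fixed-size chunks.
chunk = 128
for start in range(0, len(ids) - 1, chunk):
    window = ids[start:start + chunk].unsqueeze(0)
    loss = model(input_ids=window, labels=window).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

# The "memory" of the context now lives in the updated parameters rather
# than in a KV cache, so the prompt can be short.
prompt = tok("Where is Alice's spare key? ", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=16, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```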

Benchmarking data is dominated by a single “General Capability” dimension. Is this due to good generalization across tasks, or to developers pushing on all benchmarks at once? 🧵 with some analysis, including the discovery of a “Claudiness” dimension.
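
One standard way to test for such a dominant dimension, sketched here on synthetic data (the real score matrix and analysis are in the thread): PCA over a models-by-benchmarks matrix. If the top component explains most of the variance, a single "general capability" axis dominates; smaller structured components would be candidates for something like a "Claudiness" axis.

```python
# PCA on a models x benchmarks score matrix. The matrix below is
# synthetic, generated so one latent factor drives all benchmarks.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_benchmarks = 30, 12
ability = rng.normal(size=(n_models, 1))               # latent general capability
loadings = rng.uniform(0.5, 1.0, size=(1, n_benchmarks))
scores = ability @ loadings + 0.1 * rng.normal(size=(n_models, n_benchmarks))

# Center the columns, then read the spectrum off the SVD.
centered = scores - scores.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained by top component:", round(explained[0], 3))
```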

It's not even been a month since @thinkymachines released Tinker & Stanford already has an assignment on it


Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
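
For a sense of the "local loop, remote compute" pattern the announcement describes, here is a hypothetical sketch. None of these names are Tinker's real API; RemoteTrainer is a local stand-in defined below so the sketch runs as-is.

```python
# Hypothetical sketch of writing a training loop locally while the heavy
# forward/backward work happens elsewhere. RemoteTrainer is a made-up
# stand-in, not Tinker's API.
from dataclasses import dataclass

@dataclass
class RemoteTrainer:
    """Local stand-in for a client that would ship each call to remote GPUs."""
    base_model: str
    step: int = 0

    def forward_backward(self, batch):
        # A real service would run forward/backward on distributed GPUs
        # and return the loss; here we just return a decaying fake value.
        self.step += 1
        return 1.0 / self.step

    def optim_step(self, lr: float = 1e-4):
        # A real service would apply the accumulated gradients remotely.
        pass

trainer = RemoteTrainer(base_model="some-open-model")
dataset = [("example prompt", "example completion")] * 8

# The loop itself runs on your laptop; only the heavy ops move remotely.
for epoch in range(2):
    for batch in dataset:
        loss = trainer.forward_backward(batch)
        trainer.optim_step(lr=1e-4)
    print(f"epoch {epoch}: last loss {loss:.3f}")
```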
