
We have a new position paper on "inference-time compute" and what we have been working on for the last few months! We present some theory on why it is necessary, how it works, and what it means for "super" intelligence.

Qwen+RL = dramatic "Aha!" moment. Llama+RL = quick plateau. Same size. Same RL. Why? Qwen naturally exhibits cognitive behaviors that Llama doesn't. Primed with 4 synthetic reasoning patterns, Llama matched Qwen's self-improvement performance! We can engineer this into any model! 👇
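
A minimal sketch of what that priming could look like, assuming the four patterns are the verification, backtracking, subgoal-setting, and backward-chaining behaviors studied in this line of work. The trace text (a toy "reach 24" arithmetic task) and the build_priming_set helper are illustrative assumptions, not the authors' actual data:

```python
# Hypothetical priming set: one short reasoning trace per cognitive
# behavior. The strings target a toy "reach 24" arithmetic task and are
# illustrative only.
PRIMING_PATTERNS = {
    "verification":      "Check the step: 8 * 3 = 24 matches the target, so keep it.",
    "backtracking":      "6 * 6 = 36 overshoots 24; abandon that path and retry with 8 * 3.",
    "subgoal_setting":   "First build an 8 from the inputs, then look for a 3 to multiply.",
    "backward_chaining": "The target is 24; since 24 = 8 * 3, work backward to find an 8 and a 3.",
}

def build_priming_set(problems):
    """Pair each problem with a trace for each behavior, yielding SFT
    examples that prime a base model before the RL stage."""
    return [
        {"prompt": problem, "completion": trace, "behavior": name}
        for problem in problems
        for name, trace in PRIMING_PATTERNS.items()
    ]

# Usage: fine-tune the base model on build_priming_set(train_problems),
# then run the identical RL recipe and compare self-improvement curves.
```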


LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/
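
For context on what LoRA actually trains: the pretrained weight W is frozen and a low-rank update (alpha/r)·BA is learned on top, so only the two small matrices receive gradients. A minimal PyTorch sketch of one adapted layer, using the standard parameterization rather than any code from the post:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha/r) * B A x, with W frozen and only A, B trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # A: Gaussian init, B: zeros, so training starts from the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# Usage: wrap e.g. the attention projections of a pretrained model and
# train only the A/B parameters.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
```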

Scalable oversight is pretty much the last big research problem left. Once you can get an unhackable reward function for anything, you can RL on everything.


Earlier this year we partnered with SynthLabs (synthlabs.ai), a post-training research lab, to generate a 351-billion-token synthetic dataset 10x faster and 80% cheaper. Read more in our case study: sutro.sh/case-studies/s…

At the beginning of the study, developers forecast that they would be sped up by 24%. After actually doing the work, they estimated that they had been sped up by 20%. But it turned out that they were actually slowed down by 19%.


this is the Kimi K2 base model's attempt to complete an unfinished version of my blogpost


What if models could learn which problems _deserve_ deep thinking? No labels. Just let the model discover difficulty through its own performance during training. Instead of burning compute 🔥💸 on trivial problems, it allocates 5x more to the problems that actually need it ↓

Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator, allocating 5x more tokens to hard vs. easy problems and cutting overall usage by 50% 🧵👇 1/10
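
A rough sketch of that reward shaping under one plausible reading: the solve rate across K rollouts of a prompt serves as the online difficulty estimate, and it scales a per-token length penalty so that easy prompts pay more for long outputs. The beta coefficient and the exact functional form are assumptions here, not values from the thread:

```python
import numpy as np

def alp_shaped_rewards(outcomes, lengths, beta=1e-3):
    """ALP-style shaping for one prompt (sketch, not the paper's exact form).

    outcomes: (K,) array of 0/1 correctness for K rollouts of the prompt
    lengths:  (K,) array of generated-token counts for those rollouts
    """
    outcomes = np.asarray(outcomes, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    solve_rate = outcomes.mean()           # high -> easy, low -> hard
    per_token_penalty = beta * solve_rate  # easy prompts pay more per token
    return outcomes - per_token_penalty * lengths

# Example: an easy prompt (4/4 solved) vs. a hard one (1/4 solved).
easy = alp_shaped_rewards([1, 1, 1, 1], [200, 250, 220, 240])
hard = alp_shaped_rewards([0, 1, 0, 0], [900, 1100, 950, 1000])
# The hard prompt's long rollouts are penalized far less, so the policy
# learns to spend its token budget where the solve rate says it's needed.
```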
