Vivek

788 posts

@vivek_2332

generating past my context window @lossfunk rl · agents · context · evals

inside kv-cache · Joined September 2019
1.5K Following · 359 Followers
Pinned Tweet
Vivek@vivek_2332·
introducing autoresearch-rl, autonomous research for rl post-training. inspired by @karpathy autoresearch, and i think rl post-training is honestly one of the places where this idea fits perfectly. there are 50+ hyperparameters to tweak: learning rate, batch size, rollouts, clipping ratios, kl penalties, schedulers, the list goes on. instead of sitting there for hours turning knobs one at a time, just let the model figure out the right starting config on its own.

some things worth mentioning:
-> built on @PrimeIntellect prime-rl (my favourite rl post-training framework) and @willccbb verifiers for reward verification.
-> ran qwen2.5-0.5b-instruct on gsm8k across 60+ autonomous experiments. eval score went from 0.475 to 0.550, and the agent found a way to do it in fewer steps (20 instead of 30). less compute, better results.
-> the whole thing was surprisingly smooth to set up and run. point the agent at the config, go to sleep, wake up to a full experiment log. i really wish i could try this on a bigger model but gpu poor for now lol
-> the agent discovers things you wouldn't think to try, like how rollouts = 4 beats rollouts = 8, or how a constant lr schedule outperforms cosine. it just methodically tests everything.

i think the real value here is that rl training is so fragile and noisy that having an agent patiently run experiment after experiment is genuinely more effective than a human doing it manually. check it out: github.com/vivekvkashyap/…
22 replies · 53 reposts · 749 likes · 78.3K views
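A minimal sketch of the loop the pinned tweet describes, assuming a random-search agent over a toy config space. The search-space keys, `propose_config`, and `run_experiment` are illustrative stand-ins, not prime-rl or verifiers APIs; the toy scorer bakes in the two findings mentioned above (rollouts = 4 over 8, constant lr over cosine) purely so the loop runs end to end.

```python
import random

# Illustrative hyperparameter search space; key names are assumptions,
# not prime-rl config fields.
SEARCH_SPACE = {
    "learning_rate": [1e-6, 5e-6, 1e-5],
    "batch_size": [32, 64, 128],
    "rollouts": [4, 8, 16],
    "kl_penalty": [0.0, 0.01, 0.05],
    "lr_schedule": ["constant", "cosine"],
}

def propose_config(history):
    # Stand-in for the agent's proposal step: plain random search here.
    # The real agent would condition on the accumulated experiment log.
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def run_experiment(config):
    # Toy stand-in for a real train + eval run. It hard-codes the two
    # effects reported in the tweet, plus noise, so the sketch is runnable.
    score = 0.475
    score += 0.02 if config["rollouts"] == 4 else 0.0
    score += 0.03 if config["lr_schedule"] == "constant" else 0.0
    score += random.uniform(-0.01, 0.025)
    steps = 20 if config["lr_schedule"] == "constant" else 30
    return score, steps

history, best = [], None
for trial in range(60):  # ~60 autonomous experiments, as in the post
    config = propose_config(history)
    score, steps = run_experiment(config)
    history.append((config, score, steps))
    if best is None or score > best[1]:
        best = (config, score, steps)
print("best config, score, steps:", best)
```

In the real system the proposal step would presumably be the model reading the experiment log rather than `random.choice`, but the outer loop has the same shape: propose a config, run, score, keep the best.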
Vivek@vivek_2332·
@natolambert my go-to book for learning anything related to RL post-training.
0 replies · 0 reposts · 1 like · 144 views
Nathan Lambert@natolambert·
Recording the first lecture of an RLHF Book course I'm making. Things are coming together! Like and subscribe?
17 replies · 17 reposts · 385 likes · 14.6K views
Vivek@vivek_2332·
-> seeing @karpathy autoresearch and @cursor_ai compaction, there's a clear pattern: we're going from humans controlling everything except weight updates to giving models control over that too.
-> makes sense. we don't understand model internals well enough to handcraft these decisions. train with rl and let the model figure it out.
-> this is just the start. models will write their own skill files, design tool use, pick training strategy. humans just write the prompt and reward function; everything else gets learned.

excited to see more experiments in this direction.
0 replies · 0 reposts · 1 like · 114 views
Vivek@vivek_2332·
cursor just dropped a blog on how they train composer to summarize its own context using rl. the idea is so cool!!

-> cursor trained composer to summarize its own context using rl: when the agent hits a token limit it pauses and writes a compact summary before continuing the task.
-> the key insight is that summarization is part of the training rollout. the final reward (did the code pass tests) flows back to the summary tokens too, so the model learns what info matters. (very important)
-> current approaches use massive prompts telling the model what to preserve, or just drop old context. both lose critical info. cursor's approach just says "summarize" and the model figures out what to keep because it was trained to.
-> cuts compaction error by 50% while using one fifth the tokens. less is more when the model actually learned what to remember.
-> summarized 100k+ tokens down to 1000 for terminal bench.

definitely check it out.
Cursor@cursor_ai

We trained Composer to self-summarize through RL instead of a prompt. This reduces the error from compaction by 50% and allows Composer to succeed on challenging coding tasks requiring hundreds of actions.

2 replies · 0 reposts · 8 likes · 781 views
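A runnable toy sketch of the rollout structure the thread describes: when the context exceeds a limit, the same policy emits a summary that replaces the old context, and the rollout records which tokens were summary tokens so the terminal reward can be credited to them too. `ToyPolicy`, `ToyEnv`, and the token handling are stand-ins invented for illustration, not Cursor's training code.

```python
MAX_CONTEXT = 8  # tiny limit so compaction triggers in this toy run

class ToyPolicy:
    def generate(self, context, summarize=False):
        # Stand-in for model sampling: returns a short token list.
        return ["<sum>"] * 2 if summarize else ["tok"] * 3

class ToyEnv:
    def __init__(self, horizon=5):
        self.horizon, self.t = horizon, 0
    @property
    def done(self):
        return self.t >= self.horizon
    def step(self, action):
        self.t += 1
    def final_reward(self):
        return 1.0  # e.g. "did the code pass tests"

def run_rollout(policy, env, prompt):
    tokens, summary_spans = [], []
    context = list(prompt)
    while not env.done:
        if len(context) > MAX_CONTEXT:
            # Pause and self-summarize: the summary is generated by the
            # same policy, mid-rollout, so it is part of the trajectory.
            start = len(tokens)
            summary = policy.generate(context, summarize=True)
            tokens += summary
            summary_spans.append((start, len(tokens)))
            context = summary  # compact summary replaces the old context
        action = policy.generate(context)
        tokens += action
        context += action
        env.step(action)
    # The terminal reward is assigned to every generated token, summary
    # tokens included, so the model learns what is worth preserving.
    reward = env.final_reward()
    per_token_reward = [reward] * len(tokens)
    return tokens, summary_spans, per_token_reward

tokens, spans, rewards = run_rollout(ToyPolicy(), ToyEnv(), ["prompt"])
print(f"{len(tokens)} tokens, summary spans at {spans}, reward {rewards[0]}")
```

The `per_token_reward` list is what makes the "reward flows back to the summary tokens" point concrete: in a policy-gradient update, the summary span would receive the same terminal credit as the action tokens.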
Vivek@vivek_2332·
@elliotarledge awesome man!! always amazed at how you come up with these crazy ideas.
0 replies · 0 reposts · 1 like · 342 views
Elliot Arledge@elliotarledge·
Introducing Mafia! github.com/Infatoshi/mafia

After playing a bunch of Mafia in real life with friends and family, I figured I just had to kick off an RL run to see how language models would evolve and potentially reward hack. I trained Qwen3-8B on H100s with the following roles:
> Mafia
> Villager
> Doctor
> Detective
> Troll

One file. One GPU. No frameworks beyond PyTorch + HuggingFace Transformers.
30 replies · 12 reposts · 321 likes · 22.8K views
Vivek@vivek_2332·
just finished watching "the thinking game". from chess prodigy to nobel prize. what a ride. insane!!

> demis, chess-obsessed kid, master at 13
> deep blue beats kasparov, demis loses his mind
> intelligence = generality + learning
> molyneux offers £1M to skip uni
> says no, goes to cambridge
> founds deepmind. pure research, zero products
> builds DQN to play atari, fails repeatedly
> finally works, beats humans at their own games
> goes after Go, hardest board game ever
> trains alphago on expert games, then self-play
> challenges go legend lee sedol
> alphago plays moves no human has in 3000 years
> alphago wins 4-1, world watches
> takes on ke jie in china, ke jie loses all 3
> chinese government bans the livestream
> builds alphazero. zero human data, pure self-play
> masters go from scratch in days
> builds alphastar for starcraft
> crushes pros, then loses a live exhibition match
> AI capability terrifies him, manhattan project parallels
> focuses on protein folding
> enters CASP, alphafold tops the field but not by enough
> hires a biologist, rewrites everything from scratch
> alphafold 2 crosses 90 gdt, protein folding solved
> folds every known protein on earth
> open sources it to the world
> demis and john jumper win the nobel prize
> it was always just a good thinking game

this man went from playing chess at 4 to folding every protein on earth. one lab. one idea. solve intelligence, then use it to solve everything else. i don't think people understand what just happened.
Google DeepMind@GoogleDeepMind

To celebrate five years of #AlphaFold, we’re making The Thinking Game available on YouTube. 🧬 Get a candid look at the triumphs, the challenges and the pivotal moments that led to a breakthrough on a 50-year-old grand challenge in biology. Stream for free on @YouTube: goo.gle/4pCVQNY

0 replies · 0 reposts · 1 like · 172 views
∩@zachpogrob·
The hate on this tweet is hilarious. And clearly from those at an introspective phase of their life, not in heads-down/burn-everything/build mode. Those people are too busy to reply.
David Senra@davidsenra

Great men of history had little to no introspection. The personality that builds empires is not the same personality that sits around quietly questioning itself. @pmarca and I discuss what we both noticed but no one talks about:

David: You don't have any levels of introspection?
Marc: Yes, zero. As little as possible.
David: Why?
Marc: Move forward. Go! I found people who dwell in the past get stuck in the past. It's a real problem, and it's a problem at work and a problem at home.
David: So I've read 400 biographies of history’s greatest entrepreneurs, and someone asked me what the most surprising thing I’ve learned from this was [and I answered] they have little or zero introspection. Sam Walton didn't wake up thinking about his internal self. He just woke up and was like: I like building Walmart. I'm going to keep building Walmart. I'm going to make more Walmarts. And he just kept doing it over and over again.
Marc: If you go back 400 years, it never would've occurred to anybody to be introspective. All of the modern conceptions around introspection and therapy, and all the things that result from that, are a kind of manufacture of the 1910s, 1920s. Great men of history didn't sit around doing this stuff. The individual runs and does all these things and builds things and builds empires and builds companies and builds technology. And then this kind of guilt-based whammy showed up from Europe, a lot of it from Vienna in the 1910s, 1920s, Freud and that entire movement. It turned all that inward and basically said: okay, now we need to second-guess the individual. We need to criticize the individual. The individual needs to self-criticize. The individual needs to feel guilt, needs to look backwards, needs to dwell in the past. It never resonated with me.

12 replies · 2 reposts · 84 likes · 6.8K views
λux@novasarc01·
@eliebakouch yeah! i made some visualizations for distributed training concepts (along with a DiLoCo explainer) and they were pretty spot on!
4 replies · 4 reposts · 52 likes · 2K views
Vivek retweeted
shouko@shoukointech·
Peter Thiel on Competition: "Don't do what everyone else is doing"
5 replies · 76 reposts · 522 likes · 20.1K views
Vivek@vivek_2332·
@_xjdr thanks for sharing. will definitely read it!!
0 replies · 0 reposts · 1 like · 72 views
Vivek@vivek_2332·
the coding stack is collapsing.

coding in 2020: write every line yourself
coding in 2023: ai helps you edit
coding in 2024: ai writes it for you
coding in 2025: ai opens prs and fixes its own bugs
coding now: ai writes its own skill files based on what it lacks
next step: literally just thinking it. we're almost there.
0 replies · 0 reposts · 2 likes · 89 views
himanshu@himanshustwts·
Career update: Excited to share that I have joined the incredible team at @smallest_AI to work on Research x Devrel! The team is cooking incredible small + efficient multi-modal models and it feels like an exciting time to push the frontier on scale!
205 replies · 32 reposts · 1.8K likes · 59.4K views
maharshi@maharshii·
day by day i’m reminded of how there is so much to learn, to discover, to invent, to make sense of, but so little time to do all of that
9 replies · 28 reposts · 368 likes · 7.4K views
Vivek retweeted
Kpaxs@Kpaxs·
Building in public is the new keeping a journal.
7 replies · 3 reposts · 37 likes · 2.2K views
Vivek@vivek_2332·
@zachpogrob Yeah, Timothee Chalamet should’ve won the Oscar. Even though I felt the movie itself was average, his acting was absolutely top notch.
0 replies · 0 reposts · 0 likes · 376 views
Naval@naval·
Coding an app is the new starting a podcast.
1.5K replies · 2.4K reposts · 27.2K likes · 2.7M views
Alex Weers@a_weers·
Finally finished! If you're interested in an overview of recent methods in reinforcement learning for reasoning LLMs, check out this blog post: aweers.de/blog/2026/rl-f… It summarizes ten methods, tries to highlight differences and trends, and has a collection of open problems
19 replies · 239 reposts · 1.8K likes · 300.8K views