Vivek

788 posts

@vivek_2332

generating past my context window @lossfunk rl · agents · context · evals

inside kv-cache · Joined September 2019
1.5K Following · 359 Followers
Pinned Tweet
Vivek@vivek_2332·
introducing autoresearch-rl, autonomous research for rl post-training. inspired by @karpathy autoresearch, and i think rl post-training is honestly one of the places where this idea fits perfectly. there are 50+ hyperparameters to tweak: learning rate, batch size, rollouts, clipping ratios, kl penalties, schedulers, the list goes on. instead of sitting there for hours turning knobs one at a time, just let the model figure out the right starting config on its own.

some things worth mentioning:
-> built on @PrimeIntellect prime-rl (my favourite rl post-training framework) and @willccbb verifiers for reward verification.
-> ran qwen2.5-0.5b-instruct on gsm8k across 60+ autonomous experiments. eval score went from 0.475 to 0.550, and the agent found a way to do it in fewer steps (20 instead of 30). less compute, better results.
-> the whole thing was surprisingly smooth to set up and run. point the agent at the config, go to sleep, wake up to a full experiment log. i really wish i could try this on a bigger model but gpu poor for now lol
-> the agent discovers things you wouldn't think to try, like how rollouts = 4 beats rollouts = 8, or how a constant lr schedule outperforms cosine. it just methodically tests everything.

i think the real value here is that rl training is so fragile and noisy that having an agent patiently run experiment after experiment is genuinely more effective than a human doing it manually. check it out: github.com/vivekvkashyap/…
22 replies · 53 reposts · 749 likes · 78.3K views
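A minimal sketch of the loop the pinned tweet describes, assuming a random-search agent over a toy config space. The search-space keys, `propose_config`, and `run_experiment` are illustrative stand-ins, not prime-rl or verifiers APIs; the toy scorer bakes in the two findings mentioned above (rollouts = 4 over 8, constant lr over cosine) purely so the loop runs end to end.

```python
import random

# Illustrative hyperparameter search space; key names are assumptions,
# not prime-rl config fields.
SEARCH_SPACE = {
    "learning_rate": [1e-6, 5e-6, 1e-5],
    "batch_size": [32, 64, 128],
    "rollouts": [4, 8, 16],
    "kl_penalty": [0.0, 0.01, 0.05],
    "lr_schedule": ["constant", "cosine"],
}

def propose_config(history):
    # Stand-in for the agent's proposal step: plain random search here.
    # The real agent would condition on the accumulated experiment log.
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def run_experiment(config):
    # Toy stand-in for a real train + eval run. It hard-codes the two
    # effects reported in the tweet, plus noise, so the sketch is runnable.
    score = 0.475
    score += 0.02 if config["rollouts"] == 4 else 0.0
    score += 0.03 if config["lr_schedule"] == "constant" else 0.0
    score += random.uniform(-0.01, 0.025)
    steps = 20 if config["lr_schedule"] == "constant" else 30
    return score, steps

history, best = [], None
for trial in range(60):  # ~60 autonomous experiments, as in the post
    config = propose_config(history)
    score, steps = run_experiment(config)
    history.append((config, score, steps))
    if best is None or score > best[1]:
        best = (config, score, steps)
print("best config, score, steps:", best)
```

In the real system the proposal step would presumably be the model reading the experiment log rather than `random.choice`, but the outer loop has the same shape: propose a config, run, score, keep the best.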
Vivek@vivek_2332·
@natolambert my go-to book for learning anything related to RL post-training.
0 replies · 0 reposts · 1 like · 144 views
Nathan Lambert@natolambert·
Recording the first lecture of an RLHF Book course I'm making. Things are coming together! Like and subscribe?
17 replies · 17 reposts · 385 likes · 14.6K views
Vivek@vivek_2332·
-> seeing @karpathy autoresearch and @cursor_ai compaction, there's a clear pattern: we're going from humans controlling everything except weight updates to giving models control over that too.
-> makes sense. we don't understand model internals well enough to handcraft these decisions. train with rl and let the model figure it out.
-> this is just the start. models will write their own skill files, design tool use, pick training strategy. humans just write the prompt and reward function; everything else gets learned.

excited to see more experiments in this direction.
0 replies · 0 reposts · 1 like · 114 views
Vivek@vivek_2332·
cursor just dropped a blog on how they train composer to summarize its own context using rl. the idea is so cool!!

-> cursor trained composer to summarize its own context using rl: when the agent hits a token limit it pauses and writes a compact summary before continuing the task.
-> the key insight is that summarization is part of the training rollout. the final reward (did the code pass tests) flows back to the summary tokens too, so the model learns what info matters. (very important)
-> current approaches use massive prompts telling the model what to preserve, or just drop old context. both lose critical info. cursor's approach just says "summarize" and the model figures out what to keep because it was trained to.
-> cuts compaction error by 50% while using one fifth the tokens. less is more when the model actually learned what to remember.
-> summarized 100k+ tokens down to 1000 for terminal bench.

definitely check it out.
Cursor@cursor_ai

We trained Composer to self-summarize through RL instead of a prompt. This reduces the error from compaction by 50% and allows Composer to succeed on challenging coding tasks requiring hundreds of actions.

2 replies · 0 reposts · 8 likes · 781 views
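A runnable toy sketch of the rollout structure the thread describes: when the context exceeds a limit, the same policy emits a summary that replaces the old context, and the rollout records which tokens were summary tokens so the terminal reward can be credited to them too. `ToyPolicy`, `ToyEnv`, and the token handling are stand-ins invented for illustration, not Cursor's training code.

```python
MAX_CONTEXT = 8  # tiny limit so compaction triggers in this toy run

class ToyPolicy:
    def generate(self, context, summarize=False):
        # Stand-in for model sampling: returns a short token list.
        return ["<sum>"] * 2 if summarize else ["tok"] * 3

class ToyEnv:
    def __init__(self, horizon=5):
        self.horizon, self.t = horizon, 0
    @property
    def done(self):
        return self.t >= self.horizon
    def step(self, action):
        self.t += 1
    def final_reward(self):
        return 1.0  # e.g. "did the code pass tests"

def run_rollout(policy, env, prompt):
    tokens, summary_spans = [], []
    context = list(prompt)
    while not env.done:
        if len(context) > MAX_CONTEXT:
            # Pause and self-summarize: the summary is generated by the
            # same policy, mid-rollout, so it is part of the trajectory.
            start = len(tokens)
            summary = policy.generate(context, summarize=True)
            tokens += summary
            summary_spans.append((start, len(tokens)))
            context = summary  # compact summary replaces the old context
        action = policy.generate(context)
        tokens += action
        context += action
        env.step(action)
    # The terminal reward is assigned to every generated token, summary
    # tokens included, so the model learns what is worth preserving.
    reward = env.final_reward()
    per_token_reward = [reward] * len(tokens)
    return tokens, summary_spans, per_token_reward

tokens, spans, rewards = run_rollout(ToyPolicy(), ToyEnv(), ["prompt"])
print(f"{len(tokens)} tokens, summary spans at {spans}, reward {rewards[0]}")
```

The `per_token_reward` list is what makes the "reward flows back to the summary tokens" point concrete: in a policy-gradient update, the summary span would receive the same terminal credit as the action tokens.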
Vivek@vivek_2332·
@elliotarledge awesome man!! always amazed at how you come up with these crazy ideas.
0 replies · 0 reposts · 1 like · 342 views
Elliot Arledge@elliotarledge·
Introducing Mafia! github.com/Infatoshi/mafia

After playing a bunch of Mafia in real life with friends and family, I figured I just had to kick off an RL run to see how language models would evolve and potentially reward hack. I trained Qwen3-8B on H100s with the following roles:
> Mafia
> Villager
> Doctor
> Detective
> Troll

One file. One GPU. No frameworks beyond PyTorch + HuggingFace Transformers.
30 replies · 12 reposts · 321 likes · 22.8K views
Vivek@vivek_2332·
just finished watching "the thinking game". from chess prodigy to nobel prize. what a ride. insane!!

> demis, chess-obsessed kid, master at 13
> deep blue beats kasparov, demis loses his mind
> intelligence = generality + learning
> molyneux offers £1M to skip uni
> says no, goes to cambridge
> founds deepmind. pure research, zero products
> builds DQN to play atari, fails repeatedly
> finally works, beats humans at their own games
> goes after Go, hardest board game ever
> trains alphago on expert games, then self-play
> challenges go legend lee sedol
> alphago plays moves no human has in 3000 years
> alphago wins 4-1, world watches
> takes on ke jie in china, ke jie loses all 3
> chinese government bans the livestream
> builds alphazero. zero human data, pure self-play
> masters go from scratch in days
> builds alphastar for starcraft
> crushes pros, then loses a live exhibition match
> AI capability terrifies him, manhattan project parallels
> focuses on protein folding
> enters CASP, alphafold tops the field but not by enough
> hires a biologist, rewrites everything from scratch
> alphafold 2 crosses 90 gdt, protein folding solved
> folds every known protein on earth
> open sources it to the world
> demis and john jumper win the nobel prize
> it was always just a good thinking game

this man went from playing chess at 4 to folding every protein on earth. one lab. one idea. solve intelligence, then use it to solve everything else. i don't think people understand what just happened.
Google DeepMind@GoogleDeepMind

To celebrate five years of #AlphaFold, we’re making The Thinking Game available on YouTube. 🧬 Get a candid look at the triumphs, the challenges and the pivotal moments that led to a breakthrough on a 50-year-old grand challenge in biology. Stream for free on @YouTube: goo.gle/4pCVQNY

0 replies · 0 reposts · 1 like · 172 views
∩@zachpogrob·
The hate on this tweet is hilarious. And clearly from those at an introspective phase of their life, not in heads-down/burn-everything/build mode. Those people are too busy to reply.
David Senra@davidsenra

Great men of history had little to no introspection. The personality that builds empires is not the same personality that sits around quietly questioning itself. @pmarca and I discuss what we both noticed but no one talks about:

David: You don't have any levels of introspection?
Marc: Yes, zero. As little as possible.
David: Why?
Marc: Move forward. Go! I found people who dwell in the past get stuck in the past. It's a real problem, and it's a problem at work and a problem at home.
David: So I've read 400 biographies of history’s greatest entrepreneurs, and someone asked me what the most surprising thing I’ve learned from this was [and I answered] they have little or zero introspection. Sam Walton didn't wake up thinking about his internal self. He just woke up and was like: I like building Walmart. I'm going to keep building Walmart. I'm going to make more Walmarts. And he just kept doing it over and over again.
Marc: If you go back 400 years, it never would've occurred to anybody to be introspective. All of the modern conceptions around introspection and therapy, and all the things that result from that, are a kind of manufacture of the 1910s, 1920s. Great men of history didn't sit around doing this stuff. The individual runs and does all these things and builds things and builds empires and builds companies and builds technology. And then this kind of guilt-based whammy showed up from Europe, a lot of it from Vienna in the 1910s, 1920s, Freud and that entire movement. It turned all that inward and basically said: okay, now we need to second-guess the individual. We need to criticize the individual. The individual needs to self-criticize. The individual needs to feel guilt, needs to look backwards, needs to dwell in the past. It never resonated with me.

12 replies · 2 reposts · 84 likes · 6.8K views
λux@novasarc01·
@eliebakouch yeah! i made some visualizations for distributed training concepts (along with a DiLoCo explainer) and they were pretty spot on!
4 replies · 4 reposts · 52 likes · 2K views
Vivek retweeted
shouko@shoukointech·
Peter Thiel on Competition: "Don't do what everyone else is doing"
5 replies · 76 reposts · 522 likes · 20.1K views
Vivek@vivek_2332·
@_xjdr thanks for sharing. will definitely read it!!
0 replies · 0 reposts · 1 like · 72 views
Vivek@vivek_2332·
the coding stack is collapsing.

coding in 2020: write every line yourself
coding in 2023: ai helps you edit
coding in 2024: ai writes it for you
coding in 2025: ai opens prs and fixes its own bugs
coding now: ai writes its own skill files based on what it lacks
next step: literally just thinking it. we're almost there.
0 replies · 0 reposts · 2 likes · 89 views
himanshu@himanshustwts·
Career update: Excited to share that I have joined the incredible team at @smallest_AI to work on Research x Devrel! The team is cooking incredible small + efficient multi-modal models and it feels like an exciting time to push the frontier on scale!
205 replies · 32 reposts · 1.8K likes · 59.4K views
maharshi@maharshii·
day by day i’m reminded of how there is so much to learn, to discover, to invent, to make sense of, but so little time to do all of that
9 replies · 28 reposts · 368 likes · 7.4K views
Vivek retweeted
Kpaxs@Kpaxs·
Building in public is the new keeping a journal.
7 replies · 3 reposts · 37 likes · 2.2K views
Vivek@vivek_2332·
@zachpogrob Yeah, Timothee Chalamet should’ve won the Oscar. Even though I felt the movie itself was average, his acting was absolutely top notch.
0 replies · 0 reposts · 0 likes · 376 views
Naval@naval·
Coding an app is the new starting a podcast.
1.5K replies · 2.4K reposts · 27.2K likes · 2.7M views
Alex Weers@a_weers·
Finally finished! If you're interested in an overview of recent methods in reinforcement learning for reasoning LLMs, check out this blog post: aweers.de/blog/2026/rl-f… It summarizes ten methods, tries to highlight differences and trends, and has a collection of open problems
19 replies · 239 reposts · 1.8K likes · 300.8K views