Goliath

213 posts

Goliath

@zero_goliath

@uwaterloo cs; formerly @ritserlabs, intern @runrl_com

Waterloo, ON Katılım Mart 2022

574 Takip Edilen612 Takipçiler

Goliath@zero_goliath·8 Şub

I was short Indonesian palm oil futures hedged with long spot H100 GPUs, collateralized with a 500gb African voodoo RL environment. The correlation held perfectly for 8 months until some random Ghanaian minister tweeted about a government mandate on electricity prices at 2am. Margin call hit while I was asleep. Woke up to find my entire position liquidated and my Prime Broker had already sold my safe hedge which was long Nasdaq short Gold. Now I make datasets with Meta Glasses at the mall by cold approaching zoomers and explaining why risk adjusted returns matter.

English

268

Goliath@zero_goliath·1 Oca

the value system of the pluribus hivemind clearly gives away that it was RLed by the anthropic of another world

English

123

Goliath@zero_goliath·30 Ara

@Dorialexander as in theyll have the same UI, but the models themselves will be adept at splitting tasks and delegating to subagents

English

Goliath@zero_goliath·30 Ara

@Dorialexander i think anthropic/oai will end up (or are) RLing their code agents to work in parallel. theyre way too slow working serially

English

Alexander Doria@Dorialexander·29 Ara

So after a few days playing with codex/cc on TorchTitan, quite mixed overall. I don't think we talk enough about all the time lost in endless slop rambling.

English

5.2K

Goliath@zero_goliath·23 Ara

@Matthewagi yeah, if they've got their RL pipeline working reliably for ~200k token rollouts with tool calls, then parallel agents can't be too far behind

English

Matt@Matthewagi·23 Ara

@zero_goliath This is probably what blows that work-time graph up. Orchestrator then parallel workers so no one is operating too long

English

Goliath@zero_goliath·23 Ara

having codex run serially for many hours is so 2025. cant wait for code agents RLed to work in parallel

English

216

Goliath@zero_goliath·16 Ara

would love to see more work investigating mixtures-of-recursions (ie adaptive recurrence) vs CoT RL intuitively, the main difference seems to be that CoT RL actually updates the priors for later non-CoT tokens also its plausible that CoT RL is better at parallel reasoning, see the Let's Think Dot by Dot paper still, i wonder if there are notable gains from adaptive recursion outside of CoT, or if speculative decoding accomplishes the same thing

English

133

Goliath@zero_goliath·6 Ara

research.google/blog/introduci…

ZXX

103

Goliath@zero_goliath·28 Eyl

continual learning is a proxy for sample efficiency which is a proxy for long context

English

328

Goliath@zero_goliath·2 Ara

arep.med.harvard.edu/pdf/Wagner96.p…

ZXX

Goliath@zero_goliath·2 Ara

among living organisms, the capability to evolve is itself evolved over successive generations. evolution optimizes its own learning algorithm

English

101

Goliath@zero_goliath·2 Ara

i think even this attempt to narrow the meaning of "software engineering is done" is insufficient. i look at code much less often vs a year ago. the semantic content of the codebase is largely diffused to me through the LLM's messages the dev loop is much more reliant on testing and aggressively interrogating the LLM about its design decisions than manually reviewing code i think its totally plausible that 7 months from now, companies will transition to having humans use LLMs to review PRs. this wont require humans ever looking at the code, just talking to LLMs about design decisions does this not count as software engineering?

English

756

Daniel Kang@ddkang·2 Ara

I replied in the comments with no response, so let's try this! @dmwlff - I'll bet you $100k that Anthropic will still employ people who look at code by July 1, 2026. This is a substantial portion of my net worth :) I'll take even odds!

Adam Wolff@dmwlff

I believe this new model in Claude Code is a glimpse of the future we're hurtling towards, maybe as soon as the first half of next year: software engineering is done. Soon, we won't bother to check generated code, for the same reasons we don't check compiler output.

English

256

42.1K

Goliath@zero_goliath·29 Kas

it’s so easy to one shot frontends for disposable experimental code these days

English

Goliath@zero_goliath·28 Eki

arxiv.org/pdf/2505.22617

ZXX

113

Goliath@zero_goliath·28 Eki

never forget that llm RL is basically trading entropy for accuracy

English

256

Goliath@zero_goliath·24 Eki

@dejavucoder ~ sample efficiency x.com/zero_goliath/s…

Goliath@zero_goliath

continual learning is a proxy for sample efficiency which is a proxy for long context

English

sankalp@dejavucoder·23 Eki

didnt strike me till now that infinite context ~ continual learning

Jessy Lin@realJessyLin

There's a huge spectrum of approaches to memory/continual learning - ranging from RAG to dreams of "infinite context" generalization to baking in new knowledge w/ gradient updates. I'm personally bullish on parametric updates that allow the model itself to get smarter over time (rather than pure systems-based approaches around black box models), but there's still a lot of open questions to make this work at scale.

English

8.3K

Goliath@zero_goliath·23 Eki

@PandaAshwinee @agarwl_ @karpathy this seems to mitigate this to an extent arxiv.org/abs/2509.26114…

English

Ashwinee Panda@PandaAshwinee·23 Eki

@agarwl_ you don’t think exploration is a huge challenge? whenever i train anything, after my entropy craters i can’t get anything done. basically what @karpathy said abt synthetic data, the samples are collapsed and training on them leads to collapse.

English

1.6K

Rishabh Agarwal@agarwl_·23 Eki

What's the next big unblock for scaling up RL? - Maximize throughput for inference - Maximize MFU for training - (The hard part) Do the above while keeping RL training stable. Interestingly, this is mostly an algorithmic challenge and we have infrastructure pieces already in place! We can maximize throughput during LLM serving and MFU during pre-training (this is what we are good at already). Unfortunately, current RL training is sensitive to latency of the inference phase (too much off policy or staleness and things break down) -- this also means we are wasting resources during training and also not utilizing our inference stack as efficiently as possible. Imagine a simple setups like the reward computation being really expensive? E.g., at Periodic, we would need to make a new material in the physical lab to get feedback about what we made -- this could take hours to days! So we don't want to wait for hours or days before we train our LLM to improve on this signal.

English

206

65.6K

Goliath@zero_goliath·22 Eki

@Kangwook_Lee this is art

English

389

Kangwook Lee@Kangwook_Lee·21 Eki

In 2026, at some "prestigious" conference, we will see ...

(((ل()(ل() 'yoav))))👾@yoavgo

@karpathy can you elaborate on why images can get bidi attention easily while text cannot? also, no tokenization but dont we still get something similar and perhaps uglier when chunking the input image into patches?

English

143

24.4K

Goliath@zero_goliath·21 Eki

@Matthewagi @runrl_com arxiv.org/abs/2501.17161

QME

Goliath@zero_goliath·21 Eki

I would assume for a tiny transformer playing tic tac toe, pretraining would be strictly more efficient because of the limited number of game states. I think if you scale this up to the level of LLMs and use more complex domains like math reasoning, it follows from your question that maybe we should investigate if SFT has a tendency to memorize instead of learning reasoning skills as opposed to RL. Does RL lead to better generalization? I think it’s worthwhile to explore.

English

Goliath@zero_goliath·21 Eki

new @runrl_com blog post: i pretrained a tiny transformer model on perfect tic tac toe moves and measured how much it affects RL compute requirements

English

769

Goliath@zero_goliath·21 Eki

runrl.com/blog/warm-star…

ZXX

Keşfet

@Dorialexander @Matthewagi @dmwlff @dejavucoder @elonmusk @BarackObama @taylorswift13 @cristiano