Goliath

213 posts

@zero_goliath

@uwaterloo cs; formerly @ritserlabs, intern @runrl_com

Waterloo, ON · Joined March 2022
574 Following · 612 Followers
Goliath
Goliath@zero_goliath·
I was short Indonesian palm oil futures hedged with long spot H100 GPUs, collateralized with a 500gb African voodoo RL environment. The correlation held perfectly for 8 months until some random Ghanaian minister tweeted about a government mandate on electricity prices at 2am. Margin call hit while I was asleep. Woke up to find my entire position liquidated and my Prime Broker had already sold my safe hedge which was long Nasdaq short Gold. Now I make datasets with Meta Glasses at the mall by cold approaching zoomers and explaining why risk adjusted returns matter.
1 reply · 0 reposts · 10 likes · 268 views
Goliath@zero_goliath·
the value system of the pluribus hivemind clearly gives away that it was RLed by the anthropic of another world
0 replies · 0 reposts · 1 like · 123 views
Goliath@zero_goliath·
@Dorialexander as in they'll have the same UI, but the models themselves will be adept at splitting tasks and delegating to subagents
1 reply · 0 reposts · 2 likes · 21 views
Goliath@zero_goliath·
@Dorialexander i think anthropic/oai will end up (or are) RLing their code agents to work in parallel. they're way too slow working serially
1 reply · 0 reposts · 2 likes · 59 views
Alexander Doria@Dorialexander·
So after a few days playing with codex/cc on TorchTitan, quite mixed overall. I don't think we talk enough about all the time lost in endless slop rambling.
5 replies · 1 repost · 48 likes · 5.2K views
Goliath@zero_goliath·
@Matthewagi yeah, if they've got their RL pipeline working reliably for ~200k token rollouts with tool calls, then parallel agents can't be too far behind
0 replies · 0 reposts · 2 likes · 24 views
Matt@Matthewagi·
@zero_goliath This is probably what blows that work-time graph up. Orchestrator then parallel workers so no one is operating too long
1 reply · 0 reposts · 1 like · 34 views
Goliath@zero_goliath·
having codex run serially for many hours is so 2025. can't wait for code agents RLed to work in parallel
1 reply · 0 reposts · 3 likes · 216 views
Goliath@zero_goliath·
would love to see more work investigating mixtures-of-recursions (ie adaptive recurrence) vs CoT RL

intuitively, the main difference seems to be that CoT RL actually updates the priors for later non-CoT tokens

also it's plausible that CoT RL is better at parallel reasoning, see the Let's Think Dot by Dot paper

still, i wonder if there are notable gains from adaptive recursion outside of CoT, or if speculative decoding accomplishes the same thing
0 replies · 0 reposts · 1 like · 133 views
Goliath@zero_goliath·
continual learning is a proxy for sample efficiency which is a proxy for long context
1 reply · 0 reposts · 2 likes · 328 views
Goliath@zero_goliath·
among living organisms, the capability to evolve is itself evolved over successive generations. evolution optimizes its own learning algorithm
1 reply · 0 reposts · 3 likes · 101 views
Goliath@zero_goliath·
i think even this attempt to narrow the meaning of "software engineering is done" is insufficient. i look at code much less often vs a year ago. the semantic content of the codebase is largely diffused to me through the LLM's messages

the dev loop is much more reliant on testing and aggressively interrogating the LLM about its design decisions than manually reviewing code

i think it's totally plausible that 7 months from now, companies will transition to having humans use LLMs to review PRs. this won't require humans ever looking at the code, just talking to LLMs about design decisions

does this not count as software engineering?
0 replies · 0 reposts · 2 likes · 756 views
Daniel Kang@ddkang·
I replied in the comments with no response, so let's try this! @dmwlff - I'll bet you $100k that Anthropic will still employ people who look at code by July 1, 2026. This is a substantial portion of my net worth :) I'll take even odds!
Adam Wolff@dmwlff

I believe this new model in Claude Code is a glimpse of the future we're hurtling towards, maybe as soon as the first half of next year: software engineering is done. Soon, we won't bother to check generated code, for the same reasons we don't check compiler output.

8 replies · 6 reposts · 256 likes · 42.1K views
Goliath@zero_goliath·
it’s so easy to one shot frontends for disposable experimental code these days
0 replies · 0 reposts · 2 likes · 87 views
Goliath@zero_goliath·
never forget that llm RL is basically trading entropy for accuracy
Goliath tweet media
2 replies · 0 reposts · 5 likes · 256 views
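[The "trading entropy for accuracy" claim can be illustrated numerically. A minimal sketch, not from the tweet or its attached media: exact policy-gradient (REINFORCE-style) updates on a toy 4-action softmax bandit, where reward maximization concentrates probability mass on the correct action and the policy's entropy falls as its accuracy rises. All names here are mine.]

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy 4-action bandit: action 0 is "correct" (reward 1), the rest reward 0.
rewards = [1.0, 0.0, 0.0, 0.0]
logits = [0.0, 0.0, 0.0, 0.0]

h_before = entropy(softmax(logits))  # uniform policy: maximum entropy
for _ in range(20):
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))  # expected reward
    # Exact policy-gradient ascent on the softmax logits:
    # d E[r] / d logit_i = p_i * (r_i - E[r])
    for i in range(4):
        logits[i] += probs[i] * (rewards[i] - baseline)

probs = softmax(logits)
h_after = entropy(probs)
# Accuracy (mass on the correct action) went up; entropy went down.
```

Running this, `probs[0]` climbs well past 0.9 while `h_after` drops far below the initial `ln 4`, which is the entropy-for-accuracy trade in miniature.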
Ashwinee Panda@PandaAshwinee·
@agarwl_ you don’t think exploration is a huge challenge? whenever i train anything, after my entropy craters i can’t get anything done. basically what @karpathy said abt synthetic data, the samples are collapsed and training on them leads to collapse.
2 replies · 0 reposts · 9 likes · 1.6K views
Rishabh Agarwal@agarwl_·
What's the next big unblock for scaling up RL?
- Maximize throughput for inference
- Maximize MFU for training
- (The hard part) Do the above while keeping RL training stable.

Interestingly, this is mostly an algorithmic challenge and we have infrastructure pieces already in place! We can maximize throughput during LLM serving and MFU during pre-training (this is what we are good at already).

Unfortunately, current RL training is sensitive to latency of the inference phase (too much off-policy or staleness and things break down) -- this also means we are wasting resources during training and not utilizing our inference stack as efficiently as possible.

Imagine a simple setup like the reward computation being really expensive. E.g., at Periodic, we would need to make a new material in the physical lab to get feedback about what we made -- this could take hours to days! So we don't want to wait hours or days before we train our LLM to improve on this signal.
10 replies · 11 reposts · 206 likes · 65.6K views
Kangwook Lee@Kangwook_Lee·
In 2026, at some "prestigious" conference, we will see ...
Kangwook Lee tweet media
(((ل()(ل() 'yoav))))👾@yoavgo

@karpathy can you elaborate on why images can get bidi attention easily while text cannot? also, no tokenization but dont we still get something similar and perhaps uglier when chunking the input image into patches?

4 replies · 10 reposts · 143 likes · 24.4K views
Goliath@zero_goliath·
I would assume for a tiny transformer playing tic tac toe, pretraining would be strictly more efficient because of the limited number of game states. I think if you scale this up to the level of LLMs and use more complex domains like math reasoning, it follows from your question that maybe we should investigate if SFT has a tendency to memorize instead of learning reasoning skills as opposed to RL. Does RL lead to better generalization? I think it’s worthwhile to explore.
1 reply · 0 reposts · 3 likes · 53 views
Goliath@zero_goliath·
new @runrl_com blog post: i pretrained a tiny transformer model on perfect tic tac toe moves and measured how much it affects RL compute requirements
Goliath tweet media
2 replies · 1 repost · 3 likes · 769 views
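[The blog post itself isn't inlined here, but a minimax oracle is the standard way to generate "perfect tic tac toe moves" for a pretraining dataset like the one the tweet describes. A minimal sketch under that assumption; the function names are mine, not from the post:]

```python
from functools import lru_cache

# The 8 winning lines on a 3x3 board stored as a 9-char string.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def best_move(board, player):
    """Minimax: return (score, move) for `player` on `board`.
    score is +1 win, 0 draw, -1 loss under optimal play."""
    w = winner(board)
    if w is not None:
        return (1 if w == player else -1, None)
    if ' ' not in board:
        return (0, None)  # draw
    opponent = 'O' if player == 'X' else 'X'
    best = (-2, None)
    for i, cell in enumerate(board):
        if cell == ' ':
            child = board[:i] + player + board[i + 1:]
            score = -best_move(child, opponent)[0]
            if score > best[0]:
                best = (score, i)
    return best

# Optimal play from the empty board is a draw (score 0); every
# (board, best move) pair visited becomes one training example.
score, move = best_move(' ' * 9, 'X')
```

The `lru_cache` makes the full game tree (a few thousand distinct positions) cheap to enumerate, so dumping every reachable position with its optimal move yields the "perfect moves" corpus in milliseconds.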