will brown (@willccbb)
14.3K posts

reward hacking @primeintellect

sf · Joined February 2015
1.3K Following · 43.1K Followers
sankalp (@dejavucoder):
when you finally understand how policy gradient works after going down the differentiation trenches and realising that the REINFORCE algorithm is literally the base form of policy gradient
[image]
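The claim above — that REINFORCE is the base form of policy gradient — can be made concrete with a minimal sketch. This is an illustrative toy (the bandit setup, learning rate, and step count are all made up), not anyone's actual training code:

```python
import numpy as np

# REINFORCE on a 2-armed bandit with a softmax policy.
# The update is the base policy-gradient estimator:
#   grad J(theta) = E[ grad log pi(a|theta) * R ]
rng = np.random.default_rng(0)
theta = np.zeros(2)                 # one logit per arm
true_means = np.array([0.2, 0.8])   # arm 1 pays more on average

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.1
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = rng.normal(true_means[a], 0.1)  # sampled reward
    # grad of log softmax at the chosen arm: one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi       # the REINFORCE update
```

After training, the policy concentrates on the better arm; everything fancier (baselines, GAE, PPO/GRPO clipping) is variance reduction and trust-region machinery layered on this same estimator.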
will brown (@willccbb):
god i love prompting
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞):
I guess another reason DeepSeek goes to such lengths for multi-teacher OPD is that it's substantially more natural to RLmaxx tasks with multiple objectives (correctness, format, CoT faithfulness, increasing length penalty) in a narrow domain than just GRPO-on-everything.
will brown (@willccbb):
why aren't more people studying self-compaction at artificially low context lengths. there's no reason you can't benchmaxx math RL with 4k tokens across many turns

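The multi-objective setup described above can be sketched as a single scalar reward. Every weight, tag format, and penalty coefficient below is a made-up illustration of the idea, not DeepSeek's actual recipe:

```python
import re

def reward(completion: str, answer: str, max_len: int = 4096) -> float:
    """Combine correctness, format adherence, and a length penalty
    into one scalar, GRPO-style. All weights are hypothetical."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    correct = 1.0 if m and m.group(1).strip() == answer else 0.0
    fmt = 1.0 if m else 0.0                       # format: answer tags present
    overflow = max(0, len(completion) - max_len)  # characters past the budget
    length_pen = 0.001 * overflow                 # penalty grows with overrun
    return 2.0 * correct + 0.5 * fmt - length_pen
```

In a narrow domain each term can be checked programmatically, which is what makes stacking objectives like this tractable compared to a single reward over everything.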
ueaj (@_ueaj):
The point is to measure the *technological gap* between OS and closed. The price isn't comparable because the labs take huge margins and we don't know the raw costs. But we do know the latency, reasoning efficiency, and real-world performance, for which there seems to be a very large gap.
Florian Brand (@xeophon):
"Open models are way behind than benchmarks show cause they have a worse latency and use more tokens" is the funniest cope I’ve ever read
will brown (@willccbb):
@MalmSanta major refactor of primary user-facing API // a dope TUI
MalmSanta (@MalmSanta):
@willccbb what are the two features at a high level? did that determine the approach you took with each?
will brown (@willccbb):
been juggling 2 very large PRs this week. one is building on months of planning: highly delicate, careful API design, many full rewrites, reading every line, striving for perfection. the other is like yeah fuck it, this would be sick, let’s just fully vibecode it, yolo
will brown (@willccbb):
@stalmico choosing the right benchmark to illustrate an idea is half the battle :)
will brown (@willccbb):
why aren't more people studying self-compaction at artificially low context lengths. there's no reason you can't benchmaxx math RL with 4k tokens across many turns
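A minimal sketch of what a self-compaction rollout under a hard 4k-token budget might look like: whenever the transcript would overflow, the model summarizes its own context before continuing. The `llm` stub, token counter, and compaction prompt are all hypothetical placeholders standing in for a real policy model:

```python
MAX_TOKENS = 4096  # the artificially low budget from the tweet

def count_tokens(text: str) -> int:
    # crude whitespace tokenizer, placeholder for a real one
    return len(text.split())

def llm(prompt: str) -> str:
    # stub: a real rollout would call the policy model here
    if prompt.startswith("Compact"):
        # "summary" = first 32 words of what it was asked to compact
        return "[" + " ".join(prompt.split()[:32]) + " ...]"
    return "partial reasoning step " * 300  # a long reasoning chunk

def rollout(problem: str, max_turns: int = 8) -> str:
    context = problem
    for _ in range(max_turns):
        step = llm(context)
        if count_tokens(context) + count_tokens(step) > MAX_TOKENS:
            # self-compaction: replace history with the model's own summary
            context = llm("Compact the following into key facts:\n" + context)
        context = context + "\n" + step
    return context

final = rollout("Prove that 2^10 = 1024.")
```

The RL angle is that the compaction step is itself model behavior, so reward on the final answer trains the model to decide what survives each squeeze.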
strongsignal (@strong_signal1):
@willccbb I should mention that this was only with 300 rl steps and pass@8 is 33% - planning to push it way farther
will brown (@willccbb):
@DimitrisPapail one of very many cases where more people should be studying the questions you're pushing the boundaries on :)
Greer (@turbo_xo_):
@willccbb @teortaxesTex See, why are you using Claude at all? Serious question: is there a single situation where it’s more useful than 5.5? I have max plans on both, but haven’t touched ant since 5.5
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞):
If true, a surprising leap. I'd suspect that in January 2026 there already were frontier models and harnesses that allowed developing this end to end. They should have almost all the pieces memorized.
[image]
will brown (@willccbb):
@gabebusto first is much more rewarding. second is more instantly gratifying.
[4 images]
Gabe (@gabebusto):
@willccbb which one has been more fun to work on?
will brown (@willccbb):
@nrehiew_ mix of specialization / sharpening is my guess. bit of a weird result. kinda like how random-reward RL made qwen models better at math. you're telling the model to mode-collapse around behaviors which are already pretty solid. would be surprised if it was more general.