NP

4.2K posts

@np_hard

( ´ ▽ ` )ノ i like things that grow

Joined September 2011
2.2K Following · 873 Followers
Pinned Tweet
NP@np_hard·
As part of @PrimeIntellect's RL residency program, I've been exploring how to do multi-agent RL using their current stack (from verifiers + prime-rl to lab experiments with hosted training/evals) and thinking about how it could be extended to support these abstractions natively. I've summarized my findings in the blog post below and I'll leave a few comments here, too...
7
45
355
43.3K
NP retweeted
will brown@willccbb·
veeery cool writeup digging into nuances of training, experimentation, and infra for multi-agent RL :)
NP@np_hard

As part of @PrimeIntellect's RL residency program, I've been exploring how to do multi-agent RL using their current stack (from verifiers + prime-rl to lab experiments with hosted training/evals) and thinking about how it could be extended to support these abstractions natively. I've summarized my findings in the blog post below and I'll leave a few comments here, too...

6
16
260
26.8K
NP@np_hard·
I discuss some more details in the blogpost (nphard.io/2026/02/23/han…). I'm very excited to see what comes out of this, and related work in the residency, like @BillyHoy1_'s stuff - hopefully it will spark more work on open multi-agent RL!
1
3
14
757
NP@np_hard·
computer use
0
0
2
123
NP retweeted
Alex Wa@_djdumpling·
new blog! What methodologies do labs use to train frontier models? The blog distills 7 open-weight model reports from frontier labs, covering architecture, stability, optimizers, data curation, pre/mid/post-training + RL, and behaviors/safety djdumpling.github.io/2026/01/31/fro…
34
286
2K
280.1K
NP retweeted
Prime Intellect@PrimeIntellect·
Introducing Lab: A full-stack platform for training your own agentic models. Build, evaluate and train on your own environments at scale without managing the underlying infrastructure. Giving everyone their own frontier AI lab.
133
289
2.5K
749.9K
davinci@leothecurious·
i love @BerenMillidge's work and remember seeing his name on many of the most interesting works around PC, FEP, and active inference when i was deep down that rabbit hole. but then i came across the following surprising text in one of his blogs (thanks to @kparikh2001 for reminding me of it) and it just feels like a very premature pivot. it's probably the first thing i'd question him about if i ever got the chance.
5
0
27
3.5K