Alex Wa

28 posts

@_djdumpling

intern @modal, residency @primeintellect, @yale

Joined November 2022
363 Following · 1.3K Followers
Pinned Tweet
Alex Wa
Alex Wa@_djdumpling·
new blog! What methodologies do labs use to train frontier models? The blog distills 7 open-weight model reports from frontier labs, covering architecture, stability, optimizers, data curation, pre/mid/post-training + RL, and behaviors/safety djdumpling.github.io/2026/01/31/fro…
34
285
2K
284.6K
Alex Wa retweeted
Alex Shan
Alex Shan@alexshander03·
We’re launching @JudgmentLabs today and announcing $32M in funding. As AI agents take on more of the work that creates economic value, they generate massive amounts of production data: the clearest record of how they behave with users, software, and the real world. Judgment builds infrastructure for improving AI agents from production data.
212
155
1K
3.5M
Alex Wa retweeted
Anish Lakkapragada
Anish Lakkapragada@_anishlk·
introducing tubestack: bringing substack’s zen to youtube

a quiet, local desktop app to explore youtube without visual pollution. download here for free on mac & windows: anishlk.com/tubestack

(typography taken from the pretty blogs of @AnthropicAI, @OpenAI, @X, @Substack)
6
12
25
1.7K
Alex Wa retweeted
Prime Intellect
Prime Intellect@PrimeIntellect·
Over the past months, Cohort I of our RL Residency has been shipping. Highlights:
- continual learning
- automating AI research (from GPU programming to RL itself)
- embodied environments
- multi-agent systems
- materials science discovery
4
33
320
58.9K
Alex Wa retweeted
justin wang
justin wang@jstwng·
The AI Ecosystem Transaction site is live. On it, you can comb through 265 deals totaling $2.6+ trillion in disclosed volume – spanning across hyperscalers, neoclouds, frontier labs, chip designers, server OEMs, and more. Demo is below. Try it now at compute.jstwng.com.
13
15
41
5K
Alex Wa retweeted
Anish Lakkapragada
Anish Lakkapragada@_anishlk·
Last summer, I had been struggling for two weeks to rigorously prove my intuition on a stats project. A few days ago, I dug up the old latex file and spun up Aristotle (@harmonicmath), OpenGauss (@mathematics_inc), and Axiom Lean Engine (@axiommathai). All four theorems are now formally verified in Lean 4. Here's what happened 🧵 arXiv link: arxiv.org/abs/2603.20655
5
11
28
2.7K
Alex Wa retweeted
NP
NP@np_hard·
As part of @PrimeIntellect's RL residency program, I've been exploring how to do multi-agent RL using their current stack (from verifiers + prime-rl to lab experiments with hosted training/evals) and thinking about how it could be extended to support these abstractions natively. I've summarized my findings in the blogpost below and I'll leave a few comments here, too...
9
50
420
65.7K
Alex Wa
Alex Wa@_djdumpling·
@micpsst not explicitly, but there were some brief notes about Qwen 2.5, Qwen 3, and Qwen3-Next on dual chunk attention, hybrid models, chat templates, and data filtering
0
0
3
2.3K
Alex Wa
Alex Wa@_djdumpling·
3. distilling R1 into small models beat large-scale RL on reasoning
4. increasing MoE sparsity yields perf improvements for fixed FLOPs (e.g., 8/384 in Kimi K2)
5. during R1-Zero's pure RL, reflective words like 'wait' spiked 5-7x

would love feedback, especially corrections! :)
1
1
24
3.9K
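A quick arithmetic check on the "8/384" figure in point 4, as a sketch only. It assumes the tweet's numbers mean 8 routed experts active per token out of 384 total, and it assumes "sparsity" means total experts divided by active experts; any shared expert is ignored. The function name `moe_activation` is made up for illustration.

```python
from fractions import Fraction

def moe_activation(active_experts, total_experts):
    """Return (fraction of experts active per token, total/active ratio)."""
    frac = Fraction(active_experts, total_experts)
    return frac, total_experts / active_experts

# Numbers taken from the tweet: 8 active out of 384 experts.
frac, sparsity = moe_activation(8, 384)
print(frac, sparsity)  # 1/48 48.0
```

So under these assumptions, each token touches about 2% of the experts, which is why total parameter count can grow while per-token FLOPs stay roughly fixed.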
Alex Wa
Alex Wa@_djdumpling·
interesting bits:
1. changing chat template token ('assistant'->'me') shifted Hermes 4's behavior to embody peer-like, consistent voices with higher behavioral plasticity
2. Kimi K2's MuonClip stabilizes attention logits via per-head clipping where softcapping/QK-norm fell short
1
1
26
5.3K
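A rough sketch of what "per-head clipping" in point 2 could look like. This is an assumption-laden illustration, not the Kimi K2 implementation: it assumes that each head tracks its maximum attention logit, and that a head whose maximum exceeds a threshold `tau` has its query and key projection weights each scaled by sqrt(tau / max_logit), so the q·k product shrinks back to `tau`. The function name and the value `tau=100.0` are placeholders.

```python
import math

def qk_clip_factors(max_logit_per_head, tau=100.0):
    """Per-head multiplier to apply to BOTH the query and key weights.

    Heads at or below the threshold are left alone (factor 1.0).
    Scaling q and k by sqrt(gamma) each scales the logit q.k by gamma.
    """
    factors = []
    for s_max in max_logit_per_head:
        gamma = min(1.0, tau / s_max) if s_max > 0 else 1.0
        factors.append(math.sqrt(gamma))
    return factors

# Head 0 is within bounds; head 1 peaked at 4x the threshold.
print(qk_clip_factors([50.0, 400.0], tau=100.0))  # [1.0, 0.5]
```

The point of working per head is that only the heads with exploding logits get rescaled, while well-behaved heads are untouched.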
Alex Wa
Alex Wa@_djdumpling·
@creet_z Not to mention spotting a B300 only costs 25 cents more/hr than an H100
0
0
5
591
Christian
Christian@creet_z·
Using spot 8xB200 for $8/hr feels illegal like I’m robbing someone, taking compute from a baby if you will
9
0
253
20.2K
hallerite
hallerite@hallerite·
Happy to finally share what I have been working on for some time now. Introducing »Ludic« – an LLM-RL library for the era of experience. While there are now a lot of LLM-RL codebases, even many good ones, I want to share my very idiosyncratic way to think about LLM-RL.
15
33
271
23.8K
Alex Wa
Alex Wa@_djdumpling·
@ccui9 it's also worth mentioning that the LLMs tend to choose from among the top of the move list instead of reasoning about all possible moves, which would also lead to convergent strategies
0
0
1
52
Alex Wa
Alex Wa@_djdumpling·
@ccui9 I forgot to mention this, but passing in legal actions seems to neutralize training; grok-4-fast, gpt-5.2, and grok-4 all got around 0.77 (1 rollout); I think there are some artificial hivemind ideas at play, where their strategies converge due to being given the same set of moves
1
0
1
85
Alex Wa
Alex Wa@_djdumpling·
New blog as a part of the @PrimeIntellect RL residency! 🧵 In Fruit box, a grid-based reasoning game, we find that post-training a small CNN policy outperforms LLMs, but only with legal action masks. Despite operating on token sequences, LLMs demonstrate strong spatial reasoning
2
9
123
22.2K
Christian
Christian@creet_z·
>alex applies to prime intellect residency
>links a single blog post on his site "whirlwind of PPO and RLHF for LLMs from scratch" but its a banger
>bring him in as resident bc i want to see another one
>sure enough, puts out Yet Another Banger
Alex Wa@_djdumpling

New blog as a part of the @PrimeIntellect RL residency! 🧵 In Fruit box, a grid-based reasoning game, we find that post-training a small CNN policy outperforms LLMs, but only with legal action masks. Despite operating on token sequences, LLMs demonstrate strong spatial reasoning

3
8
159
21.7K
Alex Wa
Alex Wa@_djdumpling·
Other ideas I’d love to see include continuous factorization with DDPG, testing VLMs due to their strong spatial priors, interpreting attention traces + CNN features, and better credit assignment with value functions.

Thanks for reading, and any feedback is welcome!
2
0
11
951
Alex Wa
Alex Wa@_djdumpling·
The high-leverage fix: legal action masking. An "engineering pragmatism" lesson: don’t spend RL capacity relearning hard constraints. Enforce constraints, then let learning focus on strategy. With masking, the SFT policy beats all LLMs + most baselines, within ~6 pts of expert.
1
1
13
1.2K
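A minimal sketch of the legal action masking idea from the tweet above, assuming a policy that outputs one logit per action and an environment that supplies a boolean mask of legal actions. The function names and the toy numbers are hypothetical, not taken from the blog's code.

```python
import math

def masked_softmax(logits, legal_mask):
    """Softmax over legal actions only; illegal actions get probability 0."""
    if not any(legal_mask):
        raise ValueError("at least one action must be legal")
    # Illegal logits become -inf, so exp() sends them to exactly 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, legal_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_legal_action(logits, legal_mask):
    """Pick the highest-probability action among the legal ones."""
    probs = masked_softmax(logits, legal_mask)
    return max(range(len(probs)), key=probs.__getitem__)

# Action 1 has the largest raw logit but is illegal, so it is never chosen.
logits = [2.0, 5.0, 1.0]
legal = [True, False, True]
print(greedy_legal_action(logits, legal))  # 0
```

The constraint is enforced outside the network, so the policy never has to learn which moves are illegal and its capacity goes to choosing among the legal ones.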