Alex Wa

21 posts

@_djdumpling

math + cs @yale, residency @primeintellect , @yalenlp

Joined November 2022
320 Following · 1.2K Followers

Pinned Tweet
Alex Wa @_djdumpling
new blog! What methodologies do labs use to train frontier models? The blog distills 7 open-weight model reports from frontier labs, covering architecture, stability, optimizers, data curation, pre/mid/post-training + RL, and behaviors/safety djdumpling.github.io/2026/01/31/fro…
[image]
34 replies · 287 reposts · 2K likes · 279.4K views
Alex Wa @_djdumpling
@micpsst not explicitly, but there were some brief notes about Qwen 2.5, Qwen 3, and Qwen3-Next on dual chunk attention, hybrid models, chat templates, and data filtering
0 replies · 0 reposts · 3 likes · 2.2K views
Alex Wa @_djdumpling
3. distilling R1 into small models beats large-scale RL on reasoning
4. increasing MoE sparsity yields perf improvements for fixed FLOPs (e.g. 8/384 in Kimi K2)
5. during R1-Zero's pure RL, reflective words like 'wait' spiked 5-7x

would love feedback, especially corrections! :)
1 reply · 1 repost · 24 likes · 3.8K views
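The MoE sparsity point above is just a ratio of routed to total experts. A tiny sketch, with a function name of my own invention (real FLOP accounting would also include attention and the router, which this ignores):

```python
def moe_active_fraction(top_k: int, num_experts: int) -> float:
    """Fraction of expert parameters active per token in a
    mixture-of-experts layer with top-k routing (illustrative only)."""
    return top_k / num_experts

# The 8/384 routing cited for Kimi K2: ~2% of expert params per token.
# Holding top_k and expert size fixed (flat per-token FLOPs), growing
# num_experts raises total capacity while keeping compute constant.
print(moe_active_fraction(8, 384))  # ~0.0208
```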
Alex Wa @_djdumpling
interesting bits:
1. changing chat template token ('assistant' -> 'me') shifted Hermes 4's behavior to embody peer-like, consistent voices with higher behavioral plasticity
2. Kimi K2's MuonClip stabilizes attention logits via per-head clipping where softcapping/QK-norm fell short
1 reply · 1 repost · 26 likes · 5.2K views
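The per-head clipping idea mentioned for MuonClip can be sketched roughly as follows. This is my own simplified illustration, not Kimi K2's published algorithm: the function name, tensor shapes, and threshold `tau` are assumptions, and the core move shown is rescaling each head's query/key projections so the pre-softmax logits stay bounded:

```python
import numpy as np

def qk_clip(Wq, Wk, x, tau=100.0):
    """Per-head attention-logit clipping (illustrative sketch).

    Wq, Wk: (num_heads, d_model, d_head) projection weights
    x:      (seq_len, d_model) layer input used to measure logits
    tau:    maximum allowed absolute attention logit
    """
    num_heads, _, d_head = Wq.shape
    for h in range(num_heads):
        q = x @ Wq[h]                          # (seq, d_head)
        k = x @ Wk[h]
        logits = (q @ k.T) / np.sqrt(d_head)   # pre-softmax logits
        m = np.abs(logits).max()
        if m > tau:
            # Scaling BOTH projections by sqrt(tau/m) shrinks the
            # product q @ k.T by exactly tau/m, capping logits at tau.
            gamma = np.sqrt(tau / m)
            Wq[h] *= gamma
            Wk[h] *= gamma
    return Wq, Wk
```

Clipping per head (rather than globally) means one exploding head doesn't force a rescale of well-behaved heads, which is the advantage the tweet contrasts with softcapping/QK-norm.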
Alex Wa @_djdumpling
@creet_z Not to mention a spot B300 only costs 25 cents more/hr than an H100
0 replies · 0 reposts · 5 likes · 582 views
Christian @creet_z
Using spot 8xB200 for $8/hr feels illegal like I’m robbing someone, taking compute from a baby if you will
9 replies · 0 reposts · 253 likes · 20.2K views
hallerite @hallerite
Happy to finally share what I have been working on for some time now. Introducing »Ludic« – an LLM-RL library for the era of experience. While there are now a lot of LLM-RL codebases, even many good ones, I want to share my very idiosyncratic way to think about LLM-RL.
[image]
15 replies · 33 reposts · 267 likes · 20.6K views
Alex Wa @_djdumpling
@ccui9 it's also worth mentioning that the LLMs tend to choose from among the top of the move list instead of reasoning about all possible moves, which would also lead to convergent strategies
0 replies · 0 reposts · 1 like · 46 views
Alex Wa @_djdumpling
@ccui9 I forgot to mention this, but passing in legal actions seems to neutralize training; grok-4-fast, gpt-5.2, and grok-4 all got around 0.77 (1 rollout); I think there are some artificial hivemind ideas at play, where their strategies converge due to being given the same set of moves
1 reply · 0 reposts · 1 like · 77 views
Alex Wa @_djdumpling
New blog as a part of the @PrimeIntellect RL residency! 🧵 In Fruit box, a grid-based reasoning game, we find that post-training a small CNN policy outperforms LLMs, but only with legal action masks. Despite operating on token sequences, LLMs demonstrate strong spatial reasoning
[image]
2 replies · 9 reposts · 124 likes · 22.1K views
Christian @creet_z
>alex applies to prime intellect residency >links a single blog post on his site "whirlwind of PPO and RLHF for LLMs from scratch" but its a banger >bring him in as resident bc i want to see another one >sure enough, puts out Yet Another Banger
Alex Wa @_djdumpling

New blog as a part of the @PrimeIntellect RL residency! 🧵 In Fruit box, a grid-based reasoning game, we find that post-training a small CNN policy outperforms LLMs, but only with legal action masks. Despite operating on token sequences, LLMs demonstrate strong spatial reasoning

3 replies · 8 reposts · 159 likes · 21.7K views
Alex Wa @_djdumpling
Other ideas I’d love to see: continuous factorization with DDPG, testing VLMs due to their strong spatial priors, interpreting attention traces + CNN features, and better credit assignment with value functions. Thanks for reading, and any feedback is welcome!
2 replies · 0 reposts · 11 likes · 901 views
Alex Wa @_djdumpling
The high-leverage fix: legal action masking. An "engineering pragmatism" lesson: don’t spend RL capacity relearning hard constraints. Enforce constraints, then let learning focus on strategy. With masking, the SFT policy beats all LLMs + most baselines, within ~6 pts of expert.
1 reply · 1 repost · 13 likes · 1.1K views
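The legal-action-masking fix described in the tweet above is usually implemented by zeroing out illegal actions before the softmax. A minimal sketch of the idea; the function name and shapes are my own illustration, not the blog's actual code:

```python
import numpy as np

def masked_policy(logits, legal_mask):
    """Legal-action masking: put -inf on illegal actions' logits so the
    softmax assigns them exactly zero probability (illustrative sketch).

    logits:     (num_actions,) raw policy scores
    legal_mask: (num_actions,) bool, True where the action is legal
    """
    masked = np.where(legal_mask, logits, -np.inf)
    masked = masked - masked.max()   # shift for numerical stability
    probs = np.exp(masked)           # exp(-inf) == 0 for illegal actions
    return probs / probs.sum()
```

This is the "don't spend RL capacity relearning hard constraints" point in code form: the constraint is enforced structurally, so gradients only flow through legal choices and learning is spent on strategy.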
Alex Wa @_djdumpling
@willccbb Thanks so much Will! I was just about to start my env 🙏🙏🙏
0 replies · 0 reposts · 1 like · 44 views
will brown @willccbb
mini tutorial on using verifiers with environments hub + prime CLI, and a demo of our new sandboxes integration :)
15 replies · 35 reposts · 363 likes · 30.6K views