Arkadii

@ArkadiiBessonov

Pre-training @poolsideai | Ex @yandex

Joined April 2026

35 Following · 258 Followers

Arkadii@ArkadiiBessonov·2d

Full write-up — every recipe, every matmul, drawn out: arkadii.be/blog/fp8-quant…

18.8K

Arkadii@ArkadiiBessonov·2d

Three main ways to do FP8 in LLM pretraining, and they differ mainly in one thing: how the scale is attached. Per-tensor vs blockwise vs MXFP8.

Why pretraining has so much structure here: forward + backward is 3 matmuls (Fprop, Dgrad, Wgrad) across 3 tensor roles (weights, activations, gradients). Each role wants its own scale layout, and that's where all the complexity lives.

The three recipes differ in how the scale is attached (granularity, dtype, layout):

- Per-tensor: one scale for the whole tensor. Simplest, least robust to outliers.
- Blockwise: 1×128 / 128×128 tiles, FP32 scales. The DeepSeek-V3 style.
- MXFP8: 1×32 blocks + E8M0 scale. Native on Blackwell.

One rule ties it all together: the scale must stay constant along the matmul's contracted dimension. That single constraint derives every tile geometry above; nothing here is arbitrary.

I drew every layout out, per recipe and per matmul, so the geometry is concrete instead of hand-wavy. Full walkthrough in my blogpost (link in comments)!
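A minimal pure-Python sketch of the scale-attachment idea in the post above, not the author's code. All helper names are hypothetical, integer rounding stands in for a real FP8 cast, and the round-up rule for the power-of-two (E8M0-style) scale is an assumption that may differ from what hardware or the MX specification does. It computes one scale per tensor, per 1×128 block, and per 1×32 block for a vector with a single outlier, then checks that block scales which are constant along the contracted dimension K can be applied once per K-block partial sum.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def amax(values):
    """Largest absolute value in a block (0.0 for an empty block)."""
    return max((abs(v) for v in values), default=0.0)


def fp32_scale(values):
    """Real-valued scale mapping the block's amax onto E4M3_MAX."""
    m = amax(values)
    return m / E4M3_MAX if m > 0 else 1.0


def e8m0_scale(values):
    """Power-of-two scale, rounded up so the scaled block cannot overflow."""
    m = amax(values)
    return 2.0 ** math.ceil(math.log2(m / E4M3_MAX)) if m > 0 else 1.0


def block_scales(vec, block, scale_fn):
    """One scale per contiguous block of `block` elements along K."""
    return [scale_fn(vec[i:i + block]) for i in range(0, len(vec), block)]


def quantize(vec, scales, block):
    """Stand-in quantizer: divide by the block scale, round to an integer."""
    return [round(v / scales[i // block]) for i, v in enumerate(vec)]


def scaled_dot(qa, sa, qb, sb, block):
    """Dot product along K, applying one scale product per K-block."""
    total = 0.0
    for j in range(len(sa)):
        lo, hi = j * block, (j + 1) * block
        partial = sum(x * y for x, y in zip(qa[lo:hi], qb[lo:hi]))
        total += sa[j] * sb[j] * partial
    return total


K = 256
a = [math.sin(0.1 * i) for i in range(K)]
a[200] = 300.0  # a single outlier in the second half
b = [math.cos(0.05 * i) for i in range(K)]

per_tensor = block_scales(a, K, fp32_scale)   # one scale for the whole vector
blockwise = block_scales(a, 128, fp32_scale)  # 1x128 blocks, real-valued scales
mx = block_scales(a, 32, e8m0_scale)          # 1x32 blocks, power-of-two scales

# Scales factor out of each K-block, so they are applied once per partial sum.
sb = block_scales(b, 128, fp32_scale)
qa, qb = quantize(a, blockwise, 128), quantize(b, sb, 128)
fused = scaled_dot(qa, blockwise, qb, sb, 128)
reference = sum((x * blockwise[i // 128]) * (y * sb[i // 128])
                for i, (x, y) in enumerate(zip(qa, qb)))
```

With per-tensor scaling the outlier sets the scale for all 256 elements; with blocks it only inflates the scale of the block that contains it. `fused` and `reference` agree up to floating-point error because each scale is constant within its K-block.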

158

31.6K

Arkadii retweeted

Poolside@poolsideai·26 May

Today we’re publishing the technical report behind Laguna M.1 and Laguna XS.2. This report opens up more of what went into them: Model Factory, pre-training data, distributed training, post-training, agent RL, quantization, and evaluation. poolside.ai/assets/laguna/…

429

331.7K

Arkadii retweeted

Poolside@poolsideai·28 Apr

Today we’re releasing Laguna XS.2, Poolside’s first open-weight model. It’s a 33B total / 3B active MoE model built for agentic coding and long-horizon tasks. Trained fully in-house on our own stack. Runs on a single GPU. Released under Apache 2.0. Links 👇 Weights: huggingface.co/poolside/Lagun… API: platform.poolside.ai Blog: poolside.ai/blog/laguna-a-…

140

807

275.2K
