Robert McHardy @ ICLR 2026 🏖️

576 posts


@robert_mchardy

Pre-training Lead @poolsideai | Freelance editor @ComputerBase | Prev @AssemblyAI, @instadeepai, @ucl_dark, @Uni_Stuttgart, @Bosch_AI

London, UK · Joined December 2009
473 Following · 573 Followers
Pinned Tweet
Robert McHardy @ ICLR 2026 🏖️
Super excited to release the weights of Laguna XS.2 under Apache 2.0 to the community today. We are also opening access to our most capable model, Laguna M.1, via our API. Both models are designed for long-horizon agentic coding with strong performance on SWE-bench-style tasks. Learn more on our blog.
poolside@poolsideai

Today we’re releasing Laguna XS.2, Poolside’s first open-weight model. It’s a 33B total / 3B active MoE model built for agentic coding and long-horizon tasks. Trained fully in-house on our own stack. Runs on a single GPU. Released under Apache 2.0. Links 👇 Weights: huggingface.co/poolside/Lagun… API: platform.poolside.ai Blog: poolside.ai/blog/laguna-a-…

0 replies · 3 reposts · 29 likes · 971 views
Robert McHardy @ ICLR 2026 🏖️
@teortaxesTex XS.2 was 20 days flat in pre-training, though not pure training time, since we paused for a bit to run some additional experiments. IIRC it was 14 days of pure training time (and it was 30T tokens, not 33T)
0 replies · 0 reposts · 1 like · 93 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
225B total 23B activated: 6-8K H200s
30B total 3B active, 33T tokens: 20-something days on 2K
Man, this really puts the V3-671B's feat (55 days) into perspective. A year and a half ago, those guys could push H800s to the level H200s normally can't touch (they're improving though)
Eiso Kant@eisokant

@qubitium @poolsideai M.1 was 6-8k (mostly 6k, but we wanted to speed up near the end, so scaled up); XS.2 was 2k

2 replies · 2 reposts · 56 likes · 7.3K views
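For scale, a quick back-of-envelope from the numbers in this thread (30T tokens and ~14 days of pure training per the reply above, 2K GPUs for XS.2 per @eisokant); the arithmetic is mine, not from the thread:

```python
# Implied pre-training throughput for Laguna XS.2 (rough estimate).
tokens = 30e12           # 30T tokens (per @robert_mchardy's correction)
seconds = 14 * 86400     # ~14 days of pure training time
gpus = 2000              # 2K GPUs (per @eisokant)

print(f"{tokens / seconds / 1e6:.1f}M tokens/s overall")  # ~24.8M tokens/s
print(f"{tokens / seconds / gpus:,.0f} tokens/s per GPU") # ~12,400 tokens/s/GPU
```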
Robert McHardy @ ICLR 2026 🏖️ retweeted
Sebastian Raschka
Here is a 2nd batch of April architecture drops. What a month!
- Ant Ling 2.6 1T
- Minimax M2.7
- Xiaomi MiMo V2.5
- Poolside Laguna XS.2
- Tencent Hy3-preview
- IBM Granite 4.1
17 replies · 105 reposts · 764 likes · 34K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Eiso Kant (@eisokant)
We are an American company with a global team and global aspirations. The story of @poolsideai is that early on in the life of the company we decided to focus on building out our applied research org in Europe. That’s been the seed for an amazing team and a competitive advantage. Today we have team members all over the world; Europe and the US are roughly equal in size, and Asia is growing.

Three years ago we thought France would be a great place to build from, but in the early days our hiring happened all across Europe instead. Today we have less than a handful of folks in France but large teams in Europe in London, Amsterdam, Zurich, etc. We operate as a remote-first company but have an office in Paris where we do monthly on-sites (the logistics are great for this) and an office in London which is used on a daily basis.

When we raised capital early on, the vast majority of it came from US investors, but European investors (including French ones) have been a part of our rounds. When we got to France (almost 3 years ago) we were offered significant double-digit-million research grants. My cofounder @jasoncwarner and I did not feel comfortable accepting the grants when we realized that France was going to be only a small part of our story. So we respectfully turned them down. France keeps a special place in our hearts, but we’re a global company with global aspirations.
Chris O'Brien@obrien

People refer to @poolsideai as a "French" company, and I know there was some hype about them moving their HQ here a few years ago, but I don't think they are, unless I'm missing something. Based on LinkedIn (7 employees in France) and the company's job listings:

7 replies · 12 reposts · 112 likes · 12.8K views
elie (@eliebakouch)
nice oss release from poolside! i really like the arch :) some "unusual" hidden modeling details:
> they use per-head gated attention like qwen or stepfun; people usually use sigmoid, but they use softplus! they also use it on both swa and full attention which is nice! this is also why they don't use a sink
> there is a "moe_router_logit_softcapping" parameter in the config, set to 0 by default, but this is non-standard i think; means that they likely saw some instabilities in the router scores? this also changes the bias strength in the top-k since they do it after
> more attention heads on swa layers; xiaomi v2 flash did higher kv heads for swa layers
> they use a different beta_fast for yarn than most models (64 instead of 32) and partial (half) rope for the full attention layers
poolside@poolsideai

Today we’re releasing Laguna XS.2, Poolside’s first open-weight model. It’s a 33B total / 3B active MoE model built for agentic coding and long-horizon tasks. Trained fully in-house on our own stack. Runs on a single GPU. Released under Apache 2.0. Links 👇 Weights: huggingface.co/poolside/Lagun… API: platform.poolside.ai Blog: poolside.ai/blog/laguna-a-…

2 replies · 4 reposts · 92 likes · 7.8K views
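A minimal sketch of the per-head output gating elie describes, with softplus in place of the usual sigmoid. The module structure, names, and shapes here are illustrative assumptions, not poolside's actual code:

```python
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    # Standard multi-head attention with one learned scalar gate per head,
    # applied to the attention output before the output projection.
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.gate = nn.Linear(d_model, n_heads, bias=False)  # one gate per head
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):  # x: (batch, seq, d_model)
        B, S, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, S, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        o = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # (B, H, S, Dh)
        # Per-head gate; softplus (as in the tweet) instead of the usual sigmoid.
        g = F.softplus(self.gate(x))             # (B, S, H)
        o = o * g.transpose(1, 2).unsqueeze(-1)  # broadcast over head_dim
        return self.out(o.transpose(1, 2).reshape(B, S, -1))
```

Unlike a sigmoid, softplus is unbounded above, so the gate can amplify as well as suppress a head's output, which plausibly relates to the note about not needing an attention sink.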
Robert McHardy @ ICLR 2026 🏖️
@andrew_n_carr @sakurayukiai Our implementation for this actually dates from around June last year; back then no correct & efficient open-source implementations existed. I remember one that was very fast, but only because of a total of 4 (IIRC) different race conditions in the code
0 replies · 0 reposts · 2 likes · 18 views
Sakura Yuki (@sakurayukiai)
Wait, Poolside trained a 225B MoE from scratch using the Muon optimizer instead of AdamW?? They distributed the Newton-Schulz math across GPU ranks to keep overhead under 1%. 15% faster convergence and half the optimizer VRAM. I need to see this training code immediately 👀
10 replies · 10 reposts · 237 likes · 21K views
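For reference, the Newton-Schulz math in question is the quintic iteration from Keller Jordan's public Muon implementation, reproduced below. The cross-rank sharding the tweet mentions is not shown, and whether poolside's kernel matches this exactly is an assumption:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7):
    # Approximately orthogonalizes a 2D (momentum) gradient matrix,
    # replacing its singular values with ~1. This is Muon's core step.
    assert G.ndim == 2
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the Muon repo
    X = G.bfloat16()
    if G.size(0) > G.size(1):          # iterate on the wide orientation
        X = X.T
    X = X / (X.norm() + eps)           # scale so the spectral norm is <= 1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X
```

Muon also only keeps a momentum buffer per matrix (no second-moment state like AdamW), which is where the "half the optimizer VRAM" figure comes from.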
Robert McHardy @ ICLR 2026 🏖️ retweeted
Matthias Gallé (@mgalle)
In my short time at Poolside I was involved in 3 model launches (one internal), and it is impressive how industrialized the process is. The XS release is the most iconic: from nothing to release in 5 weeks
1 reply · 5 reposts · 30 likes · 4.6K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
OpenRouter (@OpenRouter)
The first public foundation models from @poolsideai just dropped on OpenRouter! Laguna M.1 and Laguna XS.2. Built from scratch for agentic coding and long-horizon work. Free for a limited time ⬇️
20 replies · 28 reposts · 350 likes · 55.2K views
Sebastian Raschka (@rasbt)
April was a pretty strong month for LLM releases:
- Gemma 4
- GLM-5.1
- Qwen3.6
- Kimi K2.6
- DeepSeek V4
All are now added to the LLM Architecture Gallery. More details once I am fully back in May!
73 replies · 437 reposts · 3K likes · 120.8K views
sankalp (@dejavucoder)
@poolsideai looks like a very good release. do you plan to publish a tech report?
1 reply · 0 reposts · 2 likes · 1.3K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Pengming Wang (@PengmingWang)
Today we’re releasing Laguna M.1 and Laguna XS.2, our first public models. Laguna XS.2 is our first open-weight release, with weights available today on Hugging Face: huggingface.co/poolside/Lagun… A few details on what went into them: large-scale pre-training, data mixture optimization, synthetic data, optimizer efficiency, and async agent RL.
11 replies · 26 reposts · 225 likes · 20K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Eiso Kant (@eisokant)
Today we’re shipping Laguna M.1 and Laguna XS.2 – our first public models. We’re also shipping our agent harness and a preview product experience. Both models were trained from scratch on our own stack: data pipelines, training infrastructure, and agent RL.
38 replies · 69 reposts · 509 likes · 78.2K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Jean Kaddour @ ICLR 2026 (@jeankaddour)
Introducing Target Policy Optimization (TPO): TPO turns GRPO into supervised learning: build a target distribution over sampled completions, then fit with cross-entropy. The gradient vanishes once the target is matched, making multi-epoch training smooth. 🧵(1/4)
11 replies · 66 reposts · 494 likes · 37.1K views