Robert McHardy @ ICLR 2026 🏖️

576 posts


@robert_mchardy

Pre-training Lead @poolsideai | Freelance editor @ComputerBase | Prev @AssemblyAI, @instadeepai, @ucl_dark, @Uni_Stuttgart, @Bosch_AI

London, UK · Joined December 2009
473 Following · 573 Followers
Pinned Tweet
Robert McHardy @ ICLR 2026 🏖️
Super excited to release the weights of Laguna XS.2 under Apache 2.0 to the community today. We are also opening access to our most capable model, Laguna M.1, via our API. Both models are designed for long-horizon agentic coding with strong performance on SWE-bench-style tasks. Learn more on our blog.
poolside@poolsideai

Today we’re releasing Laguna XS.2, Poolside’s first open-weight model. It’s a 33B total / 3B active MoE model built for agentic coding and long-horizon tasks. Trained fully in-house on our own stack. Runs on a single GPU. Released under Apache 2.0. Links 👇 Weights: huggingface.co/poolside/Lagun… API: platform.poolside.ai Blog: poolside.ai/blog/laguna-a-…

0 replies · 3 reposts · 29 likes · 971 views
Robert McHardy @ ICLR 2026 🏖️
@teortaxesTex XS.2 was 20 days flat in pre-training, though not pure training time, since we paused for a bit to run some additional experiments. IIRC it was 14 days of pure training time (and it was 30T tokens, not 33T)
0 replies · 0 reposts · 1 like · 93 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
225B total 23B activated: 6-8K H200s
30B total 3B active, 33T tokens: 20-something days on 2K
Man, this really puts the V3-671B's feat (55 days) into perspective. A year and a half ago, those guys could push H800s to the level H200s normally can't touch (they're improving though)
Eiso Kant@eisokant

@qubitium @poolsideai M.1 was 6-8k (mostly 6k, but we wanted to speed up near the end, so scaled up); XS.2 was 2k

2 replies · 2 reposts · 56 likes · 7.3K views
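For scale, a quick back-of-envelope from the numbers in this thread (30T tokens and ~14 days of pure training per the reply above, 2K GPUs for XS.2 per @eisokant); the arithmetic is mine, not from the thread:

```python
# Implied pre-training throughput for Laguna XS.2 (rough estimate).
tokens = 30e12           # 30T tokens (per @robert_mchardy's correction)
seconds = 14 * 86400     # ~14 days of pure training time
gpus = 2000              # 2K GPUs (per @eisokant)

print(f"{tokens / seconds / 1e6:.1f}M tokens/s overall")  # ~24.8M tokens/s
print(f"{tokens / seconds / gpus:,.0f} tokens/s per GPU") # ~12,400 tokens/s/GPU
```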
Robert McHardy @ ICLR 2026 🏖️ retweeted
Sebastian Raschka
Here is a 2nd batch of April architecture drops. What a month!
- Ant Ling 2.6 1T
- Minimax M2.7
- Xiaomi MiMo V2.5
- Poolside Laguna XS.2
- Tencent Hy3-preview
- IBM Granite 4.1
17 replies · 105 reposts · 764 likes · 34K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Eiso Kant (@eisokant)
We are an American company with a global team and global aspirations. The story of @poolsideai is that early on in the life of the company we decided to focus on building out our applied research org in Europe. That’s been the seed for an amazing team and a competitive advantage. Today we have team members all over the world; Europe and the US are roughly equal in size, and Asia is growing.

Three years ago we thought France would be a great place to build from, but in the early days our hiring happened all across Europe instead. Today we have less than a handful of folks in France but large teams in Europe in London, Amsterdam, Zurich, etc. We operate as a remote-first company but have an office in Paris where we do monthly on-sites (the logistics are great for this) and an office in London which is used on a daily basis.

When we raised capital early on, the vast majority of it came from US investors, but European investors (including French ones) have been a part of our rounds. When we got to France (almost 3 years ago) we were offered significant double-digit-million research grants. My cofounder @jasoncwarner and I did not feel comfortable accepting the grants when we realized that France was going to be only a small part of our story. So we respectfully turned them down. France keeps a special place in our hearts, but we’re a global company with global aspirations.
Chris O'Brien@obrien

People refer to @poolsideai as a "French" company, and I know there was some hype about them moving their HQ here a few years ago, but I don't think they are, unless I'm missing something. Based on LinkedIn (7 employees in France) and the company's job listings:

7 replies · 12 reposts · 112 likes · 12.8K views
elie (@eliebakouch)
nice oss release from poolside! i really like the arch :) some "unusual" hidden modeling details:
> they use per-head gated attention like qwen or stepfun; people usually use sigmoid, but they use softplus! they also use it on both swa and full attention which is nice! this is also why they don't use a sink
> there is a "moe_router_logit_softcapping" parameter in the config, set to 0 by default, but this is non-standard i think; means that they likely saw some instabilities in the router scores? this also changes the bias strength in the top-k since they do it after
> more attention heads on swa layers; xiaomi v2 flash did higher kv heads for swa layers
> they use a different beta_fast for yarn than most models (64 instead of 32) and partial (half) rope for the full attention layers
poolside@poolsideai

Today we’re releasing Laguna XS.2, Poolside’s first open-weight model. It’s a 33B total / 3B active MoE model built for agentic coding and long-horizon tasks. Trained fully in-house on our own stack. Runs on a single GPU. Released under Apache 2.0. Links 👇 Weights: huggingface.co/poolside/Lagun… API: platform.poolside.ai Blog: poolside.ai/blog/laguna-a-…

2 replies · 4 reposts · 92 likes · 7.8K views
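A minimal sketch of the per-head output gating elie describes, with softplus in place of the usual sigmoid. The module structure, names, and shapes here are illustrative assumptions, not poolside's actual code:

```python
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    # Standard multi-head attention with one learned scalar gate per head,
    # applied to the attention output before the output projection.
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.gate = nn.Linear(d_model, n_heads, bias=False)  # one gate per head
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):  # x: (batch, seq, d_model)
        B, S, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, S, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        o = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # (B, H, S, Dh)
        # Per-head gate; softplus (as in the tweet) instead of the usual sigmoid.
        g = F.softplus(self.gate(x))             # (B, S, H)
        o = o * g.transpose(1, 2).unsqueeze(-1)  # broadcast over head_dim
        return self.out(o.transpose(1, 2).reshape(B, S, -1))
```

Unlike a sigmoid, softplus is unbounded above, so the gate can amplify as well as suppress a head's output, which plausibly relates to the note about not needing an attention sink.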
Robert McHardy @ ICLR 2026 🏖️
@andrew_n_carr @sakurayukiai Our implementation for this actually dates from around June last year; back then no correct & efficient open-source implementations existed. I remember one that was very fast, but only because of a total of 4 (IIRC) different race conditions in the code
0 replies · 0 reposts · 2 likes · 18 views
Sakura Yuki (@sakurayukiai)
Wait, Poolside trained a 225B MoE from scratch using the Muon optimizer instead of AdamW?? They distributed the Newton-Schulz math across GPU ranks to keep overhead under 1%. 15% faster convergence and half the optimizer VRAM. I need to see this training code immediately 👀
10 replies · 10 reposts · 237 likes · 21K views
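For reference, the Newton-Schulz math in question is the quintic iteration from Keller Jordan's public Muon implementation, reproduced below. The cross-rank sharding the tweet mentions is not shown, and whether poolside's kernel matches this exactly is an assumption:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7):
    # Approximately orthogonalizes a 2D (momentum) gradient matrix,
    # replacing its singular values with ~1. This is Muon's core step.
    assert G.ndim == 2
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the Muon repo
    X = G.bfloat16()
    if G.size(0) > G.size(1):          # iterate on the wide orientation
        X = X.T
    X = X / (X.norm() + eps)           # scale so the spectral norm is <= 1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X
```

Muon also only keeps a momentum buffer per matrix (no second-moment state like AdamW), which is where the "half the optimizer VRAM" figure comes from.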
Robert McHardy @ ICLR 2026 🏖️ retweeted
Matthias Gallé (@mgalle)
In my short time at Poolside I was involved in 3 model launches (one internal), and it is impressive how industrialized the process is. The XS release is the most iconic: from nothing to release in 5 weeks
1 reply · 5 reposts · 30 likes · 4.6K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
OpenRouter (@OpenRouter)
The first public foundation models from @poolsideai just dropped on OpenRouter! Laguna M.1 and Laguna XS.2. Built from scratch for agentic coding and long-horizon work. Free for a limited time ⬇️
20 replies · 28 reposts · 350 likes · 55.2K views
Sebastian Raschka (@rasbt)
April was a pretty strong month for LLM releases:
- Gemma 4
- GLM-5.1
- Qwen3.6
- Kimi K2.6
- DeepSeek V4
All are now added to the LLM Architecture Gallery. More details once I am fully back in May!
73 replies · 437 reposts · 3K likes · 120.8K views
sankalp (@dejavucoder)
@poolsideai looks like a very good release. do you plan to publish a tech report?
1 reply · 0 reposts · 2 likes · 1.3K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Pengming Wang (@PengmingWang)
Today we’re releasing Laguna M.1 and Laguna XS.2, our first public models. Laguna XS.2 is our first open-weight release, with weights available today on Hugging Face: huggingface.co/poolside/Lagun… A few details on what went into them: large-scale pre-training, data mixture optimization, synthetic data, optimizer efficiency, and async agent RL.
11 replies · 26 reposts · 225 likes · 20K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Eiso Kant (@eisokant)
Today we’re shipping Laguna M.1 and Laguna XS.2 – our first public models. We’re also shipping our agent harness and a preview product experience. Both models were trained from scratch on our own stack: data pipelines, training infrastructure, and agent RL.
38 replies · 69 reposts · 509 likes · 78.2K views
Robert McHardy @ ICLR 2026 🏖️ retweeted
Jean Kaddour @ ICLR 2026 (@jeankaddour)
Introducing Target Policy Optimization (TPO): TPO turns GRPO into supervised learning: build a target distribution over sampled completions, then fit with cross-entropy. The gradient vanishes once the target is matched, making multi-epoch training smooth. 🧵(1/4)
11 replies · 66 reposts · 494 likes · 37.1K views