Eric Alcaide

1.1K posts

@eric_alcaide

Design is not finished. Common prosperity. LLMaxxing @poolsideai

LLMaxxing · Joined September 2016
1.1K Following · 1.2K Followers
Ali Hatamizadeh @ahatamiz1
Gated DeltaNet-2 is here. 🚀

🔥 New paper: Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Gated DeltaNet-2 outperforms KDA and Mamba-3, the latest and best recurrent architectures, head to head at 1.3B. 🏆

💡 Here's the idea behind it:

Linear attention squeezes an unbounded KV cache into a fixed-size recurrent state. The hard part isn't just what to forget, it's how to edit that memory without scrambling the associations already in it.

Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to do two jobs at once: erasing old content and writing new content. But these two decisions act on different axes of the state, so tying them together is a real limitation.

Gated DeltaNet-2 decouples them.

✂️ a channel-wise erase gate b_t picks which key-side coordinates to read and remove
✍️ a channel-wise write gate w_t picks which value-side coordinates to commit
🔁 recovers KDA when both gates collapse to a scalar, and Gated DeltaNet when the decay collapses too
⚡ still trains fast: chunkwise WY algorithm with gate-aware backward, fused in Triton

📊 Results:

We train 1.3B models on 100B tokens of FineWeb-Edu, matched in recurrent state size, against Mamba-2, Gated DeltaNet, KDA, and Mamba-3.

Best average on language modeling + commonsense reasoning, in both recurrent and hybrid settings

Biggest gains on long-context RULER retrieval. S-NIAH-3 jumps from 63 to 90 over KDA, and multi-key needle retrieval climbs from 28 to 38

Joint work with @YejinChoinka and @jankautz.

📄 Paper: shorturl.at/AAlVb
💻 Code: github.com/NVlabs/GatedDe…

#LinearAttention #StateSpaceModels #Mamba #LLM
21 replies · 99 reposts · 644 likes · 180.5K views
Skywork @Skywork_ai
Introducing SkyClaw-v1.0: an agent model optimized for OpenClaw, Hermes, and Nanobot. Stronger tool use and multi-turn task execution. Available now alongside SkyClaw-v1.0-lite, our faster, lower-cost variant.
26 replies · 37 reposts · 426 likes · 88.4K views
Mike Bradley @The_Only_Signal
Up next, time to run some local AI on this mysterious golden paperweight from @nvidia
26 replies · 5 reposts · 178 likes · 9.6K views
Eric Alcaide @eric_alcaide
@LChoshen Training on this setup might actually be less efficient, but you can train on more data and promote aligned latents
1 reply · 0 reposts · 0 likes · 80 views
Leshem (Legend) Choshen 🤖🤗
@eric_alcaide Does anyone know why none of those works compare to predicting multiple tokens? Or ablate the contribution of the input trick and the output trick separately? (Yes, I see why, given the input, you need something on the output, but you can do it in order)
3 replies · 0 reposts · 2 likes · 511 views
Eric Alcaide @eric_alcaide
@bekindtopeople2 Cool idea, yeah. I guess it's all about a tradeoff between information density and closeness to the target distribution
0 replies · 0 reposts · 0 likes · 75 views
Pan 🇵🇸 @bekindtopeople2
@eric_alcaide Has anyone tried patch schedules? Why would they not ablate 3/4/N patch length?
1 reply · 0 reposts · 1 like · 385 views
Eric Alcaide reposted
Poolside @poolsideai
ok this is sick

@pupposandro @davideciffa and @luceboxai got Laguna XS.2 running on a single RTX 3090 with ~111 tok/s decode, 5.4x faster 128K prefill vs llama.cpp, and made it the first MoE target for PFlash

open weights doing open weights things
4 replies · 10 reposts · 77 likes · 4.2K views
Eric Alcaide @eric_alcaide
Great things are happening and you can feel it 🔥
Sandro @pupposandro

PFlash now runs @poolsideai's Laguna-XS.2 (33B-A3B MoE) on a single RTX 3090.

- 111 tok/s decode @ short ctx
- 128K TTFT in 15.91s, 5.4x faster prefill vs llama.cpp
- NIAH passes every (ctx, keep) point up to 131K
- first MoE target supported by PFlash
- hand-rolled CUDA, ggml only, no libllama

great collab w/ @eisokant, @eric_alcaide, and the rest of the @poolsideai team. looking forward to working more on their great coding models.

repo + GGUF in first comment.

1 reply · 3 reposts · 9 likes · 1.8K views
Eric Alcaide @eric_alcaide
SGLang team is cracked. Respect 🫡
LMSYS Org @lmsysorg

🌊 SGLang now supports @poolsideai's Laguna-XS.2, a 33.4B-A3B hybrid SWA + MoE model purpose-built for agentic coding and long-horizon SWE work

☑️ SWE-bench Verified 68.2%; Multilingual 62.4%; Pro 44.5%; Terminal-Bench 2.0 30.1%
☑️ 131K-token context for long agent traces
☑️ Native poolside_v1 reasoning + tool-call parsers (OpenAI-compatible)
☑️ BF16, FP8, and NVFP4 quantizations

👉 Cookbook: docs.sglang.io/cookbook/autor…

0 replies · 5 reposts · 19 likes · 3.3K views
Eric Alcaide @eric_alcaide
@MrCatid yea lol claiming novelty for this work is an overstatement
0 replies · 0 reposts · 0 likes · 7 views
catid @MrCatid
Same loss curves
1 reply · 0 reposts · 1 like · 117 views
Eric Alcaide reposted
Poolside @poolsideai
Poolside is hosting a 2-day model research hackathon in London.

Join us to push an open-weight agent model as far as you can. RL and fine-tune Laguna XS.2, our latest-generation model, on Prime Intellect Lab.

Dates: May 29–30
Partners: @nvidia + @PrimeIntellect + @huggingface
Prize: NVIDIA DGX Spark

Agents need better models. Better models need cracked researchers.

Link below.
27 replies · 45 reposts · 231 likes · 90.7K views
Eric Alcaide @eric_alcaide
@Raubertard @cheenanet Doesn’t need linear attention kernels and can have prefix caching without storing copies of the state
1 reply · 0 reposts · 1 like · 43 views
Eric Alcaide @eric_alcaide
The reason we can tell OpenAI has not reached AGI is how hard it is to find their invoice 👀
0 replies · 0 reposts · 1 like · 135 views
Eric Alcaide @eric_alcaide
@stableAPY Let us know your thoughts and we might improve for the next one :)
0 replies · 0 reposts · 2 likes · 25 views