Kimi.ai: "📎We've uploaded it to arXiv, enjoy! arxiv.org/abs/2603.15031"

Kimi.ai@Kimi_Moonshot·2d

📎We've uploaded it to arXiv, enjoy! arxiv.org/abs/2603.15031

Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗Full report: github.com/MoonshotAI/Att…
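To make the contrast in the post concrete, here is a minimal sketch of the general idea: instead of summing all earlier layer outputs with fixed, equal weights, each layer mixes them with softmax weights that depend on the inputs. This is an illustration under stated assumptions, not the formulation from the report: the per-layer `query` vector, the scaled dot-product scoring, and the function names `standard_residual` and `attention_residual` are all hypothetical, and Block AttnRes is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def standard_residual(layer_outputs):
    """Fixed, uniform accumulation: every earlier output is added with weight 1."""
    d = len(layer_outputs[0])
    return [sum(h[j] for h in layer_outputs) for j in range(d)]


def attention_residual(layer_outputs, query):
    """Input-dependent mix of earlier layer outputs (hypothetical formulation).

    `query` stands in for a learned per-layer vector (an assumption).
    Scores are scaled dot products, so the weights change with the inputs.
    """
    d = len(query)
    scores = [
        sum(q * x for q, x in zip(query, h)) / math.sqrt(d)
        for h in layer_outputs
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * h[j] for w, h in zip(weights, layer_outputs))
        for j in range(d)
    ]
    return mixed, weights


# Three earlier layer outputs for a single token, hidden size 2.
outs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero query scores every layer equally, so the mix is a plain average.
mixed_uniform, w_uniform = attention_residual(outs, [0.0, 0.0])

# A query aligned with the first dimension favors layers with mass there.
mixed_focused, w_focused = attention_residual(outs, [5.0, 0.0])
```

Because the weights sum to one, the mixed state stays on the scale of a single layer output, whereas the plain sum grows with depth. That is one plausible reading of the "hidden-state growth" point in the post, though the report may handle it differently.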

Danav@Randomflux·2d

@Kimi_Moonshot Hit the nail on the head. Nice work.
