
Arjun Kavi
784 posts



Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers. 🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. 🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale. 🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead. 🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains. 🔗Full report: github.com/MoonshotAI/Att…

This is like a yacht-maximizing tax Similar to the capital-gains tax (neither tax should exist), the estate tax creates a huge incentive not to defer consumption



When I moved to new york, I found it hard to visualize what commute times actually looked like. The same dilemma occurs every time you move, or even book a hotel: what's actually accessible in 20 minutes of public transit? Deployment link below











One of the most persistent etymological quirks is how often people think that the history of a word comes from an acronym. It almost never does.











That thing is hideous. Can we all agree that luxury towers should not be erected on city owned land? The city wants to build a "600-ft.-tall apartment tower on city-owned land on Little West 12th Street in the Meatpacking District — which they call “Gansevoort Square” via @GVSHP












