KIMI-乔任梁💬🌟🔥

65 posts


@KimiMeme_VIP

Website: https://t.co/NLFS4vJGno · Delegated to @jianjun43860305 and @Aileen971107 for management

Joined December 2010
16 Following · 161 Followers
Pinned Tweet
KIMI-乔任梁💬🌟🔥
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP·
Collection update: ⭐️ Buy the collection - linktr.ee/KimiMeme.VIP ⭐️⭐️🔥 Gate exchange Alpha Buy - gate.com/zh/alpha/sol-5…
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP

Tutorial for registering and purchasing the video collection. Follow: @_Sicongwang @Si6399 @Si6366. Telegram group: t.me/JNBSi6399. Gate: gateio24.com/referral/invit… OKX⭐: okx.com/join/71550227. Binance: accounts.suitechsui.online/register?ref=R… Contract CA: 5RAdU74DWZEreD7n2V9UAsrx47k8za6QSB3s5KFMpump

0 replies · 1 repost · 0 likes · 9.9K views
KIMI-乔任梁💬🌟🔥 reposted
Elon Musk
Elon Musk@elonmusk·
@_avichawla Impressive work from Kimi
297 replies · 194 reposts · 3.5K likes · 460.9K views
Avi Chawla
Avi Chawla@_avichawla·
Big release from Kimi! They just released a new way to handle residual connections in Transformers.

In a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection. Across 40+ layers, the hidden state at any layer is just the equal-weighted sum of all previous layer outputs. Every layer contributes with weight = 1, so every layer gets equal importance.

This creates a problem called PreNorm dilution: as the hidden state accumulates layer after layer, its magnitude grows linearly with depth, and any new layer's contribution gets progressively buried in the already-massive residual. Deeper layers are then forced to produce increasingly large outputs just to have any influence, which destabilizes training.

Here's what the Kimi team observed and did:

RNNs compress all prior token information into a single state across time, leading to problems with long-range dependencies. Residual connections likewise compress all prior layer information into a single state across depth. Transformers solved the first problem by replacing recurrence with attention along the sequence dimension. Now Kimi has introduced Attention Residuals, which applies a similar idea to depth.

Instead of adding all previous layer outputs with a fixed weight of 1, each layer now uses softmax attention to selectively decide how much weight each previous layer's output should receive. Each layer gets a single learned query vector and attends over all previous layer outputs to compute a weighted combination. The weights are input-dependent, so different tokens can retrieve different layer representations based on what's actually useful. This is Full Attention Residuals (shown in the second diagram below).

But there's a practical problem with this idea: Full AttnRes requires keeping all layer outputs in memory and communicating them across pipeline stages during distributed training.

To solve this, they introduce Block Attention Residuals (shown in the third diagram below). The idea is to group consecutive layers into roughly 8 blocks. Within each block, layer outputs are summed via standard residuals; across blocks, the attention mechanism selectively combines block-level representations. This drops memory from O(Ld) to O(Nd), where N is the number of blocks. Layers within the current block can also attend to the partial sum of what's been computed so far inside that block, so local information flow isn't lost. And the raw token embedding is always available as a separate source, which means any layer in the network can selectively reach back to the original input.

Results from the paper:
- Block AttnRes matches the loss of a baseline LLM trained with 1.25x more compute.
- Inference latency overhead is less than 2%, making it a practical drop-in replacement.
- On a 48B-parameter Kimi Linear model (3B activated) trained on 1.4T tokens, it improved every benchmark they tested: GPQA-Diamond +7.5, Math +3.6, HumanEval +3.1, MMLU +1.1.

The residual connection has been mostly unchanged since ResNet in 2015. This might be the first modification that's both theoretically motivated and practically deployable at scale with negligible overhead.

More details in the post below by Kimi👇
____
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
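To make the mechanism concrete, here is a minimal PyTorch sketch of Full Attention Residuals as the thread describes it. This is a hedged reading, not Kimi's released code: the single learned query per layer and the stacked (L, batch, seq, d) layout are assumptions drawn from the description above.

```python
# Minimal sketch of Full Attention Residuals as described in the thread.
# NOT the authors' implementation; shapes and parameterization are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullAttnRes(nn.Module):
    """Softmax attention over all preceding layer outputs, replacing
    the fixed weight-1 residual sum."""

    def __init__(self, d_model: int):
        super().__init__()
        # One learned query vector for this layer (assumed parameterization).
        self.query = nn.Parameter(torch.randn(d_model) * d_model ** -0.5)

    def forward(self, layer_outputs: torch.Tensor) -> torch.Tensor:
        # layer_outputs: (L, batch, seq, d), the raw token embedding plus
        # the output of every preceding sub-layer, kept un-summed.
        d = layer_outputs.shape[-1]
        # Score each prior layer's output per token, so the mixing weights
        # are input-dependent rather than a fixed weight of 1.
        scores = torch.einsum("lbsd,d->lbs", layer_outputs, self.query)
        weights = F.softmax(scores * d ** -0.5, dim=0)  # softmax over layers
        # Weighted combination replaces the equal-weighted residual sum.
        return torch.einsum("lbs,lbsd->bsd", weights, layer_outputs)
```

In this reading, sub-layer l would call its own FullAttnRes on the stack of the embedding plus the outputs of layers 0 through l-1 and use the result as its input, which is exactly why memory stays O(Ld): every layer output must remain live.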
[image: diagrams of the standard residual, Full AttnRes, and Block AttnRes]
Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗 Full report: github.com/MoonshotAI/Att…
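Continuing the same hedged sketch for the block variant the announcement names: the (N, batch, seq, d) block_states tensor and the separate partial_sum argument are my assumptions about how "one state per completed block plus the current block's running sum" might be wired, not the repo's actual API.

```python
# Minimal sketch of Block Attention Residuals; assumptions as noted above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockAttnRes(nn.Module):
    """Attend over block-level summaries instead of every layer output,
    dropping residual-stream memory from O(L*d) to O(N*d) for N blocks."""

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(d_model) * d_model ** -0.5)

    def forward(self, block_states: torch.Tensor,
                partial_sum: torch.Tensor) -> torch.Tensor:
        # block_states: (N, batch, seq, d), the raw token embedding plus one
        # standard-residual sum per completed block (~L/N layers each).
        # partial_sum: (batch, seq, d), the running sum inside the current,
        # unfinished block, included so local information flow isn't lost.
        sources = torch.cat([block_states, partial_sum.unsqueeze(0)], dim=0)
        d = sources.shape[-1]
        scores = torch.einsum("nbsd,d->nbs", sources, self.query)
        weights = F.softmax(scores * d ** -0.5, dim=0)  # softmax over sources
        return torch.einsum("nbs,nbsd->bsd", weights, sources)
```

Because only N block summaries plus one running sum stay live (and the embedding is one of the sources), any layer can still reach back to the original input while memory and cross-stage communication shrink from per-layer to per-block.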

80 replies · 218 reposts · 2.3K likes · 348.6K views
Ron Filipkowski
Ron Filipkowski@RonFilipkowski·
Grinning like a 5-year-old getting his participation trophy at the post-season T-ball pizza party.
[image attached]
4.1K replies · 4.3K reposts · 29.8K likes · 1.2M views
The White House
The White House@WhiteHouse·
President Donald J. Trump meets with María Corina Machado of Venezuela in the Oval Office, during which she presented the President with her Nobel Peace Prize in recognition and honor.🕊️
[image attached]
18.7K replies · 16.5K reposts · 84.6K likes · 10.4M views
KIMI-乔任梁💬🌟🔥
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP·
@nikitabier @ripchillpill Right now, group members can freely modify the auto-delete time for private messages; instead, only administrators should have that right. I hope this feature can help replace Telegram or WeChat in China. Thank you.
1 reply · 0 reposts · 0 likes · 54 views
Nikita Bier
Nikita Bier@nikitabier·
We are revising our developer API policies: We will no longer allow apps that reward users for posting on X (aka “infofi”). This has led to a tremendous amount of AI slop & reply spam on the platform. We have revoked API access from these apps, so your X experience should start improving soon (once the bots realize they’re not getting paid anymore). If your developer account was terminated, please reach out and we will assist in transitioning your business to Threads and Bluesky.
13K replies · 4.4K reposts · 47.2K likes · 14.1M views
Trust Wallet
Trust Wallet@TrustWallet·
We’ve identified a security incident affecting Trust Wallet Browser Extension version 2.68 only. Users with Browser Extension 2.68 should disable and upgrade to 2.69. Please refer to the official Chrome Webstore link here: chrome.google.com/webstore/detai… Please note: Mobile-only users and all other browser extension versions are not impacted. We understand how concerning this is and our team is actively working on the issue. We’ll keep sharing updates as soon as possible.
819 replies · 896 reposts · 3K likes · 2.9M views
KIMI-乔任梁💬🌟🔥
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP·
@lofipepenft Hello, I'd like your help. I saw your pepe cross the chain to BNB, ARBITRAGEUR. We also want to go cross-chain; can you help? In return, name your terms and we can promise them. Thank you very much!
0 replies · 0 reposts · 0 likes · 59 views