KIMI-乔任梁💬🌟🔥

65 posts


@KimiMeme_VIP

Website: https://t.co/NLFS4vJGno · Delegated to @jianjun43860305 and @Aileen971107 for management

Joined December 2010
16 Following · 161 Followers
Pinned Tweet
KIMI-乔任梁💬🌟🔥
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP·
Collection update: ⭐️ Buy the collection - linktr.ee/KimiMeme.VIP ⭐️⭐️🔥 Gate exchange Alpha Buy - gate.com/zh/alpha/sol-5…
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP

Tutorial for registering and purchasing the video collection. Follow: @_Sicongwang @Si6399 @Si6366. Telegram group: t.me/JNBSi6399. Gate: gateio24.com/referral/invit… OKX⭐: okx.com/join/71550227. Binance: accounts.suitechsui.online/register?ref=R… Contract CA: 5RAdU74DWZEreD7n2V9UAsrx47k8za6QSB3s5KFMpump

0 replies · 1 repost · 0 likes · 9.9K views
KIMI-乔任梁💬🌟🔥 reposted
Elon Musk
Elon Musk@elonmusk·
@_avichawla Impressive work from Kimi
297 replies · 194 reposts · 3.5K likes · 460.9K views
Avi Chawla
Avi Chawla@_avichawla·
Big release from Kimi! They just released a new way to handle residual connections in Transformers.

In a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection. Across 40+ layers, the hidden state at any layer is just the equal-weighted sum of all previous layer outputs. Every layer contributes with weight = 1, so every layer gets equal importance.

This creates a problem called PreNorm dilution: as the hidden state accumulates layer after layer, its magnitude grows linearly with depth, and any new layer's contribution gets progressively buried in the already-massive residual. Deeper layers are then forced to produce increasingly large outputs just to have any influence, which destabilizes training.

Here's what the Kimi team observed and did:

RNNs compress all prior token information into a single state across time, leading to problems with long-range dependencies. Residual connections likewise compress all prior layer information into a single state across depth. Transformers solved the first problem by replacing recurrence with attention along the sequence dimension. Now Kimi has introduced Attention Residuals, which applies a similar idea to depth.

Instead of adding all previous layer outputs with a fixed weight of 1, each layer now uses softmax attention to selectively decide how much weight each previous layer's output should receive. Each layer gets a single learned query vector and attends over all previous layer outputs to compute a weighted combination. The weights are input-dependent, so different tokens can retrieve different layer representations based on what's actually useful. This is Full Attention Residuals (shown in the second diagram below).

But there's a practical problem with this idea: Full AttnRes requires keeping all layer outputs in memory and communicating them across pipeline stages during distributed training.

To solve this, they introduce Block Attention Residuals (shown in the third diagram below). The idea is to group consecutive layers into roughly 8 blocks. Within each block, layer outputs are summed via standard residuals; across blocks, the attention mechanism selectively combines block-level representations. This drops memory from O(Ld) to O(Nd), where N is the number of blocks. Layers within the current block can also attend to the partial sum of what's been computed so far inside that block, so local information flow isn't lost. And the raw token embedding is always available as a separate source, which means any layer in the network can selectively reach back to the original input.

Results from the paper:
- Block AttnRes matches the loss of a baseline LLM trained with 1.25x more compute.
- Inference latency overhead is less than 2%, making it a practical drop-in replacement.
- On a 48B-parameter Kimi Linear model (3B activated) trained on 1.4T tokens, it improved every benchmark they tested: GPQA-Diamond +7.5, Math +3.6, HumanEval +3.1, MMLU +1.1.

The residual connection has been mostly unchanged since ResNet in 2015. This might be the first modification that's both theoretically motivated and practically deployable at scale with negligible overhead.

More details in the post below by Kimi👇
____
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
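To make the mechanism concrete, here is a minimal PyTorch sketch of Full Attention Residuals as the thread describes it. This is a hedged reading, not Kimi's released code: the single learned query per layer and the stacked (L, batch, seq, d) layout are assumptions drawn from the description above.

```python
# Minimal sketch of Full Attention Residuals as described in the thread.
# NOT the authors' implementation; shapes and parameterization are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullAttnRes(nn.Module):
    """Softmax attention over all preceding layer outputs, replacing
    the fixed weight-1 residual sum."""

    def __init__(self, d_model: int):
        super().__init__()
        # One learned query vector for this layer (assumed parameterization).
        self.query = nn.Parameter(torch.randn(d_model) * d_model ** -0.5)

    def forward(self, layer_outputs: torch.Tensor) -> torch.Tensor:
        # layer_outputs: (L, batch, seq, d), the raw token embedding plus
        # the output of every preceding sub-layer, kept un-summed.
        d = layer_outputs.shape[-1]
        # Score each prior layer's output per token, so the mixing weights
        # are input-dependent rather than a fixed weight of 1.
        scores = torch.einsum("lbsd,d->lbs", layer_outputs, self.query)
        weights = F.softmax(scores * d ** -0.5, dim=0)  # softmax over layers
        # Weighted combination replaces the equal-weighted residual sum.
        return torch.einsum("lbs,lbsd->bsd", weights, layer_outputs)
```

In this reading, sub-layer l would call its own FullAttnRes on the stack of the embedding plus the outputs of layers 0 through l-1 and use the result as its input, which is exactly why memory stays O(Ld): every layer output must remain live.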
[image: diagrams of the standard residual, Full AttnRes, and Block AttnRes]
Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗 Full report: github.com/MoonshotAI/Att…
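Continuing the same hedged sketch for the block variant the announcement names: the (N, batch, seq, d) block_states tensor and the separate partial_sum argument are my assumptions about how "one state per completed block plus the current block's running sum" might be wired, not the repo's actual API.

```python
# Minimal sketch of Block Attention Residuals; assumptions as noted above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockAttnRes(nn.Module):
    """Attend over block-level summaries instead of every layer output,
    dropping residual-stream memory from O(L*d) to O(N*d) for N blocks."""

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(d_model) * d_model ** -0.5)

    def forward(self, block_states: torch.Tensor,
                partial_sum: torch.Tensor) -> torch.Tensor:
        # block_states: (N, batch, seq, d), the raw token embedding plus one
        # standard-residual sum per completed block (~L/N layers each).
        # partial_sum: (batch, seq, d), the running sum inside the current,
        # unfinished block, included so local information flow isn't lost.
        sources = torch.cat([block_states, partial_sum.unsqueeze(0)], dim=0)
        d = sources.shape[-1]
        scores = torch.einsum("nbsd,d->nbs", sources, self.query)
        weights = F.softmax(scores * d ** -0.5, dim=0)  # softmax over sources
        return torch.einsum("nbs,nbsd->bsd", weights, sources)
```

Because only N block summaries plus one running sum stay live (and the embedding is one of the sources), any layer can still reach back to the original input while memory and cross-stage communication shrink from per-layer to per-block.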

80 replies · 218 reposts · 2.3K likes · 348.6K views
Ron Filipkowski
Ron Filipkowski@RonFilipkowski·
Grinning like a 5-year-old getting his participation trophy at the post-season T-ball pizza party.
[image attached]
4.1K replies · 4.3K reposts · 29.8K likes · 1.2M views
The White House
The White House@WhiteHouse·
President Donald J. Trump meets with María Corina Machado of Venezuela in the Oval Office, during which she presented the President with her Nobel Peace Prize in recognition and honor.🕊️
[image attached]
18.7K replies · 16.5K reposts · 84.6K likes · 10.4M views
KIMI-乔任梁💬🌟🔥
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP·
@nikitabier @ripchillpill Right now, group members can freely modify the auto-delete time for private messages; instead, only administrators should have that right. I hope this feature can help replace Telegram or WeChat in China. Thank you.
1 reply · 0 reposts · 0 likes · 54 views
Nikita Bier
Nikita Bier@nikitabier·
We are revising our developer API policies: We will no longer allow apps that reward users for posting on X (aka “infofi”). This has led to a tremendous amount of AI slop & reply spam on the platform. We have revoked API access from these apps, so your X experience should start improving soon (once the bots realize they’re not getting paid anymore). If your developer account was terminated, please reach out and we will assist in transitioning your business to Threads and Bluesky.
13K replies · 4.4K reposts · 47.2K likes · 14.1M views
Trust Wallet
Trust Wallet@TrustWallet·
We’ve identified a security incident affecting Trust Wallet Browser Extension version 2.68 only. Users with Browser Extension 2.68 should disable and upgrade to 2.69. Please refer to the official Chrome Webstore link here: chrome.google.com/webstore/detai… Please note: Mobile-only users and all other browser extension versions are not impacted. We understand how concerning this is and our team is actively working on the issue. We’ll keep sharing updates as soon as possible.
819 replies · 896 reposts · 3K likes · 2.9M views
KIMI-乔任梁💬🌟🔥
KIMI-乔任梁💬🌟🔥@KimiMeme_VIP·
@lofipepenft Hello, I'd like your help. I saw your pepe cross the chain to BNB, ARBITRAGEUR. We also want to go cross-chain; can you help? In return, name your terms and we can promise them. Thank you very much!
0 replies · 0 reposts · 0 likes · 59 views