M1NDB3ND3R

698 posts

@mindbender08

20 y/o | Linux Enthusiast | Python & C Coder | Exploring Cybersecurity, AI, and Blockchain

Joined July 2022
484 Following · 197 Followers
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@charmcli Does loading on demand give any advantages, or is it similar to deferred loading?
1
0
1
66
Charm
Charm@charmcli·
MCPs, without the config. Crush now loads them on demand with Docker.
7
4
68
8.4K
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@thdxr I don't know why we need AI in a password manager 😔
0
0
0
153
dax
dax@thdxr·
ai password manager is that anything?
146
1
379
56.9K
M1NDB3ND3R retweeted
Avi Chawla
Avi Chawla@_avichawla·
Big release from Kimi! They just released a new way to handle residual connections in Transformers.

In a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection. If you consider this across 40+ layers, the hidden state at any layer is just the equal-weighted sum of all previous layer outputs. Every layer contributes with weight=1, so every layer gets equal importance.

This creates a problem called PreNorm dilution: as the hidden state accumulates layer after layer, its magnitude grows linearly with depth, and any new layer's contribution gets progressively buried in the already-massive residual. Deeper layers are then forced to produce increasingly large outputs just to have any influence, which destabilizes training.

Here's what the Kimi team observed and did:

RNNs compress all prior token information into a single state across time, leading to problems with handling long-range dependencies. And residual connections compress all prior layer information into a single state across depth. Transformers solved the first problem by replacing recurrence with attention, applied along the sequence dimension. Now they introduce Attention Residuals, which applies a similar idea to depth.

Instead of adding all previous layer outputs with a fixed weight of 1, each layer now uses softmax attention to selectively decide how much weight each previous layer's output should receive. Each layer gets a single learned query vector and attends over all previous layer outputs to compute a weighted combination. The weights are input-dependent, so different tokens can retrieve different layer representations based on what's actually useful. This is Full Attention Residuals (shown in the second diagram below).

But here's the practical problem with this idea. Full AttnRes requires keeping all layer outputs in memory and communicating them across pipeline stages during distributed training. To solve this, they introduce Block Attention Residuals (shown in the third diagram below).

The idea is to group consecutive layers into roughly 8 blocks. Within each block, layer outputs are summed via standard residuals. But across blocks, the attention mechanism selectively combines block-level representations. This drops memory from O(Ld) to O(Nd), where N is the number of blocks. Layers within the current block can also attend to the partial sum of what's been computed so far inside that block, so local information flow isn't lost. And the raw token embedding is always available as a separate source, which means any layer in the network can selectively reach back to the original input.

Results from the paper:
- Block AttnRes matches the loss of a baseline LLM trained with 1.25x more compute.
- Inference latency overhead is less than 2%, making it a practical drop-in replacement.
- On a 48B-parameter Kimi Linear model (3B activated) trained on 1.4T tokens, it improved every benchmark they tested: GPQA-Diamond +7.5, Math +3.6, HumanEval +3.1, MMLU +1.1.

The residual connection has mostly been unchanged since ResNet in 2015. This might be the first modification that's both theoretically motivated and practically deployable at scale with negligible overhead.

More details in the post below by Kimi👇
____
Find me → @_avichawla

Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
Avi Chawla tweet media
Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗Full report: github.com/MoonshotAI/Att…

78
221
2.3K
343.2K
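The depth-wise attention idea in the thread above can be sketched in a few lines. This is a toy illustration with NumPy, not the Kimi implementation: the shapes, the single query vector, and the random stand-in values are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 16, 6                       # hidden size, layers accumulated so far

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def standard_residual(outputs):
    # PreNorm-style accumulation: every layer contributes with weight 1,
    # so the hidden state's magnitude grows with depth (the dilution problem).
    return outputs.sum(axis=0)

def attention_residual(outputs, query):
    # Score each previous layer's output against a learned query vector,
    # then mix the outputs with input-dependent softmax weights.
    scores = outputs @ query / np.sqrt(d)
    weights = softmax(scores)          # sums to 1: a convex combination
    return weights @ outputs

outputs = rng.normal(size=(L, d))      # stand-ins for per-layer outputs
query = rng.normal(size=d)             # stand-in for a learned query vector

h_std = standard_residual(outputs)     # norm grows roughly with L
h_attn = attention_residual(outputs, query)  # norm stays bounded
```

Because the attention weights form a convex combination, the mixed state's norm can never exceed the largest per-layer output norm, which is the hidden-state-growth fix the thread describes; Block AttnRes applies the same mixing over ~8 block-level sums instead of over every layer.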
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@Dhanush_Nehru I guess people will now curse AI, even for the friend who used AI and saved his job
1
0
2
314
Dhanush N
Dhanush N@Dhanush_Nehru·
Layoffs Layoffs Layoffs Everywhere
13
0
21
3.3K
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@Dhanush_Nehru But agents are using a method called defer_load to lazily search tools and send them to the model, so the context doesn't fill up
1
0
1
20
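The deferred tool loading mentioned in the reply above could look roughly like this. Every name here (`TOOL_REGISTRY`, `context_stubs`, `load_tool`, the tool names) is hypothetical, made up for illustration rather than taken from any specific agent framework: only one-line summaries go into the model's context, and a tool's full schema is built on demand when the model actually selects it.

```python
# Hypothetical sketch of deferred ("defer_load"-style) tool loading.
TOOL_REGISTRY = {
    "search_docs": {
        "summary": "Search project documentation.",
        # Full schema is built lazily, only when the tool is selected.
        "loader": lambda: {
            "name": "search_docs",
            "parameters": {"query": "string", "limit": "integer"},
        },
    },
    "run_tests": {
        "summary": "Run the test suite.",
        "loader": lambda: {"name": "run_tests", "parameters": {}},
    },
}

def context_stubs(registry):
    # Only short summaries enter the prompt, keeping context small
    # no matter how many tools are registered.
    return {name: spec["summary"] for name, spec in registry.items()}

def load_tool(registry, name):
    # Fetch the full definition on demand, after the model picks a tool.
    return registry[name]["loader"]()

stubs = context_stubs(TOOL_REGISTRY)              # sent to the model up front
schema = load_tool(TOOL_REGISTRY, "search_docs")  # fetched only when needed
```

The context cost then scales with the number of tool summaries rather than with the size of every tool's full schema, which is the point of the defer_load approach described in the reply.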
Dhanush N
Dhanush N@Dhanush_Nehru·
MCP was supposed to be the universal plug for AI tools. Connect once, use everywhere. But in reality:

🔴 Every tool you add fills up the AI's "thinking space" with instructions
🔴 The more tools, the less room to actually do the work
🔴 Auth between services is a mess

So teams are ditching the fancy protocol and going back to direct API calls and CLIs. Simpler. Faster. More reliable. The best technology is often the one that gets out of the way.
Morgan@morganlinton

The cofounder and CTO of Perplexity, @denisyarats just said internally at Perplexity they’re moving away from MCPs and instead using APIs and CLIs 👀

3
0
1
244
Vivo
Vivo@vivoplt·
be honest, which AI tool is best for coding?
Vivo tweet media
522
93
2.4K
360.8K
Yigit Konur
Yigit Konur@yigitkonur·
the new “mission” (preview) feature in @FactoryAI is really interesting (aka “droid” on the CLI). if you’re into one‑shotting projects, you should definitely check it out. right now i have opus + gpt‑5.4 collaborating as orchestrator/worker/validator agents, all working together to refactor a typescript project. it’s been running for 6+ hours. really curious why it takes that long and burns 30M+ tokens. hoping the results will amaze me, because i already spent all my credits in the first hour. now i’m using my codex sub and keeping the droid subs as the orchestrator only. will update this tweet with the results!
Yigit Konur tweet media
11
3
57
6.5K
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@ZohoWorkplace @moulidorai It's one of the best password managers, offering enterprise features for free, and I don't know why people aren't using it 😭
1
0
2
40
Zoho Workplace
Zoho Workplace@ZohoWorkplace·
Is your password manager asking for a raise? That’s your cue to switch teams 😉 Try Zoho Vault, included at no extra cost with your Zoho Workplace suite. 🚀 Secure and share your team's passwords, cards, and other confidential information with confidence. 🔐 Try Zoho Vault today 👉🏻 zurl.co/yWNZ7
Zoho Workplace tweet media
1
3
4
329
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@KalGrinberg Is droid better than claude or opencode? I see it's better on Terminal-Bench, but I only see posts about those two, so I'm confused
0
0
0
41
Dhanush N
Dhanush N@Dhanush_Nehru·
Let's settle this. Golang or Rust?
Dhanush N tweet media
12
1
10
1.1K
Dhanush N
Dhanush N@Dhanush_Nehru·
what next?
Dhanush N tweet media
4
0
4
238
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@zack0x01 Is it going to be public or private? If public, I'm willing to contribute
1
0
0
65
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@zack0x01 Is it from scratch, or using existing coding agents and piling on MCP servers, custom tool calling, skills, sub-agents, etc.?
1
0
0
370
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@inkdrop_app I just converted it to INR: ₹45,45,37,50,500.00 😅 Hope someday I can afford it
0
0
1
15
Zed
Zed@zeddotdev·
Coming this Wednesday...
Zed tweet media
33
11
400
40.2K
M1NDB3ND3R
M1NDB3ND3R@mindbender08·
@Dhanush_Nehru It's going to be the greatest if it concentrates on physics, medical science, and quantum computing, and tries to find meaningful solutions to real-world problems. Technologies are evolving day to day, but we are not integrating them to solve problems
1
0
1
24
Dhanush N
Dhanush N@Dhanush_Nehru·
The way AI is evolving it will either be the greatest invention humanity ever built or the last mistake we ever make.
8
1
11
276
TCM Security
TCM Security@TCMSecurity·
@mindbender08 Bookmark the link to keep it on hand - and we will share when we open internships in the future!
1
0
1
12
Dhanush N
Dhanush N@Dhanush_Nehru·
@mindbender08 First gen to ship code without Stack Overflow is crazy to think about
1
0
0
10
Dhanush N
Dhanush N@Dhanush_Nehru·
being the last generation to use stack overflow
Dhanush N tweet media
11
1
20
572