retto

547 posts

@rettooooo

software engineer, building things. founder @Vidbytee

United States · Joined October 2023
87 Following · 23 Followers
Eddy Quan @waronweakness:
I've started using Claude. It's great, but I can see how someone can spend 20 hours a day on this thing and feel like they accomplished something when they've done nothing.
159 replies · 88 reposts · 4.2K likes · 263.4K views
James Grugett @jahooma:
Introducing Freebuff: the free coding agent. 100% free, up to 10x as fast as Claude Code.
npm install -g freebuff
96 replies · 73 reposts · 697 likes · 73.7K views
retto @rettooooo:
@TFTC21 Millions of agents*
0 replies · 0 reposts · 0 likes · 167 views
TFTC @TFTC21:
Jensen Huang: "If that $500,000 engineer did not consume at least $250,000 worth of tokens, I am going to be deeply alarmed. This is no different than a chip designer who says 'I'm just going to use paper and pencil. I don't think I'm going to need any CAD tools.'"
441 replies · 575 reposts · 7.6K likes · 2.4M views
retto @rettooooo:
I NEED MORE COMPUTE
0 replies · 0 reposts · 0 likes · 8 views
retto @rettooooo:
@yacineMTB context is king, and if you can distill your knowledge and transform it into coherent context for these models, then you have something no one else does
0 replies · 0 reposts · 0 likes · 109 views
kache @yacineMTB:
as we automate knowledge work, the leverage of having knowledge itself has never been higher
23 replies · 23 reposts · 419 likes · 10.9K views
retto @rettooooo:
@beffjezos not enough compute in the world unfortunately for online RL
0 replies · 0 reposts · 0 likes · 9 views
Beff (e/acc) @beffjezos:
Whichever lab offers continuous learning / online RL per unique agent for enterprise will absolutely print money. Virtual headcount for all companies will become very real. They could easily charge $5k+ per month per continuous agent.
40 replies · 49 reposts · 518 likes · 42.2K views
retto @rettooooo:
@_avichawla what's the tl;dr on model performance?
0 replies · 0 reposts · 0 likes · 44 views
Avi Chawla @_avichawla:
Big release from Kimi! They just released a new way to handle residual connections in Transformers.

In a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection. Across 40+ layers, the hidden state at any layer is just the equal-weighted sum of all previous layer outputs. Every layer contributes with weight=1, so every layer gets equal importance.

This creates a problem called PreNorm dilution: as the hidden state accumulates layer after layer, its magnitude grows linearly with depth, and any new layer's contribution gets progressively buried in the already-massive residual. Deeper layers are then forced to produce increasingly large outputs just to have any influence, which destabilizes training.

Here's what the Kimi team observed and did. RNNs compress all prior token information into a single state across time, leading to problems with long-range dependencies. Residual connections likewise compress all prior layer information into a single state across depth. Transformers solved the first problem by replacing recurrence with attention along the sequence dimension. Attention Residuals apply the same idea to depth.

Instead of adding all previous layer outputs with a fixed weight of 1, each layer now uses softmax attention to selectively decide how much weight each previous layer's output should receive. Each layer gets a single learned query vector and attends over all previous layer outputs to compute a weighted combination. The weights are input-dependent, so different tokens can retrieve different layer representations based on what's actually useful. This is Full Attention Residuals (shown in the second diagram below).

But here's the practical problem with this idea: Full AttnRes requires keeping all layer outputs in memory and communicating them across pipeline stages during distributed training. To solve this, they introduce Block Attention Residuals (shown in the third diagram below). The idea is to group consecutive layers into roughly 8 blocks. Within each block, layer outputs are summed via standard residuals; across blocks, the attention mechanism selectively combines block-level representations. This drops memory from O(Ld) to O(Nd), where N is the number of blocks. Layers within the current block can also attend to the partial sum of what's been computed so far inside that block, so local information flow isn't lost. And the raw token embedding is always available as a separate source, which means any layer in the network can selectively reach back to the original input.

Results from the paper:
- Block AttnRes matches the loss of a baseline LLM trained with 1.25x more compute.
- Inference latency overhead is less than 2%, making it a practical drop-in replacement.
- On a 48B-parameter Kimi Linear model (3B activated) trained on 1.4T tokens, it improved every benchmark they tested: GPQA-Diamond +7.5, Math +3.6, HumanEval +3.1, MMLU +1.1.

The residual connection has been mostly unchanged since ResNet in 2015. This might be the first modification that's both theoretically motivated and practically deployable at scale with negligible overhead. More details in the post below by Kimi 👇

Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Quoted — Kimi.ai @Kimi_Moonshot:
Introducing Attention Residuals: rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.
🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.
🔗 Full report: github.com/MoonshotAI/Att…

78 replies · 221 reposts · 2.3K likes · 343.5K views
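The core mechanism in the thread above — a per-layer learned query attending over all previous layer outputs instead of summing them with weight 1 — can be sketched in a few lines of NumPy. This is a toy, single-vector illustration under my own assumptions, not Kimi's implementation; the function name and scaling choice are illustrative.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_residual(layer_outputs, query):
    """Combine previous layer outputs via softmax attention,
    replacing the fixed weight=1 residual sum.

    layer_outputs: list of (d,) hidden vectors, one per preceding layer
    query: (d,) learned query vector belonging to the current layer
    """
    H = np.stack(layer_outputs)               # (L, d): one row per layer
    scores = H @ query / np.sqrt(H.shape[1])  # one relevance score per layer
    weights = softmax(scores)                 # input-dependent layer weights
    return weights @ H                        # weighted combination, shape (d,)
```

Because `scores` depend on the hidden states themselves, different tokens can end up weighting different layers — the input-dependence the thread emphasizes. The Block AttnRes variant would attend over ~8 block-level sums rather than all L layer outputs, cutting memory from O(Ld) to O(Nd).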
TheStandupPod @thestanduppod:
Most 2025 AI Improvement is the Harness
3 replies · 1 repost · 50 likes · 3.1K views
retto @rettooooo:
@naval coding went to 0
0 replies · 0 reposts · 0 likes · 3 views
Naval @naval:
Coding an app is the new starting a podcast.
1.5K replies · 2.4K reposts · 27.3K likes · 2.8M views
retto @rettooooo:
The space of AI agents, and AI in general at the application layer, is going to converge to master prompts or building your own harnesses. Two avenues; you pick which one you want to go down. Skills/MCP/tools are all useless if you don't have fine-grained control over them. You have just been given a 6-month heads-up; do what you please with this information.
0 replies · 0 reposts · 0 likes · 32 views
retto @rettooooo:
@naval all of knowledge work is next
0 replies · 0 reposts · 0 likes · 6 views
Naval @naval:
Software was eaten by AI.
2.2K replies · 2.1K reposts · 21.4K likes · 105.6M views
kache @yacineMTB:
something i learned from my wife, who recently learned how to sew: do not do beginner projects. if what you want to make is difficult to make, you should just try to make it. don't do a slow learning process. don't start with the basics. start with the advanced
183 replies · 573 reposts · 10.3K likes · 207.9K views
retto @rettooooo:
@heyshrutimishra then you scale that up to 100 agents and 5,000 research papers and skip 800 weeks
0 replies · 0 reposts · 1 like · 901 views
Shruti @heyshrutimishra:
The new academic wealth gap isn't your university. It's not even your advisor's connections. It's who knows Claude can turn 50+ research papers into a thesis chapter in 3 hours, and who's still manually coding qualitative data. I just watched a sociology PhD skip 8 weeks of analysis. Here are the 9 prompts they used:
71 replies · 124 reposts · 1.1K likes · 815.8K views
retto @rettooooo:
@GergelyOrosz we're leaving the quality problems for Opus 7 to fix
0 replies · 0 reposts · 0 likes · 6 views
Gergely Orosz @GergelyOrosz:
When it comes to AI agents / AI tooling + coding, I hear an awful lot of talk about:
- Efficiency
- Iteration speed / PR output rate / lines of code produced

I hear zero mentions of:
- Quality
- Customer obsession

This will bite back, and it probably already is...
101 replies · 65 reposts · 897 likes · 120.6K views
retto @rettooooo:
@KingBootoshi The AI is for the builders, not the coders.
0 replies · 0 reposts · 0 likes · 9 views
BOOTOSHI 👑 @KingBootoshi:
i don't understand programmers saying ai is making them cognitively weaker when it comes to coding. are you not a software architect?? are you not doing any kind of design whatsoever? why the hell would i EVER want to write the grunt shit?? thats what agents are for????????
48 replies · 7 reposts · 184 likes · 8.5K views
retto @rettooooo:
@GilFeig the real problem is the limited context windows and the models' horrible ability to stay accurate as we scale the context windows. None of this would be a problem if context windows grew to 100 mil with stable performance
2 replies · 0 reposts · 0 likes · 214 views
Gil Feig @GilFeig:
If you hate MCP, you don't understand MCP. You have a bad setup/harness and blame it on MCP, which is nothing more than an extremely thin wrapper. The argument that it's slow is rooted in poor implementations that load up thousands of tools and eat your entire context window. But that's not necessary, and it's not how the best AI companies are using MCP in practice.

MCP is great because it's a lightweight standardization similar to REST and API protocols. It gives the world the ability to provide a minimal, single interface into their products. It then plugs nicely into all LLMs. MCP or not, we need that.

Good MCP has 2 tools: a search tool and a run tool. You don't need anything more than that. You can expose millions of tools to your agent, and they run quickly and eat close to zero context.

CLI tools are great too. But they're heavier to produce, heavier to set up, require running agents to have CLI access, and give up a ton of security guarantees. MCP was invented when CLIs already existed for a reason. Many reasons. We're not suddenly enlightened by CLIs. Nothing has changed.
33 replies · 26 reposts · 200 likes · 14K views
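The two-tool pattern described above (search, then run) can be sketched as a minimal registry. This is an illustrative toy under my own assumptions — `search_tools`, `run_tool`, and the registry shape are invented names, not the actual MCP API — but it shows why the agent's context stays small: only matching tool names ever enter the conversation.

```python
# Toy two-tool pattern: the agent never sees the full tool list.
# It searches a registry by description, then invokes a tool by name.
# Registry contents here are placeholder examples.
REGISTRY = {
    "get_weather": {"desc": "current weather for a city",
                    "fn": lambda city: f"sunny in {city}"},
    "get_time":    {"desc": "current UTC time",
                    "fn": lambda: "12:00Z"},
}

def search_tools(query: str) -> list[str]:
    """Tool 1: return names of tools whose description matches the query."""
    q = query.lower()
    return [name for name, meta in REGISTRY.items() if q in meta["desc"].lower()]

def run_tool(name: str, *args) -> str:
    """Tool 2: execute a registered tool by name with positional args."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return REGISTRY[name]["fn"](*args)
```

With this shape, the registry can hold thousands of entries while the model only ever pays context for the handful of names a search returns.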
retto @rettooooo:
@Andercot the only thing that survives is injecting tools into your agent
0 replies · 0 reposts · 0 likes · 33 views
punarv @ycocerious:
What do you guys do while your claude code is running?
501 replies · 2 reposts · 293 likes · 58.9K views
retto @rettooooo:
The only thing stopping the full-scale automation of all knowledge work is the limited context window of the models. Train at 100mil context window length with no context rot and they can do anything. Think of all the things they can do today with 100k tokens of high-quality context; now imagine 90mil tokens of the same high-quality context.
0 replies · 0 reposts · 0 likes · 35 views