retto

1.2K posts


@rettooooo

software engineer, building things. founder @Vidbytee

United States · Joined October 2023
106 Following · 102 Followers
Behnam
Behnam@OrganicGPT·
Unbelievable! @deepseek_ai is so darn cheap you don't even need a coding plan. I added $3 to my DS account many days ago and half of it has still remained. Do we really need $100/mo or $200/mo coding plans?!
English
14
0
56
3.6K
retto
retto@rettooooo·
@andersonbcdefg what happens when you think "coding is solved" and use claude to build claude code
English
0
0
1
146
retto
retto@rettooooo·
@beffjezos it's unfortunate that working memory is very limited in our meat brains
English
0
0
0
9
retto
retto@rettooooo·
@deedydas definitely a paradigm worth exploring
English
0
0
0
12
Deedy
Deedy@deedydas·
I'm convinced that adding "Open-" to your company name instantly 10x's your odds of success. OpenAI OpenEvidence OpenTable OpenRouter OpenCode OpenDoor OpenGov OpenWeb OpenText OpenView OpenSea OpenStore OpenFX OpenSpace OpenArt OpenHands OpenPipe OpenNote
English
67
6
254
28.6K
retto
retto@rettooooo·
@skcd42 is the repo open sourced?
English
0
0
1
13
skcd
skcd@skcd42·
Bug fixes shipping to Grok Build 0.2.3 (release notes will be available in the TUI)
- add “Yes, and don't ask again for anything (always-approve mode)”
- add alpha/stable to welcome screen
- JetBrains/JediTerm terminal detection so TUI does not get confused and detect it as some other terminal
- persist model ID instead of display name for default_model
- clamp Q&A height to prevent ratatui buffer overflow
- better UX for tmux inside ssh copy-paste issues
- store vim mode persistently in the config.toml to prevent restart loss
- memory usage improvements for managing chat history on the hot path
English
201
205
1.2K
304.6K
retto
retto@rettooooo·
@PeterDiamandis orders of magnitude away from the token-usage that power users need
English
0
1
1
6
Peter H. Diamandis, MD
Peter H. Diamandis, MD@PeterDiamandis·
Power users will soon want 1,000 concurrent agents. Engineers, architects, designers... all orchestrating swarms. Compare that demand to the current supply. It's peanuts.
English
71
27
435
19.8K
retto
retto@rettooooo·
@naval there will be business models revolving solely around agent-generated revenue rather than human-generated revenue in the coming months/years
English
0
0
0
28
Naval
Naval@naval·
Software went from desktop-first to mobile-first, now going to agent-first.
English
309
367
4.7K
159.9K
Shensi Ding
Shensi Ding@shensi·
Introducing Merge Gateway - Build Your Own Router. You're three sprints into your coding assistant. You pick the most hyped model, integrate, test, deploy. A month later, a new model drops. Now you re-test, re-integrate, re-deploy. Your product didn't change, but the benchmark did. That's how most AI teams operate. Chasing a "best" defined by people who've never seen their product. There is no best model. There's only the right one for your product, users, and use-cases. Build Your Own Router runs on your definition of good. Pick your benchmarks, weigh them, add your own evals. @merge_api routes every request to your winner. 👉$100 in credits to the first 200 people that comment merge.dev/gateway
English
226
467
2K
3.4M
retto
retto@rettooooo·
@tunguz Every piece of the harness now becomes a controllable hyperparameter, ANYTHING in the context window can be tuned and tweaked for optimal performance
English
0
0
0
96
Bojan Tunguz
Bojan Tunguz@tunguz·
This is the next level.
Muratcan Koylan@koylanai

Gradient descent for SKILL.md files sounds interesting, maybe a bit complex but it's becoming a real part of agent harness. SkillOpt is one of the first papers to treat markdown skill files as trainable parameters and provides a proper optimization framework for them. A few things I learned that you should consider too.
1. The validation gate is the only thing that matters in a self-editing loop. Held-out set, strict improvement, ties rejected. End-to-end, their best skills land with 1 to 4 accepted edits total. If your "self-improving agent" is accepting most of what it proposes, you're shipping slop.
2. Bounded edits are better than full rewrites. 4 to 8 edits per step is the sweet spot. Remove the budget and performance collapses. This is the textual analog of learning rate, and it transfers to any LLM-as-author loop. If you're using an agent to refactor your docs, your prompts, or your skills, cap the diff size.
3. Compactness wins. Median final skill: ~920 tokens. Skills do not need to be long. They need to be high-signal. Most skill files I see are bloated because length feels like effort. It isn't.
4. The harness is becoming less important; the skill is becoming more important. A Codex-trained skill ported into Claude Code hit +59.7 points on SpreadsheetBench. Procedural knowledge is more general than the runtime that produced it.
5. Frozen model + trained context is the practical adaptation. GPT-5.4-nano with a SkillOpt'd skill ≈ frontier behavior on procedural benchmarks. Cheaper, portable, inspectable, zero inference-time cost. This is the answer to "how do we adapt a frontier model for our domain" for almost everyone who isn't training their own models.
6. Verification is the bottleneck. Every gate in this paper depends on an auto-grader. That works for benchmarks. It fails for writing, design, and strategy, exactly the open-ended work we want to automate. Whoever builds the verifier for open-ended tasks owns the next stage.
There are also two lessons I learned while shipping v2.3.0 of my Context Engineering Agent Skills repo, measured across composer-2, claude-opus-4-7, gpt-5.5, and gemini-3.1-pro via the @cursor_ai SDK:
- Description and body are two different surfaces. The router only sees the description. The agent sees the body once activated. They can quietly disagree, and only end-to-end task tests catch it.
- Aggregate accuracy is the wrong unit. When I rewrote three descriptions, the corpus average moved ~1pp. Individual skills moved 23–25pp. Per-skill effect size is where the action is.
Also, in Feb 2026 I shared a piece called Personal Brain OS arguing that the markdown file is a first-class substrate for agent state. SkillOpt is the optimizer-shaped version of that same argument: not "store memory in files" but "treat files as trainable parameters with proper optimization machinery around them." That's the move from static to measured.
The fast/slow split they describe already lives implicitly in the digital-brain-skill repo:
- voice-guide and tone-of-voice.md are slow-state (rarely touched)
- posts.jsonl and bookmarks.jsonl are fast-state
What SkillOpt adds that I didn't have is a protected section invariant, a structural guarantee that fast edits cannot overwrite slow lessons. Removing that mechanism cost them 22 points on SpreadsheetBench. Worth borrowing.
If you're building agents, SkillOpt: Executive Strategy for Self-Evolving Agent Skills is a good paper to read: arxiv.org/pdf/2605.23904
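The validation gate and edit budget described in this thread can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the line-replacement edit format, the function names, and the toy scorer are all assumptions made up for the example, and a real setup would score against a held-out task set with an auto-grader.

```python
from typing import Callable, List, Tuple

# Hypothetical edit format: (index of the line to replace, new line text).
Edit = Tuple[int, str]


def apply_edits(lines: List[str], edits: List[Edit], budget: int = 8) -> List[str]:
    """Apply at most `budget` line replacements; any extra edits are dropped."""
    out = list(lines)
    for idx, text in edits[:budget]:
        if 0 <= idx < len(out):
            out[idx] = text
    return out


def gated_step(
    lines: List[str],
    edits: List[Edit],
    heldout_score: Callable[[List[str]], float],
    budget: int = 8,
) -> Tuple[List[str], bool]:
    """Keep the edited skill only if it strictly beats the current one."""
    candidate = apply_edits(lines, edits, budget)
    # Strict inequality: a tie counts as a rejection.
    if heldout_score(candidate) > heldout_score(lines):
        return candidate, True
    return lines, False


def toy_score(lines: List[str]) -> float:
    """Stand-in for an auto-grader: counts lines that mention verification."""
    return float(sum("verify" in line for line in lines))


skill = ["Read the sheet.", "Write the formula."]
improving = [(1, "Write the formula, then verify the result.")]
neutral = [(0, "Read the spreadsheet.")]

accepted_skill, accepted = gated_step(skill, improving, toy_score)
rejected_skill, kept = gated_step(skill, neutral, toy_score)
```

With the toy scorer, the first proposal is accepted because it raises the score, and the second is rejected because it only ties. The `budget` argument plays the role of the capped diff size mentioned in point 2.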

English
11
55
1.1K
309.2K
retto
retto@rettooooo·
@edwards345 @elonmusk 10x faster and cheaper engineering leads to solving bigger and harder problems
English
0
1
2
12
retto
retto@rettooooo·
@dair_ai build the harness, then give your agent access to edit it dynamically at runtime depending on the request. that seems to be the current SOTA paradigm for building harnesses
English
0
0
0
16
DAIR.AI
DAIR.AI@dair_ai·
System scaling is the next real bottleneck in agentic AI. If you build agent orchestration layers, this is a clean map of where the engineering leverage actually sits. The labs own the model. You own the harness, and that is increasingly where agent quality is won or lost. The default mental model still puts all the weight on the foundation model. Bigger model, better agent. But agent behavior actually emerges from the whole stack around it. Memory substrate, context constructor, skill routing, orchestration loop, and the verification and governance layer. This new research calls that stack the harness and argues we should treat it as a first-class object of design and evaluation. It names three core bottlenecks to scale. Context governance, trustworthy memory, and dynamic skill routing. It also ships CheetahClaws, a Python-native reference harness, and compares it with Claude Code and OpenClaw. Paper: arxiv.org/abs/2605.26112 Learn to build effective AI agents in our academy: academy.dair.ai
English
13
11
79
5.1K
retto
retto@rettooooo·
@kimmonismus when in doubt, if you want to improve your models, take inspiration from the human brain and build a sparse attention mechanism from the ground up
English
0
0
0
105
Chubby♨️
Chubby♨️@kimmonismus·
MiniMax just teased their Sparse Attention architecture for M3. The benchmarks show 9.7x prefilling speedup and 15.6x decoding speedup at 1M tokens vs M2. MiniMax deliberately went back to full attention for M2 because efficient attention wasn't production-ready. Their pretrain lead wrote a whole blog post about it in March. Now they're showing a new two-stage approach, lightweight index branch for block selection, then sparse attention only on relevant KV blocks. Really interesting. And tbh I'm always happy when open source receives new wins.
MiniMax (official)@MiniMax_AI

#MSA #OpenSource #M3 🫣😎

English
23
37
726
46.3K
retto
retto@rettooooo·
@SkylerMiao7 the time to scale context windows is now, too much drift when you use 50 subagents, the context needs to be in 1 window to achieve maximum performance
English
0
0
0
312
Skyler Miao
Skyler Miao@SkylerMiao7·
Something BIG is coming
English
162
272
2.7K
541.4K
retto
retto@rettooooo·
@shrav_10 and the labs are working on recursive self improvement so once they achieve that then we are all out of a job
English
1
0
0
706
Shravani
Shravani@shrav_10·
One of my friends at a tech company said the vibe inside offices right now is weird. Junior devs think AI will take their jobs. Senior devs think they’ll become outdated in 2 years. Managers are scared companies won’t need so many layers anymore. So everyone’s acting extra “AI-positive” publicly while privately panicking.
English
77
157
2.8K
272.8K
retto
retto@rettooooo·
@scaling01 did they not test codex on high?
English
0
0
0
394
nic
nic@nicdunz·
we are honestly so close to agi
English
36
2
172
16.4K
Mello
Mello@mellometrics·
Imagine telling someone 50 years ago you'd have a machine that could explain any concept in human history, write code, diagnose disease, and reason through problems with you for $20/month
They would literally think you were describing god
This exists. You have access to it. Yet most of you are using it to summarize emails
The arbitrage between people who actually use AI and people who "use AI" is the widest it will ever be
That window is open right now and it will close faster than most people think
Tldr; start using AI for more than just mundane tasks
English
43
10
158
12.2K
retto
retto@rettooooo·
theoretically I can agree, but the only question I have is why has this not happened already? these models are sometimes trillions of params in size and already trained on the corpus of human knowledge. There are also a lot of examples of extensive scaffolding that gives the models all of the context they could ever need. Is it really a model capability problem, or is there some architectural failure point that makes these models struggle to truly discover new science?
English
0
0
0
27
Rihard Jarc
Rihard Jarc@RihardJarc·
It's clear that growth for coding tools such as Claude Code has decelerated from the pace it has held since the start of the year. It might be compute-constraint related or due to many clients blowing their full-year AI budgets. Monitoring this trend very closely with all the alt data. I will provide regular updates.
English
126
166
1.1K
276.1K