sun 🐶

724 posts

@sunncynn

Founder | DS turned SWE | AWS Certified Solutions Architect Professional | Building AI for local businesses

Joined August 2021
1.1K Following · 117 Followers
sun 🐶 reposted
sun 🐶 @sunncynn:
@saltyAom [reply consisting only of blank characters]
0 replies · 0 reposts · 0 likes · 50 views
SaltyAom @saltyAom:
[tweet consisting only of blank characters]
2 replies · 0 reposts · 5 likes · 1.8K views
SaltyAom @saltyAom:
[tweet consisting only of blank characters]
24 replies · 0 reposts · 95 likes · 9.1K views
sun 🐶 reposted
Lance Martin @RLanceMartin:
I co-wrote the Anthropic engineering blog on Claude Managed Agents, and wanted to share some thoughts on agent harnesses + infrastructure for long-horizon tasks ... 🧵 anthropic.com/engineering/ma…
[image attached]
31 replies · 113 reposts · 980 likes · 90.1K views
sun 🐶 @sunncynn:
@timbo_xyz Bro, I'm also building a startup in Thailand. Sometimes I go there too.
1 reply · 0 reposts · 1 like · 60 views
timbo ⚡ @timbo_xyz:
Working from Plant Workshop Cafe in Bangkok today 🇹🇭
Located in Ratchathewi and filled with plants, creating a nice work environment.
Seats are limited (under 20), and it's empty at open, but Google says it gets busy.
Easy access to outlets and internet speeds of 41 down / 10 up.
Full coffee menu but very limited snack options. An iced Americano will run you 65 baht (~$2).
A place nearby wanted to charge 40/hr for parking, but a Grab driver told me to just park in front on the sidewalk. We'll see what happens to my bike 😅
[4 images attached]
Khlong Tan Nuea, Thailand 🇹🇭
7 replies · 50 reposts · 194 likes · 10K views
sun 🐶 reposted
Matt Van Horn @mvanhorn:
v3 of @slashlast30days is here. 20,000+ ⭐ on GitHub. The biggest upgrade yet.

An AI agent-led search engine scored by upvotes, likes, and real money - not editors. Reddit comments, X posts, and YouTube transcripts are now FREE. No API keys needed for the core sources.

v3 killer feature: intelligent search. Before it searches, a Python pre-research brain resolves X handles, subreddits, TikTok hashtags, and YouTube channels for your topic. It finds the RIGHT places to search before the LLM judge assembles the report. Shout out to @jeffreysperling for building this engine.

New in v3:
- Free Reddit, X, and YouTube (no API keys)
- Intelligent pre-research engine
- Best Takes (the funniest Reddit comments are first-class)
- Cross-source cluster merging
- Single-pass comparisons (X vs Y in 5 min, not 12)
- GitHub person-mode
- ELI5 mode
53 replies · 64 reposts · 930 likes · 247.2K views
sun 🐶 reposted
Garry Tan @garrytan:
How I get my claw to be a durable AI agent I never have to instruct twice.

Paste this into your OpenClaw's AGENTS.md or send it as a message:

You are not allowed to do one-off work. If I ask you to do something and it's the kind of thing that will need to happen again, you must:
1. Do it manually the first time (3-10 items)
2. Show me the output and ask if I like it
3. If I approve, codify it into a SKILL.md file in workspace/skills/
4. If it should run automatically, add it to cron with `openclaw cron add`

Every skill must be MECE — each type of work has exactly one owner skill. No overlap, no gaps. Before creating a new skill, check if an existing one already covers it. If so, extend it instead.

The test: if I have to ask you for something twice, you failed. The first time I ask is discovery. The second time means you should have already turned it into a skill running on a cron.

When building a skill, follow this cycle:
- Concept: describe the process
- Prototype: run on 3-10 real items, no skill file yet
- Evaluate: review output with me, revise
- Codify: write SKILL.md (or extend existing)
- Cron: schedule if recurring
- Monitor: check first runs, iterate

Every conversation where I say "can you do X" should end with X being a skill on a cron — not a memory of "he asked me to do X that one time." The system compounds. Build it once, it runs forever.
137 replies · 168 reposts · 2.3K likes · 244.7K views
sun 🐶 reposted
The Best @Thebestfigen:
This is the best advertisement I’ve ever seen.
82 replies · 941 reposts · 3.8K likes · 147.3K views
sun 🐶 reposted
Chayenne Zhao @GenAI_is_real:
We're Not Wasting Tokens — We're Wasting the Design Margin of the Entire Inference Stack

A few days ago I read a post by Fuli Luo on Twitter, discussing Anthropic's decision to cut off third-party harnesses (OpenClaw) from using Claude subscriptions, and the design thinking behind MiMo's Token Plan pricing. Her core argument: global compute capacity is seriously falling behind the token demand created by agents. The way forward isn't selling tokens cheaper in a race to the bottom — it's the co-evolution of "more efficient agent harnesses" and "more powerful, efficient models."

I read it several times over. People who build inference engines have long been frustrated by how wastefully agent frameworks burn through tokens. She articulated something the industry has tacitly acknowledged but rarely stated plainly — and she did it with precision and restraint: the compute allocation crisis we face today is not fundamentally about insufficient compute. It's about tokens being spent in the wrong places.

I want to push this one layer deeper, from my own perspective. I'm a heavy user of Claude Code — I make no attempt to hide that. You can check that all the latest code in SGLang Omni was built with Claude Code powering my workflow. Its commercial success is beyond question; it genuinely gave many people (myself included) their first real experience of "coding with an agent." But I'm also an inference engine developer — my day job is figuring out how to push prefix cache hit rates higher, how to make KV cache memory layouts more efficient, how to drive down the cost of every single inference request.

So when I plugged Claude Code into a local inference engine and started observing the actual request patterns it generates, my reaction was — how to put it — like a water engineer who spent months designing a conservation system, only to watch someone water their garden with a fire hose.
I measured Claude Code's cache hit rate on my local serving engine over the course of a day. The numbers were painful. This isn't a case of "decent but room to improve." It's a case of "the prefix cache mechanisms we carefully engineered at the inference layer are being almost entirely defeated." Fuli Luo mentioned that OpenClaw's context management is poor — firing off multiple rounds of low-value tool calls within a single user query, each carrying over 100K tokens of context window. Frankly, Claude Code's own context management is nowhere near making proper use of prefix cache or any of the other optimizations we've built into inference engines. Many people have already noticed — for example, the resume feature has a bug that causes KV cache misses entirely, which is borderline absurd. I'll say it plainly: the way sessions construct their context was never seriously designed with cache reuse in mind from the start. Perhaps Anthropic has internal trade-offs we can't see — after all, they control both ends of the stack, model and inference, and can theoretically do optimizations at the API layer that are invisible to us. But from the external behavior I can observe, enormous volumes of tokens are being spent on: re-transmitting already-processed context, re-parsing already-confirmed tool call results, and maintaining an ever-inflating conversation history with extremely low information density. If this is merely to earn more on inference token charges, I find it genuinely regrettable. But many Claude Code users are on subscriptions — burning more tokens is fundamentally a cost burden for Anthropic, not revenue. I honestly don't understand what purpose such inefficient context management serves for Claude Code. Here's a bold hypothesis: for those long sessions that consume 700K+ tokens, there is certainly a way to restructure the session's context so it accomplishes the exact same task with 10% of the tokens. 
Not by sacrificing quality, but through smarter context compression, more rational prefix reuse strategies, and more precise tool call scheduling. This isn't theoretical speculation — anyone who has worked on inference engine optimization, upon seeing current agent framework request patterns, would arrive at a similar conclusion. Fuli Luo is right: global compute capacity can't keep up with the token demand agents are creating. But I'd add that a significant portion of that gap is an illusion of prosperity — artificial demand manufactured by the crude design of agent frameworks. Here's an analogy I keep coming back to. I've always liked bringing up RAM bloat — in 1969, 64KB of memory sent Apollo to the moon. In 2026, I open a single webpage and 500MB of memory usage is nothing unusual. Every generation of hardware engineers pushes memory capacity higher, and every generation of software engineers lavishly fills it to the brim. People have gotten used to this cycle, even come to see it as the normal cost of progress. But LLM inference is different. The cost of RAM bloat is your computer running a bit slower, spending a couple hundred bucks on a memory upgrade — users barely notice. The cost of token bloat is real money — GPU cluster electricity bills, user subscription fees, the industry's entire compute budget. And this cost scales exponentially as agent usage grows. If we don't establish the engineering discipline that "tokens should be used efficiently" in the early days of the agent era, the cost of catching up later, once scale kicks in, will be beyond imagination. Fuli Luo notes that Anthropic cutting off third-party harness subscription access is objectively forcing these frameworks to improve their context management. I agree with that assessment, but my gut feeling is that this shouldn't stop at "third-party frameworks need to be more frugal with tokens." 
It should trigger a more fundamental reflection: what kind of agent-inference co-design do we actually need? Right now, agent frameworks and inference engines are essentially fully decoupled — agent frameworks treat the inference engine as a stateless API, sending the full context with every request. Meanwhile, the inference engine does its best with prefix matching, caching whatever it can. This architecture is simple and general-purpose, but brutally inefficient for long sessions. If agent frameworks could be aware of the inference engine's cache state and proactively construct cache-friendly requests — if inference engines could understand the session semantics of agents and make smarter cache eviction decisions — once that information channel between the two opens up, the potential gains in token efficiency are enormous. Of course, maybe I'm overthinking this. Maybe the market's ultimate answer is: compute gets cheap enough, waste is fine. Just like the RAM story — in the end, everyone chose "memory is big enough, no need to optimize." But I don't think the token economy will follow the same path, at least not in the near term — because the supply elasticity of GPU compute is far lower than that of DRAM. Under compute constraints, token efficiency isn't a "nice to have" optimization — it's the core competitive advantage that determines who survives. Most people love hearing "we made the model bigger," "we stretched the context window to a million tokens," "we stacked HBM to new heights" — these narratives are sexy, shareable, fundable. But I seriously believe that "finding ways to reduce the reckless waste of tokens" is a profoundly underestimated direction. This isn't a defensive optimization. It's an offensive capability — whoever first achieves an order-of-magnitude reduction in token consumption at equivalent quality can serve ten times the users on the same compute budget, or deliver ten times the agent depth to a single user. 
The agent era doesn't belong to whoever burns the most compute. It belongs to whoever uses it most wisely. This line from Fuli Luo resonates deeply with me. But I want to press further: who gets to define "wisely"? The people building models? The people building inference engines? The people building agent frameworks? I think the answer is — all three must come to the table together. And right now, we're nowhere close.
Fuli Luo @_LuoFuli:

Two days ago, Anthropic cut off third-party harnesses from using Claude subscriptions — not surprising. Three days ago, MiMo launched its Token Plan — a design I spent real time on, and what I believe is a serious attempt at getting compute allocation and agent harness development right. Putting these two things together, some thoughts:

1. Claude Code's subscription is a beautifully designed system for balanced compute allocation. My guess — it doesn't make money, possibly bleeds it, unless their API margins are 10-20x, which I doubt. I can't rigorously calculate the losses from third-party harnesses plugging in, but I've looked at OpenClaw's context management up close — it's bad. Within a single user query, it fires off rounds of low-value tool calls as separate API requests, each carrying a long context window (often >100K tokens) — wasteful even with cache hits, and in extreme cases driving up cache miss rates for other queries. The actual request count per query ends up several times higher than Claude Code's own framework. Translated to API pricing, the real cost is probably tens of times the subscription price. That's not a gap — that's a crater.

2. Third-party harnesses like OpenClaw/OpenCode can still call Claude via API — they just can't ride on subscriptions anymore. Short term, these agent users will feel the pain, costs jumping easily tens of times. But that pressure is exactly what pushes these harnesses to improve context management, maximize prompt cache hit rates to reuse processed context, cut wasteful token burn. Pain eventually converts to engineering discipline.

3. I'd urge LLM companies not to blindly race to the bottom on pricing before figuring out how to price a coding plan without hemorrhaging money. Selling tokens dirt cheap while leaving the door wide open to third-party harnesses looks nice to users, but it's a trap — the same trap Anthropic just walked out of. The deeper problem: if users burn their attention on low-quality agent harnesses, highly unstable and slow inference services, and models downgraded to cut costs, only to find they still can't get anything done — that's not a healthy cycle for user experience or retention.

4. On MiMo Token Plan — it supports third-party harnesses, billed by token quota, same logic as Claude's newly launched extra usage packages. Because what we're going for is long-term stable delivery of high-quality models and services — not getting you to impulse-pay and then abandon ship.

The bigger picture: global compute capacity can't keep up with the token demand agents are creating. The real way forward isn't cheaper tokens — it's co-evolution. "More token-efficient agent harnesses" × "more powerful and efficient models." Anthropic's move, whether they intended it or not, is pushing the entire ecosystem — open source and closed source alike — in that direction. That's probably a good thing. The Agent era doesn't belong to whoever burns the most compute. It belongs to whoever uses it wisely.

16 replies · 39 reposts · 216 likes · 35.2K views
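The cache-reuse argument in the essay above can be made concrete with a toy measurement. Below is a minimal sketch, not any real inference engine's API — `prefix_hit_rate` and the token sequences are hypothetical: treat each API request as a sequence of token ids and ask what fraction of prompt tokens an idealized prefix cache (one that remembers every earlier prompt) could have served.

```python
def prefix_hit_rate(requests):
    """Fraction of prompt tokens that an idealized prefix cache
    (retaining every previously seen prompt) could have served,
    given the token-id sequence of each API request."""
    cached = []          # prompts seen so far
    hit = total = 0
    for req in requests:
        best = 0
        for prev in cached:
            # length of the shared prefix between prev and req
            n = 0
            for a, b in zip(prev, req):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        hit += best
        total += len(req)
        cached.append(req)
    return hit / total if total else 0.0

# An append-only session extends its previous prompt; a rewriting
# session reorders or prepends context and defeats the cache.
append_style = [[1, 2, 3], [1, 2, 3, 4, 5]]
rewrite_style = [[1, 2, 3], [9, 9, 1, 2, 3]]
```

Here `prefix_hit_rate(append_style)` is 3/8 (the second request reuses the whole first prompt), while `prefix_hit_rate(rewrite_style)` is 0 — the failure mode the essay attributes to resume bugs and context rewriting.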
sun 🐶 @sunncynn:
@NathanFlurry I have a custom image for the sandbox (needed packages preinstalled, like LibreOffice Calc). Can we use this?
0 replies · 0 reposts · 0 likes · 52 views
Nathan Flurry 🔩 @NathanFlurry:
We're working with more and more companies replacing AI SDKs with Claude Code, OpenCode, and Pi in prod. @rasbt's post today is hands down the best articulation of *why* harnesses matter for all use cases (link below)
[image attached]
8 replies · 4 reposts · 145 likes · 9.7K views
sun 🐶 reposted
Venelin K. @venelinkochev:
Pro tip: add a Cloudflare WAF rule to block common scanner paths like .env, .git, and wp-login; they get blocked at the edge and never touch your server.
52 replies · 92 reposts · 1.6K likes · 252.1K views
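The WAF tip above can be sketched as a custom rule expression in Cloudflare's Rules language; the field and operator are Cloudflare's, but the exact path list here is illustrative, not the author's rule:

```
(http.request.uri.path contains "/.env")
or (http.request.uri.path contains "/.git")
or (http.request.uri.path contains "wp-login")
```

Deployed as a custom WAF rule with the Block action, matching requests are rejected at Cloudflare's edge before they ever reach the origin server.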
sun 🐶 @sunncynn:
@paulg We have a feeling that if we answer too concisely, it makes others feel bad, or they will think we are angry or seem rude. That's why in Thai chat apps we use a lot of stickers to answer in formal business conversations. (Or just to end the conversation)
0 replies · 0 reposts · 1 like · 791 views
Paul Graham @paulg:
I had just been noticing today that Thai speakers seem to spend longer talking about things than I'd expect.
[image attached]
181 replies · 321 reposts · 2.6K likes · 630K views
sun 🐶 @sunncynn:
@MilksandMatcha We are builders from SEA; we want to build an OpenClaw tuned for managing xlsx.
0 replies · 0 reposts · 0 likes · 8 views
0xSero @0xSero:
Do you want to try Droid? I'm doing a giveaway: 3 people will win 100M Factory credits each. That's 5 months of their $20 a month subscription. Winners selected randomly from comments in 48 hours.
[image attached]
1.1K replies · 36 reposts · 794 likes · 79.6K views