sun 🐶

724 posts

@sunncynn

Founder | DS turned SWE | AWS Certified Solutions Architect Professional | Building AI for local businesses

Joined August 2021
1.1K Following · 117 Followers
sun 🐶 reposted
sun 🐶 @sunncynn:
@saltyAom [reply consisting only of blank characters]
0 replies · 0 reposts · 0 likes · 50 views
SaltyAom @saltyAom:
[tweet consisting only of blank characters]
2 replies · 0 reposts · 5 likes · 1.8K views
SaltyAom @saltyAom:
[tweet consisting only of blank characters]
24 replies · 0 reposts · 95 likes · 9.1K views
sun 🐶 reposted
Lance Martin @RLanceMartin:
I co-wrote the Anthropic engineering blog on Claude Managed Agents, and wanted to share some thoughts on agent harnesses + infrastructure for long-horizon tasks ... 🧵 anthropic.com/engineering/ma…
[image attached]
31 replies · 113 reposts · 980 likes · 90.1K views
sun 🐶 @sunncynn:
@timbo_xyz Bro, I'm also building a startup in Thailand. Sometimes I go there too.
1 reply · 0 reposts · 1 like · 60 views
timbo ⚡ @timbo_xyz:
Working from Plant Workshop Cafe in Bangkok today 🇹🇭
Located in Ratchathewi and filled with plants, creating a nice work environment.
Seats are limited (under 20), and it's empty at open, but Google says it gets busy.
Easy access to outlets and internet speeds of 41 down / 10 up.
Full coffee menu but very limited snack options. An iced Americano will run you 65 baht (~$2).
A place nearby wanted to charge 40/hr for parking, but a Grab driver told me to just park in front on the sidewalk. We'll see what happens to my bike 😅
[4 images attached]
Khlong Tan Nuea, Thailand 🇹🇭
7 replies · 50 reposts · 194 likes · 10K views
sun 🐶 reposted
Matt Van Horn @mvanhorn:
v3 of @slashlast30days is here. 20,000+ ⭐ on GitHub. The biggest upgrade yet.

An AI agent-led search engine scored by upvotes, likes, and real money - not editors. Reddit comments, X posts, and YouTube transcripts are now FREE. No API keys needed for the core sources.

v3 killer feature: intelligent search. Before it searches, a Python pre-research brain resolves X handles, subreddits, TikTok hashtags, and YouTube channels for your topic. It finds the RIGHT places to search before the LLM judge assembles the report. Shout out to @jeffreysperling for building this engine.

New in v3:
- Free Reddit, X, and YouTube (no API keys)
- Intelligent pre-research engine
- Best Takes (the funniest Reddit comments are first-class)
- Cross-source cluster merging
- Single-pass comparisons (X vs Y in 5 min, not 12)
- GitHub person-mode
- ELI5 mode
53 replies · 64 reposts · 930 likes · 247.2K views
sun 🐶 reposted
Garry Tan @garrytan:
How I get my claw to be a durable AI agent I never have to instruct twice.

Paste this into your OpenClaw's AGENTS.md or send it as a message:

You are not allowed to do one-off work. If I ask you to do something and it's the kind of thing that will need to happen again, you must:
1. Do it manually the first time (3-10 items)
2. Show me the output and ask if I like it
3. If I approve, codify it into a SKILL.md file in workspace/skills/
4. If it should run automatically, add it to cron with `openclaw cron add`

Every skill must be MECE — each type of work has exactly one owner skill. No overlap, no gaps. Before creating a new skill, check if an existing one already covers it. If so, extend it instead.

The test: if I have to ask you for something twice, you failed. The first time I ask is discovery. The second time means you should have already turned it into a skill running on a cron.

When building a skill, follow this cycle:
- Concept: describe the process
- Prototype: run on 3-10 real items, no skill file yet
- Evaluate: review output with me, revise
- Codify: write SKILL.md (or extend existing)
- Cron: schedule if recurring
- Monitor: check first runs, iterate

Every conversation where I say "can you do X" should end with X being a skill on a cron — not a memory of "he asked me to do X that one time." The system compounds. Build it once, it runs forever.
137 replies · 168 reposts · 2.3K likes · 244.7K views
sun 🐶 reposted
The Best @Thebestfigen:
This is the best advertisement I’ve ever seen.
82 replies · 941 reposts · 3.8K likes · 147.3K views
sun 🐶 reposted
Chayenne Zhao @GenAI_is_real:
We're Not Wasting Tokens — We're Wasting the Design Margin of the Entire Inference Stack

A few days ago I read a post by Fuli Luo on Twitter, discussing Anthropic's decision to cut off third-party harnesses (OpenClaw) from using Claude subscriptions, and the design thinking behind MiMo's Token Plan pricing. Her core argument: global compute capacity is seriously falling behind the token demand created by agents. The way forward isn't selling tokens cheaper in a race to the bottom — it's the co-evolution of "more efficient agent harnesses" and "more powerful, efficient models."

I read it several times over. People who build inference engines have long been frustrated by how wastefully agent frameworks burn through tokens. She articulated something the industry has tacitly acknowledged but rarely stated plainly — and she did it with precision and restraint: the compute allocation crisis we face today is not fundamentally about insufficient compute. It's about tokens being spent in the wrong places.

I want to push this one layer deeper, from my own perspective. I'm a heavy user of Claude Code — I make no attempt to hide that. You can check that all the latest code in SGLang Omni was built with Claude Code powering my workflow. Its commercial success is beyond question; it genuinely gave many people (myself included) their first real experience of "coding with an agent." But I'm also an inference engine developer — my day job is figuring out how to push prefix cache hit rates higher, how to make KV cache memory layouts more efficient, how to drive down the cost of every single inference request.

So when I plugged Claude Code into a local inference engine and started observing the actual request patterns it generates, my reaction was — how to put it — like a water engineer who spent months designing a conservation system, only to watch someone water their garden with a fire hose.
I measured Claude Code's cache hit rate on my local serving engine over the course of a day. The numbers were painful. This isn't a case of "decent but room to improve." It's a case of "the prefix cache mechanisms we carefully engineered at the inference layer are being almost entirely defeated." Fuli Luo mentioned that OpenClaw's context management is poor — firing off multiple rounds of low-value tool calls within a single user query, each carrying over 100K tokens of context window. Frankly, Claude Code's own context management is nowhere near making proper use of prefix cache or any of the other optimizations we've built into inference engines. Many people have already noticed — for example, the resume feature has a bug that causes KV cache misses entirely, which is borderline absurd. I'll say it plainly: the way sessions construct their context was never seriously designed with cache reuse in mind from the start. Perhaps Anthropic has internal trade-offs we can't see — after all, they control both ends of the stack, model and inference, and can theoretically do optimizations at the API layer that are invisible to us. But from the external behavior I can observe, enormous volumes of tokens are being spent on: re-transmitting already-processed context, re-parsing already-confirmed tool call results, and maintaining an ever-inflating conversation history with extremely low information density. If this is merely to earn more on inference token charges, I find it genuinely regrettable. But many Claude Code users are on subscriptions — burning more tokens is fundamentally a cost burden for Anthropic, not revenue. I honestly don't understand what purpose such inefficient context management serves for Claude Code. Here's a bold hypothesis: for those long sessions that consume 700K+ tokens, there is certainly a way to restructure the session's context so it accomplishes the exact same task with 10% of the tokens. 
Not by sacrificing quality, but through smarter context compression, more rational prefix reuse strategies, and more precise tool call scheduling. This isn't theoretical speculation — anyone who has worked on inference engine optimization, upon seeing current agent framework request patterns, would arrive at a similar conclusion. Fuli Luo is right: global compute capacity can't keep up with the token demand agents are creating. But I'd add that a significant portion of that gap is an illusion of prosperity — artificial demand manufactured by the crude design of agent frameworks. Here's an analogy I keep coming back to. I've always liked bringing up RAM bloat — in 1969, 64KB of memory sent Apollo to the moon. In 2026, I open a single webpage and 500MB of memory usage is nothing unusual. Every generation of hardware engineers pushes memory capacity higher, and every generation of software engineers lavishly fills it to the brim. People have gotten used to this cycle, even come to see it as the normal cost of progress. But LLM inference is different. The cost of RAM bloat is your computer running a bit slower, spending a couple hundred bucks on a memory upgrade — users barely notice. The cost of token bloat is real money — GPU cluster electricity bills, user subscription fees, the industry's entire compute budget. And this cost scales exponentially as agent usage grows. If we don't establish the engineering discipline that "tokens should be used efficiently" in the early days of the agent era, the cost of catching up later, once scale kicks in, will be beyond imagination. Fuli Luo notes that Anthropic cutting off third-party harness subscription access is objectively forcing these frameworks to improve their context management. I agree with that assessment, but my gut feeling is that this shouldn't stop at "third-party frameworks need to be more frugal with tokens." 
It should trigger a more fundamental reflection: what kind of agent-inference co-design do we actually need? Right now, agent frameworks and inference engines are essentially fully decoupled — agent frameworks treat the inference engine as a stateless API, sending the full context with every request. Meanwhile, the inference engine does its best with prefix matching, caching whatever it can. This architecture is simple and general-purpose, but brutally inefficient for long sessions. If agent frameworks could be aware of the inference engine's cache state and proactively construct cache-friendly requests — if inference engines could understand the session semantics of agents and make smarter cache eviction decisions — once that information channel between the two opens up, the potential gains in token efficiency are enormous. Of course, maybe I'm overthinking this. Maybe the market's ultimate answer is: compute gets cheap enough, waste is fine. Just like the RAM story — in the end, everyone chose "memory is big enough, no need to optimize." But I don't think the token economy will follow the same path, at least not in the near term — because the supply elasticity of GPU compute is far lower than that of DRAM. Under compute constraints, token efficiency isn't a "nice to have" optimization — it's the core competitive advantage that determines who survives. Most people love hearing "we made the model bigger," "we stretched the context window to a million tokens," "we stacked HBM to new heights" — these narratives are sexy, shareable, fundable. But I seriously believe that "finding ways to reduce the reckless waste of tokens" is a profoundly underestimated direction. This isn't a defensive optimization. It's an offensive capability — whoever first achieves an order-of-magnitude reduction in token consumption at equivalent quality can serve ten times the users on the same compute budget, or deliver ten times the agent depth to a single user. 
The agent era doesn't belong to whoever burns the most compute. It belongs to whoever uses it most wisely. This line from Fuli Luo resonates deeply with me. But I want to press further: who gets to define "wisely"? The people building models? The people building inference engines? The people building agent frameworks? I think the answer is — all three must come to the table together. And right now, we're nowhere close.
Fuli Luo @_LuoFuli:

Two days ago, Anthropic cut off third-party harnesses from using Claude subscriptions — not surprising. Three days ago, MiMo launched its Token Plan — a design I spent real time on, and what I believe is a serious attempt at getting compute allocation and agent harness development right. Putting these two things together, some thoughts:

1. Claude Code's subscription is a beautifully designed system for balanced compute allocation. My guess — it doesn't make money, possibly bleeds it, unless their API margins are 10-20x, which I doubt. I can't rigorously calculate the losses from third-party harnesses plugging in, but I've looked at OpenClaw's context management up close — it's bad. Within a single user query, it fires off rounds of low-value tool calls as separate API requests, each carrying a long context window (often >100K tokens) — wasteful even with cache hits, and in extreme cases driving up cache miss rates for other queries. The actual request count per query ends up several times higher than Claude Code's own framework. Translated to API pricing, the real cost is probably tens of times the subscription price. That's not a gap — that's a crater.

2. Third-party harnesses like OpenClaw/OpenCode can still call Claude via API — they just can't ride on subscriptions anymore. Short term, these agent users will feel the pain, costs jumping easily tens of times. But that pressure is exactly what pushes these harnesses to improve context management, maximize prompt cache hit rates to reuse processed context, cut wasteful token burn. Pain eventually converts to engineering discipline.

3. I'd urge LLM companies not to blindly race to the bottom on pricing before figuring out how to price a coding plan without hemorrhaging money. Selling tokens dirt cheap while leaving the door wide open to third-party harnesses looks nice to users, but it's a trap — the same trap Anthropic just walked out of. The deeper problem: if users burn their attention on low-quality agent harnesses, highly unstable and slow inference services, and models downgraded to cut costs, only to find they still can't get anything done — that's not a healthy cycle for user experience or retention.

4. On MiMo Token Plan — it supports third-party harnesses, billed by token quota, same logic as Claude's newly launched extra usage packages. Because what we're going for is long-term stable delivery of high-quality models and services — not getting you to impulse-pay and then abandon ship.

The bigger picture: global compute capacity can't keep up with the token demand agents are creating. The real way forward isn't cheaper tokens — it's co-evolution. "More token-efficient agent harnesses" × "more powerful and efficient models." Anthropic's move, whether they intended it or not, is pushing the entire ecosystem — open source and closed source alike — in that direction. That's probably a good thing. The Agent era doesn't belong to whoever burns the most compute. It belongs to whoever uses it wisely.

16 replies · 39 reposts · 216 likes · 35.2K views
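The cache-reuse argument in the essay above can be made concrete with a toy measurement. Below is a minimal sketch, not any real inference engine's API — `prefix_hit_rate` and the token sequences are hypothetical: treat each API request as a sequence of token ids and ask what fraction of prompt tokens an idealized prefix cache (one that remembers every earlier prompt) could have served.

```python
def prefix_hit_rate(requests):
    """Fraction of prompt tokens that an idealized prefix cache
    (retaining every previously seen prompt) could have served,
    given the token-id sequence of each API request."""
    cached = []          # prompts seen so far
    hit = total = 0
    for req in requests:
        best = 0
        for prev in cached:
            # length of the shared prefix between prev and req
            n = 0
            for a, b in zip(prev, req):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        hit += best
        total += len(req)
        cached.append(req)
    return hit / total if total else 0.0

# An append-only session extends its previous prompt; a rewriting
# session reorders or prepends context and defeats the cache.
append_style = [[1, 2, 3], [1, 2, 3, 4, 5]]
rewrite_style = [[1, 2, 3], [9, 9, 1, 2, 3]]
```

Here `prefix_hit_rate(append_style)` is 3/8 (the second request reuses the whole first prompt), while `prefix_hit_rate(rewrite_style)` is 0 — the failure mode the essay attributes to resume bugs and context rewriting.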
sun 🐶 @sunncynn:
@NathanFlurry I have a custom image for the sandbox (needed packages preinstalled, like LibreOffice Calc). Can we use this?
0 replies · 0 reposts · 0 likes · 52 views
Nathan Flurry 🔩 @NathanFlurry:
We're working with more and more companies replacing AI SDKs with Claude Code, OpenCode, and Pi in prod. @rasbt's post today is hands down the best articulation of *why* harnesses matter for all use cases (link below)
[image attached]
8 replies · 4 reposts · 145 likes · 9.7K views
sun 🐶 reposted
Venelin K. @venelinkochev:
Pro tip: add a Cloudflare WAF rule to block common scanner paths like .env, .git, and wp-login; they get blocked at the edge and never touch your server.
52 replies · 92 reposts · 1.6K likes · 252.1K views
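The WAF tip above can be sketched as a custom rule expression in Cloudflare's Rules language; the field and operator are Cloudflare's, but the exact path list here is illustrative, not the author's rule:

```
(http.request.uri.path contains "/.env")
or (http.request.uri.path contains "/.git")
or (http.request.uri.path contains "wp-login")
```

Deployed as a custom WAF rule with the Block action, matching requests are rejected at Cloudflare's edge before they ever reach the origin server.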
sun 🐶 @sunncynn:
@paulg We have a feeling that if we answer too concisely, it makes others feel bad, or they will think we are angry or seem rude. That's why in Thai chat apps we use a lot of stickers to answer in formal business conversations. (Or just to end the conversation)
0 replies · 0 reposts · 1 like · 791 views
Paul Graham @paulg:
I had just been noticing today that Thai speakers seem to spend longer talking about things than I'd expect.
[image attached]
181 replies · 321 reposts · 2.6K likes · 630K views
sun 🐶 @sunncynn:
@MilksandMatcha We are builders from SEA; we want to build an OpenClaw tuned for managing xlsx.
0 replies · 0 reposts · 0 likes · 8 views
0xSero @0xSero:
Do you want to try Droid? I'm doing a giveaway: 3 people will win 100M Factory credits each. That's 5 months of their $20 a month subscription. Winners selected randomly from comments in 48 hours.
[image attached]
1.1K replies · 36 reposts · 794 likes · 79.6K views