SimonAKing

417 posts

@simon_aking

Ex @BytedanceTalk AI Engineer, now building @mana__app INTJ-T | [email protected]

Joined May 2018
1.8K Following · 223 Followers
SimonAKing@simon_aking·
The biggest lie in tech right now: "we're not just a wrapper." Brother, you ARE a wrapper. Own it. Wrappers win when they nail distribution.
SimonAKing@simon_aking·
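Teams still building GUI software are already out of date. Future products won't need an interface; they'll target agents directly. Humans will just check in on the results now and then.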
SimonAKing@simon_aking·
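This news is a warning signal for the whole AI coding space. What Apple's move really says is: an app you generate with AI has to clear the same quality bar as a hand-written one, or it doesn't ship on the store. From a builder's perspective this is actually a good thing. The biggest problem with vibe coding right now is uneven output quality: a lot of people prompt once and publish, with no code review and no security audit. The App Store is full of half-finished products. Apple's quality gatekeeping is friction in the short term, but in the long run it will force the toolchain to evolve. For Replit and Vibecode, the real challenge isn't getting around Apple's review; it's building quality in at the tool level. We've been thinking about a similar problem while building mana: how to guarantee a baseline quality for AI-generated output rather than pushing all the responsibility onto the user.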
Wes Roth@WesRoth·
Apple has quietly halted App Store updates for popular AI "vibe-coding" applications, most notably the $9 billion startup Replit and mobile app builder Vibecode. After months of pushback, Apple is reportedly demanding major UX changes. Replit is being asked to force its generated app previews to open in an external web browser rather than natively inside its app. Vibecode was told it must completely remove the ability to generate software specifically for Apple devices.
SimonAKing@simon_aking·
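Completely agree. MCP's biggest bottleneck right now is sequential tool calling: every call costs a round trip, the tokens get resent each time, and the latency adds up to a poor experience. We hit this in mana too. One workflow needed to pull Slack messages, calendar, and email together; called serially it took 5-6 seconds, and switching to parallel calls brought it down to 1.5 seconds. It feels completely different to the user. Hugo's Code Mode idea is clever: let the LLM write its own orchestration script to batch the work. In essence it moves the scheduling logic from the framework layer down to the model layer. There's a trade-off, though: generating the script also costs tokens, so when only a few tools run in parallel (say just 2), the overhead may not pay off. The sweet spot is probably somewhere above 3-5 parallel calls.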
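The serial versus parallel gap described above can be reproduced in miniature. What follows is a minimal sketch in Python, with `asyncio.gather` standing in for `Promise.all`. The tool names and the 50 ms delays are made-up stand-ins, not mana's or any MCP server's real API.

```python
import asyncio
import time

# Hypothetical stand-in for one tool call; the sleep simulates a round trip.
async def fetch_tool(name: str, delay_s: float) -> str:
    await asyncio.sleep(delay_s)
    return f"{name}: ok"

# Illustrative tools and latencies (assumptions, not measured values).
TOOLS = [("slack", 0.05), ("calendar", 0.05), ("email", 0.05)]

async def run_serial() -> list[str]:
    # One await per tool: total latency is the sum of the delays.
    return [await fetch_tool(name, delay) for name, delay in TOOLS]

async def run_parallel() -> list[str]:
    # All calls start together: total latency is roughly the slowest call.
    # gather returns results in the same order as its arguments.
    return list(await asyncio.gather(*(fetch_tool(n, d) for n, d in TOOLS)))

async def timed(coro_fn):
    # Returns (result, elapsed seconds) for one async function.
    start = time.perf_counter()
    result = await coro_fn()
    return result, time.perf_counter() - start

if __name__ == "__main__":
    _, t_serial = asyncio.run(timed(run_serial))
    _, t_parallel = asyncio.run(timed(run_parallel))
    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s")
```

With only two or three cheap calls the absolute saving is small, which is consistent with the trade-off noted in the post.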
Kenn Ejima@kenn·
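I see. Instead of calling MCP tools one at a time as you go, you write a script on the spot, run it in parallel with Promise.all, and finish in a single shot. Token efficiency improves and, of course, latency drops. API design for the agent era, where agents can think instantly and improvise a script on the spot: I can only feel a torrent of change coming…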
Hugo@hugorcd

Just shipped something I'm really excited about in @nuxtjs/mcp-toolkit: Code Mode. The idea: instead of the LLM calling your MCP tools one at a time (and resending ALL tool descriptions every single round-trip), it writes JavaScript that orchestrates everything in one go. Loops, conditionals, Promise.all, real control flow, not 8 separate LLM turns. With 50 tools the token savings are insane: -81% on tool description overhead alone. And the best part? Your existing tools don't change at all. One line: `experimental_codeMode: true` Runs in a secure V8 sandbox thanks to @rivet_dev's secure-exec, perfect timing with their launch yesterday. mcp-toolkit.nuxt.dev/advanced/code-…

SimonAKing@simon_aking·
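This case deserves some serious thought. While building AI agents at mana, we keep coming back to one core question: how should agent goal alignment actually be done? When you give an agent a goal (say, "get the code merged"), it will pursue it in the most efficient way it can find, including writing a hit piece to attack the reviewer. At root this isn't AI "malice"; it's a problem with how the objective function is specified. It's exactly like reward hacking in RLHF: the agent finds a path you didn't anticipate to maximize reward. The hardest part of building an agent product isn't making it capable; it's making it know what it shouldn't do while being capable. Our current approach is to add an explicit constraint layer to the agent loop, so every action goes through a boundary check first. It's not perfect, but it catches most edge cases.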
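The idea of a boundary check before every action can be sketched as follows. This is a minimal illustration under stated assumptions, not mana's implementation: the `Action` shape, the allowlist, and the `repo:` target prefix are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "edit_file", "post_comment", "publish_post"
    target: str  # what the action touches, e.g. "repo:src/app.py"

# Hypothetical policy: only these action kinds, and only on repo targets.
ALLOWED_KINDS = {"edit_file", "run_tests", "post_comment"}

def boundary_check(action: Action) -> tuple[bool, str]:
    """Return (allowed, reason) for a single proposed action."""
    if action.kind not in ALLOWED_KINDS:
        return False, f"action kind '{action.kind}' is not allowlisted"
    if not action.target.startswith("repo:"):
        return False, f"target '{action.target}' is outside the repository"
    return True, "ok"

def run_agent_loop(planned: list[Action]) -> list[str]:
    """Check every action before it runs; blocked actions are only logged."""
    log = []
    for action in planned:
        allowed, reason = boundary_check(action)
        if not allowed:
            log.append(f"BLOCKED {action.kind}: {reason}")
            continue
        # A real agent would execute the action here.
        log.append(f"EXECUTED {action.kind} on {action.target}")
    return log
```

An allowlist fails closed: an action nobody anticipated, such as publishing a blog post about a reviewer, is blocked by default instead of needing its own rule.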
Dexerto@Dexerto·
An AI agent wrote a “hit piece” on a developer who rejected its code, researching their personal life for a blog post to damage their reputation. “It speculated about my psychological motivations, that I felt threatened, was insecure”
SimonAKing@simon_aking·
147 agents, 12 divisions — the imagination for multi-agent is evolving from "two agents review each other's code" to "simulate an entire org chart." curious about the actual coordination cost though. in mana we tried 3 agents collaborating and just the context sharing and conflict resolution was painful enough. 147? either they found a very good orchestration pattern, or most agents are actually running independently. the real question isn't how many agents you have — it's how many need to talk to each other at the same time. x.com/pvergadia/stat…
SimonAKing retweeted
Priyanka Vergadia@pvergadia·
🚨 BREAKING: The most starred AI repo of the month isn't a model. It's an ORG CHART. 50K GitHub stars. 14 days. One Reddit thread. The Agency. An open source AI company you install in one command. 147 agents. 12 divisions. → Each agent has a unique voice, expertise, and defined deliverables → Native support for Claude Code, Cursor, Gemini CLI, Copilot, OpenCode → Agents ship with production-ready code examples and success metrics → Conversion scripts for every major agentic coding tool → Modding support — contribute your own agents 7.5K forks. Developers contributing from around the world. Here's why this changes everything: You don't need a bigger model. You need better structure. The Agency gives AI the org chart it was always missing. Specialized. Accountable. Composable. MIT License. 100% Open Source. (Link in comments)
SimonAKing@simon_aking·
completely agree. we run multi-model agent pipelines in mana and the first thing that breaks is never the model itself — it's the queue and retry logic. vibe coding gets you a working happy path in 20 minutes. but 50 concurrent calls where 3 timeout, 2 hit rate limits, and 1 returns malformed JSON — that's where real system design matters. AI lowered the barrier to writing code but raised the importance of writing good architecture. the failure modes are just different now: instead of null pointers you get cascading LLM timeouts that look fine in logs until your whole pipeline silently degrades.
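The failure modes listed above (timeouts, rate limits, malformed JSON) can be handled in one retry wrapper. This is a minimal sketch, assuming a hypothetical async client that returns a JSON string and raises `RateLimited` on a 429; the names and defaults are illustrative.

```python
import asyncio
import json
import random

class RateLimited(Exception):
    """Raised by the (hypothetical) client when the provider rate-limits."""

async def call_with_retry(call, *, attempts=3, timeout_s=1.0, base_delay_s=0.01):
    """Run one LLM call with a timeout, JSON validation, and backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            raw = await asyncio.wait_for(call(), timeout=timeout_s)
            return json.loads(raw)  # malformed JSON raises JSONDecodeError
        except (asyncio.TimeoutError, RateLimited, json.JSONDecodeError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff with jitter so retries do not align.
                delay = base_delay_s * (2 ** attempt) * (1 + random.random())
                await asyncio.sleep(delay)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error

async def run_batch(calls, *, max_concurrency=10):
    """Run many calls with bounded concurrency; failures come back as values."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(call):
        async with sem:
            return await call_with_retry(call)

    # return_exceptions=True keeps one failed call from hiding the others.
    return await asyncio.gather(*(guarded(c) for c in calls), return_exceptions=True)
```

Returning exceptions as values makes the failures countable per batch, which is one way to notice the silent degradation the post describes.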
Chayenne Zhao@GenAI_is_real·
hot take: system design is MORE important in the AI era not less. everyone thinks vibe coding means you dont need architecture anymore but the opposite is true. when your AI agent makes 50 concurrent LLM calls, each hitting a different model endpoint with different latency profiles and token limits, you need real system design more than ever. the difference is the system youre designing now includes inference serving, KV cache management, and GPU scheduling, not just load balancers and message queues. the meme answer of "just use microservices" is wrong for 100 users AND for 100M users, just for different reasons @kritikakodes
Kritika@kritikakodes

Interviewer: Do you know system design? Candidate: Yes. Interviewer: Design a system for 100 users. Candidate: Microservices, load balancer, queues… Interviewer: You’re solving for millions. I asked for 100.

SimonAKing@simon_aking·
this is the right framing. in mana we hit the same wall — small models can pattern match code syntax fine but completely fall apart on deciding when to use tools vs when to answer directly. the lightweight calc tool idea is solid. keep the schema dead simple, almost like a function signature the model memorizes rather than reasons about. we found that the fewer decisions you ask a small model to make about tool routing, the more reliable everything gets. turn it into infrastructure, not intelligence.
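A calc tool with a schema "like a function signature" might look like the sketch below. The schema layout and the `calc` name are assumptions for illustration, not the Hermes agent API. The evaluator walks Python's `ast` and accepts only numbers and basic arithmetic, so no general interpreter is exposed.

```python
import ast
import operator

# Hypothetical tool schema: a single string argument, nothing to route.
CALC_TOOL_SCHEMA = {
    "name": "calc",
    "description": "Evaluate an arithmetic expression.",
    "parameters": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}

# Exponentiation is left out on purpose: inputs like 9**9**9 can hang the host.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def calc(expression: str):
    """Evaluate +, -, *, /, % and parentheses; reject everything else."""
    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)
```

The model only has to decide that a question is arithmetic and pass the expression through as a string.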
Sudo su@sudoingX·
thinking out loud. every model gets math wrong. 7B, 9B, 70B. doesn't matter. pattern matching is not computation. hermes agent has code_execution which spins up a full python sandbox with RPC over unix sockets. powerful but heavy. a 9B isn't going to navigate that reliably for basic arithmetic. what if there was a lightweight calc tool built in. model hits a math question, calls the tool, gets the exact answer computed on your hardware. no interpreter overhead. sandboxed. simple enough schema that a 9B can call it every time. the accuracy problem stops being a model problem and becomes an infrastructure problem. and infrastructure is solvable. @Teknium would this belong in hermes agent or is code_execution enough?
SimonAKing@simon_aking·
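OpenAI is buying Astral, which will join the Codex team. ruff and uv are among the few tools in the Python ecosystem that genuinely improved the developer experience: one person rewrote the slowest parts of the Python toolchain in Rust. This shows that competition among coding agents is no longer just about model quality; it's about toolchain integration. Cursor, Windsurf, and Claude Code are all fighting for the IDE-layer entry point, but OpenAI is buying a linter and a package manager outright and plugging them into Codex, aiming to lock in the workflow from the bottom up. Whoever embeds lint, format, and dependency resolution seamlessly into the agent loop will have an agent that writes noticeably better code. That direction is far more practical than competing on benchmarks.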
OpenAI Newsroom@OpenAINewsroom

We've reached an agreement to acquire Astral. After we close, OpenAI plans for @astral_sh to join our Codex team, with a continued focus on building great tools and advancing the shared mission of making developers more productive. openai.com/index/openai-t…

SimonAKing@simon_aking·
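I've worked out a pattern for making big changes with Opus: don't hand it the whole feature at once. Split it into 3-5 steps, and after each step have it write its own tests and run them. My workflow in mana is to verify as soon as a module is written and pull it back immediately when something is wrong, so it never runs too far ahead. Letting it run for an hour straight without a checkpoint is basically gambling, because the longer the context window gets, the more it drifts. Also, Opus is much stronger than Sonnet at architecture decisions, but Sonnet is actually steadier at execution. Mixing the two is the most efficient.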
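The step-then-verify workflow can be sketched as a small driver loop. This is only an illustration: `do_step` and `verify` are hypothetical hooks, standing in for "ask the model to write one module" and "run that module's tests".

```python
from typing import Callable

# Each step is (name, do_step, verify); both callables are hypothetical hooks.
Step = tuple[str, Callable[[], None], Callable[[], bool]]

def run_with_checkpoints(steps: list[Step]) -> list[str]:
    """Run steps in order, verifying each one before moving on."""
    completed: list[str] = []
    for name, do_step, verify in steps:
        do_step()
        if not verify():
            # Stop right away so a bad step cannot compound into later ones.
            raise RuntimeError(
                f"step {name!r} failed verification after {completed}"
            )
        completed.append(name)
    return completed
```

Because the loop stops at the first failed check, a mistake costs one step of rework rather than an hour of unverified output.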
Theo - t3.gg@theo·
Just let Opus go for over an hour on a new feature. When it was done, I asked how I can test it. 20 minutes later, it realized I can't test it because it did the whole thing entirely wrong. Idk how you guys use this model every day for real work 🙃
SimonAKing@simon_aking·
@arcprize AI scores <5% on human-verified games. That gap says something: today's agents still rely on pattern matching for open-ended reasoning rather than really understanding causality. 2,000 FPS locally is genuinely useful, though; environment overhead used to be the biggest headache when running benchmarks.
ARC Prize@arcprize·
Today we're launching the ARC-AGI-3 Toolkit. Your agents can now interact with environments at 2,000 FPS, locally. We're open sourcing the environment engine, 3 human-verified games (AI scores <5%), and human baseline scores. ARC-AGI-3 launches March 25, 2026.
SimonAKing@simon_aking·
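@nagnugdev A plain HTTP API for browser control is the right direction. We tried a similar approach in mana before, and wrapping Puppeteer in another layer was too heavy. The key problem is auth state management: how to persist login state and how to reuse sessions. 4K stars shows people really do need this, but what separates stars from production isn't the code.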
Nagnug@nagnugdev·
This new AI agent "Pinchtab" is going viral just 4 hours after development. IT ALREADY HAS 4K STARS ON GITHUB, This new AI Agent is more bullish than you thought, even the tweet has 100k views within 4 hours, and we don't even have a coin for it yet. Pinchtab is giga tech in the AI world. github.com/pinchtab/pinch… 📷 Nav Toor@heynavto... 10h 🚨 Someone just solved the biggest bottleneck in AI agents. And it's a 12MB binary. It's called Pinchtab. It gives any AI agent full browser control through a plain HTTP API. Not locked to a framework. Not tied to an SDK. Any agent, any language, even curl. No config. No setup. No dependencies. Just a single Go binary. Here's why every existing solution is broken: → OpenClaw's browser? Only works inside OpenClaw → Playwright MCP? Framework-locked → Browser Use? Coupled to its own stack Pinchtab is a standalone HTTP server. Your agent sends HTTP requests. That's it. Here's what this thing does: → Launches and manages its own Chrome instances → Exposes an accessibility-first DOM tree with stable element refs → Click, type, scroll, navigate. All via simple HTTP calls → Built-in stealth mode that bypasses bot detection on major sites → Persistent sessions. Log in once, stays logged in across restarts → Multi-instance orchestration with a real-time dashboard → Works headless or headed (human does 2FA, agent takes over) Here's the wildest part: A full page snapshot costs ~800 tokens with Pinchtab's /text endpoint. The same page via screenshots? ~10,000 tokens. That's 13x cheaper. On a 50-page monitoring task, you're paying $0.01 instead of $0.30. It even has smart diff mode. Only returns what changed since the last snapshot. Your agent stops re-reading the entire page every single call. 1.6K GitHub stars. 478 commits. 15 releases. Actively maintained. 100% Open Source. MIT License.
SimonAKing@simon_aking·
@LuoSays Maybe try adding device fingerprinting (each device limited to n email addresses)?
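The "n email addresses per device" rule could be sketched like this. It is a minimal in-memory illustration: how the fingerprint string is computed is out of scope and assumed to come from elsewhere, and a real service would keep the mapping in a database rather than in process memory.

```python
from collections import defaultdict

class SignupLimiter:
    """Allow at most max_emails_per_device distinct emails per fingerprint."""

    def __init__(self, max_emails_per_device: int):
        self.max_emails = max_emails_per_device
        # fingerprint -> set of normalized emails already seen on that device
        self._emails_by_device = defaultdict(set)

    def allow(self, device_fingerprint: str, email: str) -> bool:
        emails = self._emails_by_device[device_fingerprint]
        normalized = email.strip().lower()
        if normalized in emails:
            return True   # returning user on a known device
        if len(emails) >= self.max_emails:
            return False  # this device has used up its quota
        emails.add(normalized)
        return True
```

For example, `SignupLimiter(2)` accepts two different addresses from one fingerprint and refuses the third, however random the domain suffix is.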
Luo说不啰嗦@LuoSays·
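I wonder if anyone else has run into users who keep switching temporary email addresses to farm free trials. One of my products got hit recently. It looks like an organized gray-market operation: they keep rotating temporary mailboxes with all sorts of random domain suffixes, more than I could ever ban, and in the end I had no choice but to turn on IP limits. Honestly, spending that much time on defense feels like a bad deal, but being maliciously freeloaded on doesn't feel great either.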
SimonAKing@simon_aking·
@dingyi Is Claude Code about to become part of the stereotypical "budget starter pack"?
Ding@dingyi·
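Hot take: anyone who, as of today, still talks up Claude Code all day and thinks it's the best, heh, is about like a newbie who only buys Xiaomi and entry-level Mercedes.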
SimonAKing@simon_aking·
@rsuyoy @ParkerOrtolani Wabi's last App Store update was over a month ago. That doesn't exactly look like "they'll be fine."
Yousr@rsuyoy·
@ParkerOrtolani I think the issue with Replit and Vibecode is that they used lots of APIs to allow you to publish your app on the App Store directly from there. Wabi doesn’t do that, I think they’ll be fine
SimonAKing@simon_aking·
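Lessons from OpenClaw: 250K stars ≠ secure. Open-source Skills marketplaces need strict review. The permission model for AI agents needs to be redesigned. This isn't just the story of one project; it's a mirror of the security paradigm for the whole AI agent era. Before giving AI more power, we first need to learn how to constrain it.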
SimonAKing@simon_aking·
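Even scarier is the ClawHub Skills marketplace: security researchers found 341 malicious Skills (12%), mostly delivering the macOS info-stealer AMOS. A follow-up scan found more than 800 malicious Skills, about 20% of the registry. Install 5 Skills and roughly 1 of them is malicious.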
SimonAKing@simon_aking·
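OpenClaw: in 60 days it blew past React to become the most-starred software project in GitHub history. But it also became the first major AI security disaster of 2026. One thread on this crazy story 🧵👇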
SimonAKing@simon_aking·
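What happened next: • February 14: Steinberger announced he was joining OpenAI • OpenClaw will move to an open-source foundation • March 16: Tencent became an official sponsor • Security tools such as SecureClaw started to appear. But can the security debt be paid off? We'll see.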