Eric Wong

942 posts


@EricWongDEV

Time flies; the years slip by.

China · Joined October 2025
50 Following · 60 Followers
Eric Wong@EricWongDEV·
My Code Plan just expired; recommendations welcome.
0 replies · 0 reposts · 0 likes · 13 views
Roland的思考日记
I remember when I was a kid: everyone got off work at 5, had dinner at 6, watched the 7 o'clock news, watched a TV drama at 7:30, and was asleep by 9:30. How come, now that I'm the grown-up, it's all different?
91 replies · 45 reposts · 1.3K likes · 183K views
Eric Wong@EricWongDEV·
@skywind3000 I honestly can't relate; from where I sit, AI tools have made me much more productive. Maybe it depends on which tools you use. It used to take real time to get from an idea to a code implementation, and longer whenever problems came up. Now I basically hand the idea to the AI and just act as the logic reviewer. Plus I can run several tasks in parallel.
0 replies · 0 reposts · 0 likes · 282 views
LIN WEI@skywind3000·
Repost: [a Chinese translation, lightly abridged, of the English post by Peter Girnus quoted in full below]
Peter Girnus 🦅@gothburz

My company rolled out AI tools 11 months ago. Since then, every task I do takes longer. I am not allowed to say this out loud. Not because there is a policy. There is no policy. There is something worse than a policy. There is enthusiasm.

There is a Slack channel called #ai-wins where people post screenshots of AI outputs with captions like "this just saved me an hour." There is a VP who opens every all-hands with "the companies that adopt fastest win." There is a Director who renamed his team from Operations to Intelligent Operations. There is a peer review question that now asks: "How have you leveraged AI tools to enhance your workflow this quarter?" If the answer is "I haven't, because I was faster before," that is a career decision. So I leverage.

Emails. Before the tools, I wrote emails. This took the amount of time it takes to write an email. I did not measure it. Nobody measured it. The email got written and sent and it was fine.

Now I write the email. Then I highlight the text and click "Enhance with AI." The AI rewrites my email. It replaces "Can we meet Thursday?" with "I'd love to explore the possibility of finding a mutually convenient time to align on this." I read the rewrite. I delete the rewrite. I send my original email. This takes 4 minutes instead of 2. The 2 extra minutes are the enhancement. I do this 11 times a day. That is 22 minutes I spend each day rejecting improvements to sentences that were already finished.

In #ai-wins I posted a screenshot of the rewrite. I did not post the part where I deleted it. 23 people reacted with the rocket emoji. That is adoption.

Meetings. We have an AI notetaker in every meeting now. It joins automatically. It records. It transcribes. It summarizes. After each meeting I receive a 3-paragraph summary of the meeting I just attended. I read the summary. This takes 3 minutes. I was in the meeting. I know what happened. I am reading a machine's account of something I experienced firsthand. Sometimes the account is wrong. Last Tuesday it attributed a comment about Q3 revenue to me. My manager made that comment. I spent 4 minutes correcting the transcript.

Before the notetaker, I did not spend 7 minutes after each meeting correcting a robot's memory of something I personally witnessed. I attend 11 meetings a week. That is 77 minutes per week supervising a transcription nobody requested. I mentioned this once. My manager said "think about the people who weren't in the meeting." The people who weren't in the meeting do not read the summaries. I checked. The read receipts show single-digit opens. The summaries exist not because they are useful but because they are there. I read them for the same reason.

Documents. I write a weekly status update. Before the tools, this took 10 minutes. I typed what happened. I sent it. My manager skimmed it. The system worked.

Now I open the AI writing assistant. I give it my bullet points. It produces a draft. The draft says "Significant progress was achieved across multiple workstreams." I did not achieve significant progress across multiple workstreams. I updated a spreadsheet and sent 4 emails. I rewrite the draft to say what actually happened. Then I run my rewrite through the grammar tool. It suggests I change "done" to "completed" and "next week" to "in the forthcoming period." I click Ignore 9 times. Then I send the version I would have written in 10 minutes. The process now takes 30. I have been doing this every week for 11 months. I have added 20 minutes to a task that did not need 20 more minutes.

I call this efficiency. I have been calling it efficiency for 11 months. That is what efficiency means now. It means the additional time you spend to arrive at the same outcome through a longer process. Nobody has questioned this definition. I have not offered it for review.

I kept a log once. 2 weeks. Every task, timed. Before-AI and after-AI. The after number was larger in every case. Every single one. Not by a little. The range was 40 to 200 percent. I deleted the log.

I deleted it because it was a document that said, in plain numbers, that the AI tools make me slower. And a document like that has no place in a company where AI adoption is a strategic priority. I could not send it to my manager. He championed the rollout. I could not post it in #ai-wins. I could not raise it in a meeting because the notetaker would transcribe it and the summary would read "[Name] expressed concerns about AI tool efficacy" and that summary would be the first one anyone actually reads.

So I do what everyone does. I use the tools. I spend the extra time. I post in #ai-wins. I write "leveraged AI to streamline weekly reporting" in my review and my manager gives me a 4 out of 5 for innovation. I have innovated nothing. I have added steps to processes that were already finished. I have made simple things longer and labeled the difference with words that used to mean something.

Every week in #ai-wins someone posts a screenshot. And 20 people react with the rocket emoji. And nobody posts the part where they deleted the output and did the task themselves. Nobody posts the revert. Nobody posts the before-and-after timer. Nobody will. Because "I was better at my job before the AI tools" is a sentence that cannot be said out loud in any company that has decided AI is the future. Every company has decided AI is the future.

So we leverage. Quietly. Adding steps. Calling them optimization. Getting slightly less done, slightly more slowly, with slightly more steps, and reporting it as progress.

My yearly review is next month. There is a new section this year. "AI Impact Assessment." It asks me to quantify the hours saved by AI tools per week. I will write a number. The number will be positive. It will not be true. But the AI writing assistant will help me phrase it convincingly. That is the one thing it does well.

4 replies · 0 reposts · 26 likes · 11.2K views
idoubi@idoubicc·
Desktop WeChat now supports ClawBot. If you don't see the entry point, try this approach 👇
1. Visit the weclaw project page, copy the install command, and install weclaw on your computer github.com/fastclaw-ai/we…
2. Run weclaw login, which prints a WeChat login QR code
3. Screenshot the QR code, send it to File Transfer Assistant, open the full-size image, then right-click -> Scan QR code in image
4. Follow the prompts to upgrade WeChat to the new version; after a restart, ClawBot appears
5. Run weclaw start, scan the QR code with WeChat on your phone, and then you can use ClawBot in desktop WeChat day to day
------
By editing the weclaw config file you can connect multiple Agents at once and switch between them with commands inside WeChat ClawBot. Efficiency maxed out. 😄
PS: weclaw is a bridge service built on WeChat's official API (not reverse engineering, not a hack); there is no risk of account bans, you can connect with confidence, and no burner account is needed.
[image]
idoubi@idoubicc

WeClaw now supports custom trigger commands for Agents. Just add aliases to the config file ~/.weclaw/config.json:

    {
      "codex": {
        "type": "acp",
        "command": "path-to/codex-acp",
        "aliases": ["gpt", "sam", "奥特曼"]
      },
      "gemini": {
        "type": "acp",
        "command": "path-to/gemini",
        "aliases": ["pichai", "哈萨比斯"],
        "args": ["--acp"]
      }
    }

Available after upgrading to the new version 👇
weclaw update
github.com/fastclaw-ai/we…
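For readers wiring this up, here is a purely illustrative Python sketch of how an alias lookup against a config in this shape could work. resolve_agent is a made-up helper, not part of weclaw; the thread does not show weclaw's real dispatch logic.

    import json
    from pathlib import Path

    # Hypothetical helper, not weclaw code: map a trigger word typed in
    # chat to an agent entry in a config shaped like the example above.
    CONFIG_PATH = Path.home() / ".weclaw" / "config.json"

    def resolve_agent(trigger: str, config: dict) -> str | None:
        """Return the agent key whose name or aliases match the trigger word."""
        for name, entry in config.items():
            if trigger == name or trigger in entry.get("aliases", []):
                return name
        return None

    config = json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
    print(resolve_agent("奥特曼", config))  # -> "codex" with the config above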

7 replies · 28 reposts · 115 likes · 18.8K views
向阳乔木@vista8·
Read a paper on how to automatically iterate and optimize Skills. Seems like there's something to it. Link in the replies; you can copy the URL and have Claude Code study it.
[two images]
16 replies · 73 reposts · 365 likes · 42.5K views
Frank Wang 玉伯@lifesinger·
At a conference roundtable today, I cut off a panelist mid-speech, then stood in for the moderator for a little while so the session ended on time. Leaving the venue, I felt completely at ease. Thinking about why, I realized I had once again pushed out the boundary of acting on common sense. Acting on common sense means respecting common sense and acting on it: sleep when you're tired, drink when you're thirsty, don't stare at your phone while walking, focus while eating, and so on. What made me happy today was discovering I can do this: stop listening when I can't bear to listen, have the courage to interrupt when I want to interrupt, and feel calm rather than anxious afterward. It's a wonderful feeling. A good startup helps the team and yourself grow. A good product helps its users grow. Extremely happy + relaxed about today's action.
[image]
28 replies · 1 repost · 133 likes · 13K views
Eric Wong@EricWongDEV·
@berryxia For now it looks like just a private-chat tool; it doesn't feel much different from the old one-on-one conversations with an official account.
0 replies · 0 reposts · 0 likes · 230 views
Berryxia.AI@berryxia·
WeChat ClawBot has landed on the Mac client, version 4.1.8.67. Just update and restart the WeChat client; the speed is decent. You can also dig into some open-source projects on GitHub for more fun explorations. Leave a comment with the WeChat ClawBot plugins you're using right now?
[image]
7 replies · 3 reposts · 19 likes · 6.4K views
Eric Wong@EricWongDEV·
@lipeng0820 Today my boss sent me one extremely long image, roughly 40 PPT slides stitched together, and asked me to use AI to extract the template from it. All I want to say is: damn!
2 replies · 0 reposts · 4 likes · 1.7K views
SimbaLee@lipeng0820·
At this stage I'm not afraid of AI hallucinating. I'm afraid of company management and bosses hallucinating: "Just have AI take a pass at this, it's easy."
29 replies · 29 reposts · 400 likes · 42.1K views
Tw93@HiTw93·
@savigny1779 Infuriating. Let me tell you, when I posted it, I was the one flagged as not the original author. I complained to WeChat, which is painfully hard to use. Oh well; people who launder other people's articles will only ever have an article-launderer's level of skill.
9 replies · 0 reposts · 26 likes · 6.1K views
Sa-눈_눈@savigny1779·
A while back I followed an official account focused on Agent development. The article quality was remarkably high and updates were incredibly fast; for a while I really admired it. Until today, when I discovered it was laundered content. I had just read @HiTw93's article yesterday, so one look and I was stunned. I left a comment challenging the "original" label, and all I got back was "there's a link to the original at the end of the post"...
[two images]
3 replies · 0 reposts · 6 likes · 6.7K views
Eric Wong@EricWongDEV·
@VondaShaye I see; going by what you said, while generating code it should be issuing tool calls to run the check, with results fed back to the AI: emit the output if clean, keep looping if not. That burns tokens on several extra round trips.
0 replies · 0 reposts · 0 likes · 19 views
Eric Wong@EricWongDEV·
Hi everyone. Recently, while testing the large model glm-5, it actually generated invalid Lua syntax:

    ctx.callback(session, function()
    -- ....
    })

Curious: Lua's grammar is already about as simple as it gets, so how can it still produce invalid syntax? And if so, how does a large model ensure the code it generates is syntactically valid when it writes Rust?
4 replies · 0 reposts · 5 likes · 5.9K views
探姬 | Hello-CTF 🚩@ProbiusOfficial·
Barely slept at all last night. Today, halfway through a lecture, my heart suddenly started pounding. Getting a little scared now... but I can't find anyone to cover for me 😰
[image]
5 replies · 0 reposts · 29 likes · 1.9K views
Eric Wong@EricWongDEV·
@NataliyaRo4040 49 kg? Something is definitely wrong there. A man's frame is heavier to begin with.
1 reply · 0 reposts · 0 likes · 1K views
NataliyaRose@NataliyaRo4040·
He's lost weight yet again; he's down to just 49 kg. My mom was just talking with me: if we're really going to build a life together, I have to get him to fix his upside-down sleep schedule. "First, he works at night; Longlong and I keep getting woken up by the clattering keyboard, so it would surely disrupt your rest too." "Second, staying up every night and not eating properly, his body will only get thinner and his condition worse. What's the point of earning money if he's in no state to enjoy it?" "Third, he sleeps all day, so the housework and childcare will fall on you alone, on top of your job. How exhausting." His parents say they want to invite my mom to visit Fujian over Labor Day. My mom asked me: daughter, are you sure he's the one? This isn't just a trip; once I go, it settles the matter for the rest of your life. So it's not hard to see that my mom isn't entirely satisfied with my boyfriend; she doesn't want marriage to make me suffer. But she supports any decision I make, always has. I love my mom ☺️
26 replies · 1 repost · 48 likes · 6.7K views
Nice奈斯@Nicebabycat·
People's Daily Online is soliciting Chinese names for AI. Top-voted comment: 硅头 ("silicon head", guītóu, a homophone of 龟头).
[image]
110 replies · 20 reposts · 786 likes · 202.4K views
Eric Wong@EricWongDEV·
@Tim8nbm Look: someone posted a patient's medical record in a group chat, I reported it, and here's how damn WeChat replied.
[image]
0 replies · 0 reposts · 0 likes · 31 views
TimTimmy@Tim8nbm·
Yesterday on Weibo I saw that Zhang Xuefeng's medical condition had been leaked, and it really struck a chord. In China, whenever a public figure has a health incident, leaks of their medical privacy seem to happen over and over. But if the same thing happened in Australia, it would cross a very clear red line, the kind someone actually pays a price for.

For the doctor: one complaint to the Australian Health Practitioner Regulation Agency, and the consequences go far beyond an apology. In serious cases, suspension or even revocation of the license, which basically ends the career.

For the hospital: it directly violates the Privacy Act 1988. The Office of the Australian Information Commissioner steps in, which means fines, mandatory remediation, and reputational damage.

On top of that, the patient or their family can sue directly for substantial damages.

I hope incidents like this can genuinely push privacy protection from "everyone knows it matters" to "a red line nobody dares to touch."
21 replies · 7 reposts · 74 likes · 31.3K views
Eric Wong@EricWongDEV·
@VondaShaye So it generates first, then calls the checking tools locally on your machine?
1 reply · 0 reposts · 0 likes · 35 views
Rick Morty@VondaShaye·
@EricWongDEV I spell it out in AGENTS.md: after any change, directly call the matching tool, e.g. ruff, to check the syntax automatically; only once that passes does it write and run its own tests. Overall, the fundamental mechanics of large language models mean there is always some probability of hallucination; it varies with how much training they've had, but it can't be eliminated entirely.
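A minimal Python sketch of the check loop this exchange describes: generate, lint, feed the findings back, retry. The generate callable is a hypothetical model call, not a real API; only the ruff CLI invocation (ruff check <file>, exit code 0 when clean) is real.

    import subprocess
    import tempfile
    from pathlib import Path

    def lint_errors(code: str) -> str:
        """Run ruff on the candidate code and return its findings ('' if clean)."""
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "candidate.py"
            path.write_text(code, encoding="utf-8")
            result = subprocess.run(
                ["ruff", "check", str(path)],
                capture_output=True, text=True,
            )
            return "" if result.returncode == 0 else result.stdout

    def generate_checked(prompt: str, generate, max_rounds: int = 3) -> str:
        """Loop: generate code, lint it, feed findings back until clean.
        `generate` is a hypothetical model call supplied by the caller."""
        code = generate(prompt)
        for _ in range(max_rounds):
            errors = lint_errors(code)
            if not errors:
                return code  # passed the check, safe to emit
            # Each retry costs another round trip of tokens, as Eric notes above.
            code = generate(f"{prompt}\n\nFix these lint errors:\n{errors}")
        return code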
1 reply · 0 reposts · 0 likes · 53 views
Eric Wong@EricWongDEV·
@vista8 I wrote something similar a while back; at the time it was the so-called "soul files" that disgusted me. It's all just damn system-role content, yet people insist on slapping all kinds of messy names on it.
0 replies · 0 reposts · 0 likes · 321 views
向阳乔木@vista8·
Had AI rewrite it; here's the result:

The AI world's favorite game: giving old things new names

Lately the AI community has been passing around a long piece on "Harness Engineering": tens of thousands of words, almost certainly AI-written.

Chayenne Zhao, an engineer in the SGLang community, read it, and her first reaction wasn't "what a great concept." It was: do these people have any ideas beyond giving old things new names?

That gripe resonates with me.

From Prompt Engineering to Context Engineering, and now Harness Engineering. Every few months someone coins a new term, writes a 10,000-word essay, cites a few big-company case studies, and the whole community starts buzzing. But if you actually read in, it's all the same thing: design the environment the model runs in, what information to give it, which tools to use, how to manage memory, how to intercept errors. This has existed since the day ChatGPT launched. A new name doesn't make it a new discipline.

Complaints aside, Chayenne went on to write up the pitfalls she actually hit, and that part is where the article's value lies.

She is building a multi-agent system for the SGLang community that automatically answers users' technical questions, such as how to deploy DeepSeek-V3 on 8 GPUs, or whether the gap between GLM-5 INT4 and FP8 is significant.

The initial idea was the most naive one: build an omniscient Agent, stuff all of SGLang's docs, code, and cookbooks into it, and let it answer anything. That failed, of course. The context window is not RAM: the more you stuff in, the more the model's attention scatters and the worse the answers get. An Agent that has to understand quantization, PD disaggregation, diffusion serving, and hardware compatibility all at once ends up deep in none of them.

The design that finally worked: split SGLang's docs along functional boundaries into a set of independent "sub-domain expert Agents," with an Expert Debating Manager on top that receives the question, decomposes it into sub-questions, consults a routing table to activate the right Agents, solves in parallel, and then synthesizes the answers.

Her three lessons are very practical:
1. Feed an Agent precise information, not more information.
2. Split complex systems into specialized sub-modules; don't build an omniscient Agent.
3. All knowledge must live in the repo; verbal agreements don't exist, and routing and constraints must be structured, not left to the model's own judgment.

In traditional software engineering these principles are called separation of concerns, single responsibility, and docs-as-code. Now that they've been carried over to the LLM setting, some people think they deserve a new name. Chayenne thinks they don't. I agree with the second half, but on new names my view differs; more on that shortly.

At the end, Chayenne raises a question she herself hasn't worked out: if model capability keeps growing exponentially, will there come a day when models are strong enough to build their own runtime environments? She mentions the OpenClaw project, whose codebase went from 400K lines to 1M in a month, driven mainly by AI itself. So who built that project's environment: a human, or the AI?

The question cuts to the heart of the matter. All the "engineering practices" we discuss today, including the real lessons behind the term Harness Engineering, presume that humans actively design the Agent's runtime environment. But if that job itself can be outsourced to AI one day, how many of today's principles will still hold two years from now?

Chayenne's answer: at least today, this is still human work, and the most valuable kind. I think the answer is fine, but it carries a faint undercurrent of uncertainty, and she feels it herself.

Back to new names. I understand Chayenne's irritation, but "giving old things new names" is not always a bad thing in itself. Before the term Prompt Engineering was coined, the people doing that work existed, but the community had no shared language, so discussion and accumulation were inefficient. The value of a new term is that it condenses scattered practice into a concept people can talk about. The problem is that when coinage outpaces genuine accumulation of understanding, it consumes attention instead of advancing it. The term Harness Engineering is currently the latter, but Chayenne's pitfall notes on how-to-sglang are solidly the former.
Chayenne Zhao@GenAI_is_real

Today I read a lengthy piece on Harness Engineering — tens of thousands of words, almost certainly AI-written. My first reaction wasn't "wow, what a powerful concept." It was "do these people have any ideas beyond coining new terms for old ones?"

I've always been annoyed by this pattern in the AI world — the constant reinvention of existing concepts. From prompt engineering to context engineering, now to harness engineering. Every few months someone coins a new term, writes a 10,000-word essay, sprinkles in a few big-company case studies, and the whole community starts buzzing. But if you actually look at the content, it's the same thing every time: Design the environment your model runs in — what information it receives, what tools it can use, how errors get intercepted, how memory is managed across sessions. This has existed since the day ChatGPT launched. It doesn't become a new discipline just because someone — for whatever reason — decided to give it a new name.

That said, complaints aside, the research and case studies cited in the article do have value — especially since they overlap heavily with what I've been building with how-to-sglang. So let me use this as an opportunity to talk about the mistakes I've actually made.

Some background first. The most common requests in the SGLang community are How-to Questions — how to deploy DeepSeek-V3 on 8 GPUs, what to do when the gateway can't reach the worker address, whether the gap between GLM-5 INT4 and official FP8 is significant. These questions span an extremely wide technical surface, and as the community grows faster and faster, we increasingly can't keep up with replies. So I started building a multi-agent system to answer them automatically.

The first idea was, of course, the most naive one — build a single omniscient Agent, stuff all of SGLang's docs, code, and cookbooks into it, and let it answer everything. That didn't work. You don't need harness engineering theory to explain why — the context window isn't RAM. The more you stuff into it, the more the model's attention scatters and the worse the answers get. An Agent trying to simultaneously understand quantization, PD disaggregation, diffusion serving, and hardware compatibility ends up understanding none of them deeply.

The design we eventually landed on is a multi-layered sub-domain expert architecture. SGLang's documentation already has natural functional boundaries — advanced features, platforms, supported models — with cookbooks organized by model. We turned each sub-domain into an independent expert agent, with an Expert Debating Manager responsible for receiving questions, decomposing them into sub-questions, consulting the Expert Routing Table to activate the right agents, solving in parallel, then synthesizing answers.

Looking back, this design maps almost perfectly onto the patterns the harness engineering community advocates. But when I was building it, I had no idea these patterns had names. And I didn't need to.

1. Progressive disclosure — we didn't dump all documentation into any single agent. Each domain expert loads only its own domain knowledge, and the Manager decides who to activate based on the question type. My gut feeling is that this design yielded far more improvement than swapping in a stronger model ever did. You don't need to know this is called "progressive disclosure" to make this decision. You just need to have tried the "stuff everything in" approach once and watched it fail.

2. Repository as source of truth — the entire workflow lives in the how-to-sglang repo. All expert agents draw their knowledge from markdown files inside the repo, with no dependency on external documents or verbal agreements. Early on, we had the urge to write one massive sglang-maintain.md covering everything. We quickly learned that doesn't work. OpenAI's Codex team made the same mistake — they tried a single oversized AGENTS.md and watched it rot in predictable ways. You don't need to have read their blog to step on this landmine yourself. It's the classic software engineering problem of "monolithic docs always go stale," except in an agent context the consequences are worse — stale documentation doesn't just go unread, it actively misleads the agent.

3. Structured routing — the Expert Routing Table explicitly maps question types to agents. A question about GLM-5 INT4 activates both the Cookbook Domain Expert and the Quantization Domain Expert simultaneously. The Manager doesn't guess; it follows a structured index. The harness engineering crowd calls this "mechanized constraints." I call it normal engineering.

I'm not saying the ideas behind harness engineering are bad. The cited research is solid, the ACI concept from SWE-agent is genuinely worth knowing, and Anthropic's dual-agent architecture (initializer agent + coding agent) is valuable reference material for anyone doing long-horizon tasks. What I find tiresome is the constant coining of new terms — packaging established engineering common sense as a new discipline, then manufacturing anxiety around "you're behind if you don't know this word." Prompt engineering, context engineering, harness engineering — they're different facets of the same thing. Next month someone will probably coin scaffold engineering or orchestration engineering, write another lengthy essay citing the same SWE-agent paper, and the community will start another cycle of amplification.

What I actually learned from how-to-sglang can be stated without any new vocabulary:
- Information fed to agents should be minimal and precise, not maximal.
- Complex systems should be split into specialized sub-modules, not built as omniscient agents.
- All knowledge must live in the repo — verbal agreements don't exist. Routing and constraints must be structural, not left to the agent's judgment.
- Feedback loops should be as tight as possible — we currently use a logging system to record the full reasoning chain of every query, and we've started using Codex for LLM-as-a-judge verification, but we're still far from ideal.

None of this is new. In traditional software engineering, these are called separation of concerns, single responsibility principle, docs-as-code, and shift-left constraints. We're just applying them to LLM work environments now, and some people feel that warrants a new name.

I don't know how many more new terms this field will produce. But I do know that, at least today, we've never achieved a qualitative leap on how-to-sglang by swapping in a stronger model. What actually drove breakthroughs was always improvements at the environment level — more precise knowledge partitioning, better routing logic, tighter feedback loops. Whether you call it harness engineering, context engineering, or nothing at all, it's just good engineering practice. Nothing more, nothing less.

There is one question I genuinely haven't figured out: if model capabilities keep scaling exponentially, will there come a day when models are strong enough to build their own environments?

I had this exact confusion when observing OpenClaw — it went from 400K lines to a million in a single month, driven entirely by AI itself. Who built that project's environment? A human, or the AI? And if it was the AI, how many of the design principles we're discussing today will be completely irrelevant in two years? I don't know. But at least today, across every instance of real practice I can observe, this is still human work — and the most valuable kind.
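The Expert Routing Table in point 3 is described only in prose above. As a purely illustrative Python sketch (the agent names, topic keys, and the ask_expert call are assumptions, not the actual how-to-sglang code), a structured index plus parallel fan-out might look like:

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical routing table: question topic -> domain experts to activate.
    ROUTING_TABLE = {
        "quantization": ["quantization_expert", "cookbook_expert"],
        "deployment":   ["platforms_expert"],
        "models":       ["supported_models_expert", "cookbook_expert"],
    }

    def route(topics: list[str]) -> list[str]:
        """The manager follows the structured index instead of guessing:
        collect every expert mapped to any detected topic, deduplicated."""
        experts: list[str] = []
        for topic in topics:
            for expert in ROUTING_TABLE.get(topic, []):
                if expert not in experts:
                    experts.append(expert)
        return experts

    def answer(question: str, topics: list[str], ask_expert) -> str:
        """Activate the routed experts in parallel, then synthesize.
        `ask_expert(name, question)` is a hypothetical agent call."""
        experts = route(topics)
        with ThreadPoolExecutor() as pool:
            drafts = list(pool.map(lambda e: ask_expert(e, question), experts))
        return "\n\n".join(drafts)  # a real manager would debate and merge these

    # A GLM-5 INT4 question tagged "quantization" + "models" activates both
    # the quantization expert and the cookbook expert, matching the example above.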

14 replies · 6 reposts · 93 likes · 24.1K views