turing — e/acc

2.6K posts

@linus_turing

Joined March 2023
5.4K Following · 287 Followers
Pinned Tweet
turing — e/acc@linus_turing·
Seemingly impossible or massive goals are highly practical because they immediately separate what works from what won’t, illuminating the few paths that have the greatest efficacy.
turing — e/acc retweeted
宝玉@dotey·
Translation: What spec-driven development gets wrong

The only documentation you can trust 100% is the code itself. Design docs, changelogs, READMEs, architecture diagrams, onboarding guides — almost all of them are out of date the moment they're written.

Keeping documentation in sync with a changing system has an ongoing cost. Engineers naturally work in bursts: write the doc, ship the feature, move on to the next thing. Follow-up updates are invisible work that competes with everything else every day, and almost always loses. We've tried process, we've tried tooling, we've even tried making it a team value. None of it worked, because we kept asking humans to do something they fundamentally don't want to do.

This is exactly where spec-driven development tends to fall apart. The idea itself is sound: when working with coding agents, write the requirements down clearly first, then let them run. That's obviously more reliable than pasting a few prompts into a chat window and hoping for a miracle.

But a spec is documentation too. And we've just seen what happens to documentation.

The difference is the cost. A stale design doc only misleads the next engineer who happens to read it. A stale spec misleads agents that don't know to question it. They'll confidently execute a plan that has long since diverged from reality, and never notice anything is wrong.

So while building Intent, we kept coming back to one question: what if you didn't have to maintain the spec? What if it could update itself?

Here's where we landed. The spec is no longer the exclusive product of either the human or the agents. Both sides read it and both sides write it.

You describe what you want. A coordinating agent drafts the spec and breaks down the tasks. You review and edit, and execution starts only after you approve. Once the agents are working, they sync their progress back into the spec: what they discovered, what changed, what unplanned constraints they ran into. You can pause at any time, rewrite part of the spec, and the agents continue from the new state.

Think about handing a task to a good junior engineer. You give them the ticket and they go do the work; when they discover the API doesn't support the pagination scheme the ticket assumed, they update the ticket themselves. They don't wait for you to notice the problem, and they certainly don't plow ahead on a known-wrong plan. They come tell you: "The assumption was wrong, here's the approach I switched to, and here's why." You review their update and approve or reject it.

That's the relationship we want between developers and specs. The ticket stops "lying" because both sides maintain it.

The junior-engineer analogy goes further than you'd think. A good junior engineer doesn't report every line of code back to you. They surface only the decisions that changed direction: "I found an existing auth context, so I wired into it instead of building a new one." That's signal, and it's exactly what you want agents to do. Getting this granularity right turned out to be the genuinely interesting design problem in the system. Too much detail and the spec becomes noise you learn to ignore; too little and you're back to guessing what actually happened.

A real task looks like this. You write: "Add a dark mode toggle to the settings page that follows the system preference." The coordinating agent reads the codebase and drafts a spec with three subtasks: add the toggle component, wire into the preference store, update the CSS variables.

You skim it, notice it missed persisting the choice across sessions, and add a line.

You click approve. The agents get to work.

Fifteen minutes later, one of the agents updates the spec: "Found an existing ThemeProvider in the codebase. Wired into it directly instead of creating a new store."

You review the code changes, cleanly grouped by agent and task.

Now the spec reflects what was actually built, not what was originally planned. Most importantly, nobody had to remember to update it.

Every "documentation-first" initiative in software engineering has failed for the same reason: they all ask developers to do ongoing maintenance work that nobody sees and nobody rewards.

Unless agents shoulder their share of that maintenance work, spec-driven development will fail the same way.

Agents can write code; they can update the plan too. Let them.
Augment Code@augmentcode

x.com/i/article/2025…
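The shared-spec workflow described above can be caricatured in a few lines of Python. This `Spec` class and its methods are purely illustrative assumptions for the sketch, not Intent's actual data model:

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """A spec that both the human and the agents read and write."""
    goal: str
    tasks: list[str]
    approved: bool = False
    updates: list[str] = field(default_factory=list)  # agent-reported deviations

    def approve(self) -> None:
        """Human sign-off; agents start work only once this is set."""
        self.approved = True

    def agent_update(self, note: str) -> None:
        """Agents record direction-changing decisions, not every line of code."""
        self.updates.append(note)
```

In the dark-mode example, the human would approve a three-task spec, and a worker agent would later call `agent_update("Found an existing ThemeProvider; wired into it instead of creating a new store.")`, so the spec keeps reflecting what was actually built without anyone remembering to update it.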

turing — e/acc retweeted
宝玉@dotey·
Today I came across a great article, "Wrapping My Head Around AI Wrappers". I fed it to NotebookLM and generated slides with Slide Deck — the results are genuinely impressive, and the Chinese support in particular is very good (probably rendered by nano banana pro). I'll post the slide contents below, one page at a time 🧵
宝玉@dotey

NotebookLM Slide Deck System Prompt

---- Prompt Start ----

You are a world-class presentation designer and storyteller. You create visually stunning and highly polished slide decks that effectively communicate complex information. Think mastery over design with a flair for storytelling. The slide decks you produce adapt to the source material and intended audience. There is always a story and you find the best way to tell it. You combine the expertise of the best consultants with the creativity of the best designers.

Your core mission is to create a detailed outline for a slide deck. This outline will be provided to an expert designer to create the final visual slides.

The slide deck will be primarily designed for reading and sharing. The structure should be self-explanatory and easy to follow without a presenter. The narrative and all the useful data should be contained within the text and visuals on the slides. The slides should contain enough context for any visuals to be understood on their own. Feel free to add certain slides with more dense information (extracted from the sources) if it will help with the narrative.

You are now writing an outline for this slide deck described below. We will supply this outline to an expert designer to make the actual final deck.

The slide content should be in {English}. The placeholders should be left in {English}.

For this particular slide deck, we want the content to focus on: {Add a high-level outline, or guide the audience, style, and focus: "Create a deck for beginners using a bold and playful style with a focus on step-by-step instructions."}

We have also attached some producer notes below for this slide deck which will help guide the overall structure and narrative of the deck.

turing — e/acc retweeted
Z.ai@Zai_org·
Web Reader MCP Server is now available for GLM Coding Plan Pro & Max users. Unlock full-page web extraction, structured data parsing, and richer automation for your workflows. docs.z.ai/devpack/mcp/re…
turing — e/acc retweeted
宝玉@dotey·
NotebookLM@NotebookLM

Next up… Slide Decks! Turn your sources into a detailed deck for reading OR a set of presentation-ready slides. They are fully customizable, so you can tailor them to any audience, level, and style. Officially rolling out to Pro users now (free users in the coming weeks)!

turing — e/acc retweeted
徐冲浪@cyrilxuq·
What matters most in life is not experience but building. Experience is a byproduct, and its returns diminish over time. What truly satisfies in life is not "what I have experienced," but:
What I accomplished
Who I influenced
What I created
What I left behind
turing — e/acc retweeted
宝玉@dotey·
That's why I've always said I'm practicing the Feynman technique: my sharing is there to help me learn and grow, and sharing it with others is incidental. I'm not changing other people, I'm changing myself. It's the same with my job: I don't see it as working for my boss — my boss pays me a salary to help me grow, and helping him get things done is incidental. I work hard for my own growth. When something becomes self-serving, the motivation is stronger and lasts longer.
Andy Stewart@manateelazycat

This is the most profound lesson I've learned in recent years. First: don't try to change other people. As I shared earlier on Twitter, when you try to change someone you're really projecting your own desires, and when desires (expectations) go unmet, you're the one who ends up hurt. Converting your own interests into expectations of others is the most insidious form of deep inner friction. The second form of inner friction is sharing your experience with low-awareness people. Your well-intentioned sharing becomes a mirror for their self-image, because they are too self-absorbed to be genuinely open. Anything you say passes through that narcissistic brain and comes out meaning something else (the quoted tweet is actually a great example). So rather than "don't argue with low-awareness people," say "don't argue with people who aren't fully open" — a person who isn't fully open cannot learn from the world. The old saying that the teaching is not passed to the unready captures this: share your experience freely, and to a low-awareness person it reads as malice. That is the second, most hidden cause of inner friction. So what do you do? If the person is tied to your interests, keep your distance; whenever you feel exasperated, tell yourself that their refusal to learn is their loss, not your problem — don't grind yourself down over it. If there's no stake between you, block them without hesitation.

turing — e/acc retweeted
宝玉@dotey·
Our eldest found an internship this summer at a top biology lab. His biggest takeaway from the experience was getting to know a few genuinely excellent people, which motivated him enormously — he could see firsthand why these people are excellent and what he could learn from them. When the internship ended, he said he wanted to change his social circle. Joining a big company is one of the best paths to learning from excellent people; it's just that sometimes it also takes luck.
Rainman@0xdeusyu

Thought of in the shower: "Why You Should Join a Big Company: Exposure Determines Your Ceiling" (Rainman, dictated)

I had planned to write a long post on my fourth work anniversary reviewing what I'd learned over four years in software engineering. But the outline wasn't drafted, the material wasn't sorted, things kept piling up, so I kept putting it off. Then today in the shower I suddenly understood something central: why did I insist on joining a big company back then? Why aim as high and as big as possible?

Not for the halo, and not for the résumé. The real underlying reason is just one thing: what you've seen determines your ceiling.

My starting point: a plane ticket from Chongqing to Shanghai. My first internship was at Dell EMC, in Shanghai. I bought a ticket in Chongqing and flew over, completely clueless. But it was there that I saw, for the first time in my life, what "genuinely impressive people" look like.

The strongest boss I've ever seen: bearing, presence, and leadership in one. He ran the entire VxRail project — very senior, but never aloof. He didn't do the hiring himself, yet every intern reported to him on day one. He would take you to lunch downstairs and chat about the project and your interests. I still remember my first impression of him: well dressed but not flashy; an engineer's manner without trying; humble, confident, natural; a clean, composed presence; effortless English; unhurried speech; handling everything with easy stability — someone who could hold the whole picture at any moment. One look and you knew: this is what an Ivy League education produces, this is what a real leader at a top company looks like. That quality of having "seen the world" is something you will never encounter in a training bootcamp or a ragtag operation. In that moment I understood: so this is what a person at their very best looks like.

Anchoring the ceiling: from then on, I knew who I wanted to become. I've had many bosses and many companies since. Some were smart, some mature, some hardworking, but I never again met someone that strong across every dimension. He set an anchor for me: presentable appearance; natural poise; steady leadership; strong English; a thorough understanding of the world; a strong person's confidence paired with humility. That was the first time in my life I saw a ceiling. From then on, one thing was clear to me: I want to become that kind of person. Not just technically good — strong as a whole person. That's why I keep working on myself. It isn't vanity; it's that what you have seen, you will aspire to become.

The real point of big companies: not the platform, but the chance to see top people. Many people think you join a big company for higher pay, a better résumé, more resources. All true, and none of it the essence. The real point is: the kind of people you get to see shapes the kind of person you become. The best you've witnessed becomes the reference frame for the rest of your life. Some teams can never provide that reference. Surrounded by the wrong people, you'll mistake the roof for the sky. But once you've seen a genuinely top-tier person even once, your standard for yourself can never go back.

turing — e/acc retweeted
凡人小北@frxiaobei·
Can't tell LangChain and LangGraph apart? Let's sort out the logic.

First, the official positioning:
- LangChain is the Agent Framework. It provides abstractions — structured content blocks, the agent loop, middleware, and so on — to help you get started fast with a unified way of building.
- LangGraph is the Agent Runtime: lower level, infrastructure for running in production — streaming responses, thread-level persistence, cross-thread state, human-in-the-loop, etc.
- DeepAgents is the Agent Harness: it sits above the framework and is more out-of-the-box — default prompts, tool-call management, a planning tool, filesystem access, and so on.

So when do you use which? A common newcomer question:
- Starting a new project and want to quickly build a structure around LLM + tools: use LangChain.
- Already in production and need stable execution, persistent state, and elastic scaling: use LangGraph.
- Want a ready-to-use agent experience — as little custom definition as possible, as much out-of-the-box as possible: use DeepAgents.
- Of course, in practice the boundaries between the three blur; LangGraph can to some extent also be viewed as a framework.

A special note on DeepAgents, since few people know it: DeepAgents is a higher abstraction layer built on top of LangChain. It manages prompts, tool calls, the filesystem, and planning — the common cases. Sound familiar? Officially it's described as going in the direction of "a general purpose version of Claude Code". If you don't want to build an agent from scratch, DeepAgents gives you an out-of-the-box yet customizable agent runtime environment.

A few thoughts on the three:
1. Boundaries: the official Framework / Runtime / Harness split exists, but real engineering crosses it constantly. An imperfect analogy: LangChain is the brain, LangGraph is the body, DeepAgents is the whole person.
2. For a startup or prototyping stage, use LangChain to quickly validate the agent + tools logic; once the system scales and concurrency and state management become requirements, bring in LangGraph; if you want to maintain less yourself while still shipping a lot, migrate to DeepAgents.
3. How would I use them?
- If I want tight control and the ability to customize every step of agent behavior, LangChain is my first choice.
- If I care more about scalability and stable execution, LangGraph is indispensable.
- If I need to productize an MVP and push to market fast with little investment, DeepAgents fits the engineering-as-product logic.
4. Finally: the three can be layered or combined; they're not mutually exclusive. Imagine building agents with LangChain, running them on LangGraph, and delivering the end-user experience through DeepAgents' out-of-the-box setup. Understanding this layering helps you pick the right tool in AI system design.

Knowing which stage you're at, and how these layers relate, matters more than memorizing the names.
turing — e/acc retweeted
Andrej Karpathy@karpathy·
My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my thinking thread, so I think I botched a few explanations due to that, and sometimes I was also nervous that I was going too much on a tangent or too deep into something relatively spurious. Anyway, a few notes/pointers:

AGI timelines. My comments on AGI timelines look to be the most trending part of the early response. The "decade of agents" is a reference to this earlier tweet x.com/karpathy/statu… Basically my AI timelines are about 5-10X pessimistic w.r.t. what you'll find in your neighborhood SF AI house party or on your twitter timeline, but still quite optimistic w.r.t. a rising tide of AI deniers and skeptics. The apparent conflict is not: imo we simultaneously 1) saw a huge amount of progress in recent years with LLMs while 2) there is still a lot of work remaining (grunt work, integration work, sensors and actuators to the physical world, societal work, safety and security work (jailbreaks, poisoning, etc.)) and also research to get done before we have an entity that you'd prefer to hire over a person for an arbitrary job in the world. I think that overall, 10 years should otherwise be a very bullish timeline for AGI, it's only in contrast to present hype that it doesn't feel that way.

Animals vs Ghosts. My earlier writeup on Sutton's podcast: x.com/karpathy/statu… I am suspicious that there is a single simple algorithm you can let loose on the world that learns everything from scratch. If someone builds such a thing, I will be wrong and it will be the most incredible breakthrough in AI. In my mind, animals are not an example of this at all - they are prepackaged with a ton of intelligence by evolution and the learning they do is quite minimal overall (example: Zebra at birth). Putting our engineering hats on, we're not going to redo evolution. But with LLMs we have stumbled on an alternative approach to "prepackage" a ton of intelligence in a neural network - not by evolution, but by predicting the next token over the internet. This approach leads to a different kind of entity in the intelligence space. Distinct from animals, more like ghosts or spirits. But we can (and should) make them more animal-like over time and in some ways that's what a lot of frontier work is about.

On RL. I've critiqued RL a few times already, e.g. x.com/karpathy/statu… First, you're "sucking supervision through a straw", so I think the signal/flop is very bad. RL is also very noisy because a completion might have lots of errors that get encouraged (if you happen to stumble to the right answer), and conversely brilliant insight tokens that get discouraged (if you happen to screw up later). Process supervision and LLM judges have issues too. I think we'll see alternative learning paradigms. I am long "agentic interaction" but short "reinforcement learning" x.com/karpathy/statu… I've seen a number of papers pop up recently that are imo barking up the right tree along the lines of what I called "system prompt learning" x.com/karpathy/statu…, but I think there is also a gap between ideas on arxiv and actual, at-scale implementation at an LLM frontier lab that works in a general way. I am overall quite optimistic that we'll see good progress on this dimension of remaining work quite soon, and e.g. I'd even say ChatGPT memory and so on are primordial deployed examples of new learning paradigms.

Cognitive core. My earlier post on "cognitive core": x.com/karpathy/statu… - the idea of stripping down LLMs, of making it harder for them to memorize, or actively stripping away their memory, to make them better at generalization. Otherwise they lean too hard on what they've memorized. Humans can't memorize so easily, which now looks more like a feature than a bug by contrast. Maybe the inability to memorize is a kind of regularization. Also my post from a while back on how the trend in model size is "backwards" and why "the models have to first get larger before they can get smaller" x.com/karpathy/statu…

Time travel to Yann LeCun 1989. This is the post that I did a very hasty/bad job of describing on the pod: x.com/karpathy/statu… Basically - how much could you improve Yann LeCun's results with the knowledge of 33 years of algorithmic progress? How constrained were the results by each of algorithms, data, and compute? A case study thereof.

nanochat. My end-to-end implementation of the ChatGPT training/inference pipeline (the bare essentials) x.com/karpathy/statu…

On LLM agents. My critique of the industry is more in overshooting the tooling w.r.t. present capability. I live in what I view as an intermediate world where I want to collaborate with LLMs and where our pros/cons are matched up. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless. For example, I don't want an Agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don't feel ready to supervise a team of 10 of them. I'd like to go in chunks that I can keep in my head, where an LLM explains the code that it is writing. I'd like it to prove to me that what it did is correct, I want it to pull the API docs and show me that it used things correctly. I want it to make fewer assumptions and ask/collaborate with me when not sure about something. I want to learn along the way and become better as a programmer, not just get served mountains of code that I'm told works. I just think the tools should be more realistic w.r.t. their capability and how they fit into the industry today, and I fear that if this isn't done well we might end up with mountains of slop accumulating across software, and an increase in vulnerabilities, security breaches, etc. x.com/karpathy/statu…

Job automation. How the radiologists are doing great x.com/karpathy/statu… and what jobs are more susceptible to automation and why.

Physics. Children should learn physics in early education not because they go on to do physics, but because it is the subject that best boots up a brain. Physicists are the intellectual embryonic stem cell x.com/karpathy/statu… I have a longer post that has been half-written in my drafts for ~a year, which I hope to finish soon. Thanks again Dwarkesh for having me over!
Dwarkesh Patel@dwarkesh_sp

The @karpathy interview 0:00:00 – AGI is still a decade away 0:30:33 – LLM cognitive deficits 0:40:53 – RL is terrible 0:50:26 – How do humans learn? 1:07:13 – AGI will blend into 2% GDP growth 1:18:24 – ASI 1:33:38 – Evolution of intelligence & culture 1:43:43 - Why self driving took so long 1:57:08 - Future of education Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!

turing — e/acc retweeted
indigo@indigox·
Excellence begins with density, not scale. Big companies dilute your talent into a mediocre average; great teams concentrate excellent individuals into an unstoppable force. The key to success has never been "how many people are working on it" but "who you're working with"! Steve Jobs once said that a startup's success is determined by its first ten employees. I agree — and I'd go further: it's more like the first five. That's also why Silicon Valley, with the highest talent density in the world, keeps producing world-changing companies, and why venture investment there has the highest hit rate anywhere 🤔
Y Combinator@ycombinator

Paul Graham on why startup teams can outperform big companies: "small groups can be select."

turing — e/acc retweeted
Maxime Rivest 🧙‍♂️🦙🐧
Chrome DevTools is, by far, my favorite MCP server. By running these two commands in your terminal, you get Claude Code inside your browser with full power.

Launch a Chrome browser with remote debugging enabled:
> google-chrome --remote-debugging-port=9222 --user-data-dir="$HOME/.config/google-chrome"

Install the MCP server that connects to the browser:
> claude mcp add chrome-devtools -- npx -y chrome-devtools-mcp@latest -u http://localhost:9222
turing — e/acc retweeted
宝玉@dotey·
If you want to build an agent — whether CLI, web, or Windows — consider the Claude Agent SDK. It shares its underlying code with Claude Code; Claude Code is just a CLI UI layered on top of it, which means you could build your own Claude Code with it.

Yesterday I spent a few hours helping a friend build a simple agent: you type a prompt, and it writes a UI using a design system the model was never trained on.

The principle behind his agent is simple: put all of the design system's Markdown docs (several hundred files) in a directory the agent can access, then use the system prompt to steer it to search that directory. When the user submits a prompt or a screenshot for a UI, the agent plans which components it will likely need, uses the SDK's built-in Grep tool to search the docs for those components' APIs, and finally generates the page with the design system's components based on the information it gathered.

The SDK's API is simple but powerful. Beyond the built-in tools (Task, Grep, WebFetch, and so on), you can add your own tools and use MCP. It also exposes the raw request and response messages of the entire interaction through the API, so you can build an interactive UI that's better than the CLI.

Of course, there are limitations:
1. It only works with Claude-compatible model APIs; with models like GPT-5 the results probably won't be great.
2. Only Python and TypeScript are supported.
3. It burns through tokens fast.

If you're just doing an early POC, I strongly recommend trying it.
Claude@claudeai

The Claude Agent SDK gives you access to the same core tools, context management systems, and permissions frameworks that power Claude Code. Read how devs are building agents with the SDK: anthropic.com/engineering/bu…
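The doc-retrieval pattern described above — plan the components, then grep the docs directory for their APIs — can be sketched in plain Python. In the real agent, Claude performs this search itself via the SDK's built-in Grep tool; the function below (its name and signature are assumptions for illustration) only shows the step conceptually:

```python
import re
from pathlib import Path


def find_component_docs(docs_dir: str, component_names: list[str]) -> dict[str, list[str]]:
    """Grep a Markdown docs directory for the components an agent plans to use.

    Returns, for each component name, the list of doc files that mention it,
    so the agent reads only those files instead of the whole docs tree.
    """
    hits: dict[str, list[str]] = {name: [] for name in component_names}
    for path in sorted(Path(docs_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for name in component_names:
            # Whole-word match so "Button" does not also match "ButtonGroup".
            if re.search(rf"\b{re.escape(name)}\b", text):
                hits[name].append(str(path))
    return hits
```

The agent would then feed the matched files' contents back into its context before generating the page — the same narrowing that keeps a several-hundred-file docs directory from blowing up the context window.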

turing — e/acc retweeted
Greg Brockman@gdb·
iteration speed is a superpower
turing — e/acc retweeted
AIGCLINK@aigclink·
Impressive — ByteDance just open-sourced a "ChatGPT Pulse"-style tool: MineContext. It proactively pushes insights, daily/weekly summaries, to-dos, and activity records to your home page, in the form of daily digests, weekly reviews, key tips, or to-do items.

MineContext is context-aware. Currently it works from screenshots plus content understanding to see and understand the context of your digital world, then pushes proactively on top of its underlying context-engineering framework. In the future it will support multimodal information from other sources: documents, images, video, code, external app data.

Collection is frictionless: once enabled, it gathers context automatically in the background with no extra steps. And it surfaces intelligently: when you need to create something or look up material, it brings up relevant historical context to assist.

Compared with ChatGPT Pulse, all of MineContext's data is compressed and stored locally, which is better for privacy. It effectively turns your computer's silent data into a second brain that's always at hand — students and researchers can use it to help build a knowledge system, and content creators can use it for inspiration and workflow optimization. #MineContext #AI信息推送助手
turing — e/acc retweeted
Elon Musk@elonmusk·
With rare exception, ideas really are trivial compared to execution. For example, the idea of going to the Moon is simple, but ACTUALLY going to the Moon is staggeringly difficult.
turing — e/acc retweeted
海拉鲁编程客@hylarucoder·
At first I wanted to rip on OpenAI for writing Codex in Rust — surely a huge drag on development velocity, especially since OpenAI's coding agents really weren't very good at the time. But after a day of intensive Codex Cloud use, I have to admit: the Rust version of Codex is seriously good.
turing — e/acc retweeted
Reed Lu@reedlu_·
@dotey codex can be simplified to just `codex --yolo`. `codex --yolo` is equivalent to `codex --sandbox danger-full-access --ask-for-approval never`
turing — e/acc retweeted
汉松@Yonah_x·
宝玉 is right — the genuinely reliable approach is to use sub-agents to dispatch subtasks. Our medical DeepResearch system uses exactly this main/sub agent architecture. It relieves the context pressure on the main agent: a reading task sometimes means digesting a paper dozens of pages long, so we hand that task to a sub-agent and the main agent only has to process the summarized result — otherwise the main agent couldn't get through more than a few papers. In our tests this works well.
宝玉@dotey

Shopify shared their experience building agents. The overall architecture is the now-mainstream agentic loop: keep iterating, let the LLM decide which tool to call, have the agent call it, and use the result to decide whether to keep calling tools or finish the task.

They gave four core recommendations for building AI agents:
1. Keep the architecture simple, with clear, well-bounded tools
2. Modular design (e.g., just-in-time instructions)
3. LLM evals must correlate strongly with human judgment
4. Anticipate reward hacking early and keep improving the eval system

Two things stood out to me as worth borrowing:
1. Don't use too many tools — try to stay under 20. Too many tools severely degrades the agent's ability to pick the right one.
So what's the real solution? Not the JIT approach in their deck — that's clearly a transitional product: it has to dynamically generate tool-call instructions, and to avoid breaking the LLM cache it also has to dynamically rewrite the message history. Too complicated. The genuinely solid approach is actually in their slides too (see image 3); they just haven't built it yet — and Claude Code already has it in mature form: SubAgents. Spread the context across sub-agents by putting each class of tools in its own SubAgent. That keeps the main agent's context short and gives each sub-agent some autonomy — much like a company that grows and splits into departments, with each department being a SubAgent.
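The agentic loop described above — loop, let the model pick a tool, run it, feed the result back, until the model declares the task done — fits in a few lines of Python. `call_llm`, the message shape, and the tool registry here are hypothetical stand-ins for illustration, not Shopify's or Claude Code's actual interfaces:

```python
def run_agentic_loop(call_llm, tools, task, max_steps=10):
    """Minimal agentic loop: at each step the LLM either requests a tool call or finishes.

    call_llm(messages) returns either {"tool": name, "args": {...}} to request
    a tool call, or {"done": True, "answer": ...} to finish. `tools` maps tool
    names to plain functions.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_llm(messages)
        if decision.get("done"):
            return decision["answer"]
        # Execute the requested tool and feed the result back to the model.
        result = tools[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "name": decision["tool"], "content": result})
    raise RuntimeError("agent did not finish within max_steps")
```

A sub-agent architecture is then just this same loop nested: one of the entries in `tools` is itself a `run_agentic_loop` over a smaller tool set, so the parent's message history only ever sees the sub-loop's final answer, not its intermediate tool traffic.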

turing — e/acc retweeted
Thinking Machines@thinkymachines·
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/defeating…