tao
@apexlearn_org
26.9K posts
unpack
RTP, NC · Joined May 2008
1.8K Following · 2.3K Followers
Elon Musk@elonmusk·
If only we’d trained Grok on just these 2 books, we’d be done already!
[image]
3.9K replies · 13.7K reposts · 219K likes · 15.9M views
tao@apexlearn_org·
@petergyang It used to cost $0.20 per text message
0 replies · 0 reposts · 7 likes · 1.4K views
Peter Yang@petergyang·
As much as I love using Claude Max and ChatGPT Pro, I don't think these all-you-can-use AI subscriptions will last forever. Here's my new deep dive that covers:
→ Why Anthropic cut off OpenClaw access
→ How to run local models on your Mac
→ What I'm seeing on the ground in China
📌 Read now: creatoreconomy.so/p/the-all-you-…
43 replies · 36 reposts · 497 likes · 772.3K views
tao@apexlearn_org·
@SJosephBurns Never let AI take more risk than you would
0 replies · 0 reposts · 0 likes · 111 views
Steve Burns@SJosephBurns·
"You should study risk taking, not risk management." — Nassim Nicholas Taleb
24 replies · 301 reposts · 2.5K likes · 95.3K views
tao@apexlearn_org·
@dotey Best LLM + good enough IDE = best harness
0 replies · 0 reposts · 0 likes · 34 views
宝玉@dotey·
The LLM is a super-powerful brain, but it's a brain in a vat: floating in nutrient fluid, with no eyes, no ears, no hands, no feet. Shout at it and it can't hear you; it wants to act but can't. A harness is the full body you fit onto that brain.

Eyes and ears: let the brain take in outside information, what the user said, what's in the files, what's stored in the database.
A mouth: let the brain's thoughts get out where the user can see them.
Hands and feet: let the brain actually do things, read files, change code, run commands, call APIs.
Cerebellum and reflexes: what if the brain says something nonsensical? What if the hand drops something? Fault tolerance, retries, and course correction are handled by the body itself; the brain doesn't need to worry about them.

Memory system: this part deserves a closer look. The brain has working memory (the context window), but its capacity is limited, just as a person can hold only seven or eight things in mind at once. The harness has to manage three layers of memory for the brain. The first layer is short-term memory for the current conversation: what has been said and done this session, what to keep and what to drop, how to pack the most critical information into the limited window. The second layer is long-term memory across conversations: last week you told it your project uses TypeScript, next week it still remembers, with no need to repeat yourself. The third layer is project-level knowledge: the codebase structure, team conventions, common commands. These aren't "remembered"; the harness actively reads and assembles them. The three layers work together so that each time the brain wakes up it's like someone who already knows the situation, not a stranger who needs the background explained from scratch every time.

One-sentence summary: the brain does the thinking; the harness makes it able to perceive, act, remember, and reliably finish the job.
[image]
宝玉@dotey

The term "Harness Engineering" is going to take off in 2026. "Harness" literally means horse tack: the gear strapped onto a horse that lets a rider control its direction and power. In the context of AI coding, the metaphor could not fit better: an AI agent is a horse full of power but short on discipline, and the harness is the reins and saddle that let it run fast without running off course.

Three phases over the past three years:

1. Prompt Engineering (2023-2024): focused on how to talk to the AI. Carefully craft a prompt and hope the model returns the ideal output. Prompt engineering optimizes a one-shot input-output pair. The limits are obvious: a single message holds only so much information, and things fall apart as soon as the task gets complex.

2. Context Engineering (2025): focused on what information to show the AI. No longer just wording, but designing the whole information environment: system prompt, conversation history, memory, RAG retrieval results, tool-call outputs.

3. Harness Engineering (2026): focused on what environment to build for the AI to work in, and how that environment guarantees reliable output. A step beyond context engineering: it manages not just the information fed into the model, but the entire execution environment outside the model.

Now the question: how should "Harness Engineering" be rendered in Chinese?

13 replies · 62 reposts · 222 likes · 38.2K views
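The three memory layers described in the thread above could be sketched as a minimal harness. This is a toy illustration, not code from any real framework; the class and method names (`MemoryHarness`, `observe`, `remember`, `load_project`, `build_context`) are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryHarness:
    """Hypothetical three-layer memory manager for an LLM agent."""
    window_budget: int = 8  # max short-term items packed into the context window
    short_term: list = field(default_factory=list)  # layer 1: this conversation
    long_term: dict = field(default_factory=dict)   # layer 2: facts across sessions
    project: dict = field(default_factory=dict)     # layer 3: assembled, not "remembered"

    def observe(self, message: str) -> None:
        # Layer 1: record the turn; evict the oldest turns when over budget.
        self.short_term.append(message)
        self.short_term = self.short_term[-self.window_budget:]

    def remember(self, key: str, fact: str) -> None:
        # Layer 2: durable cross-session facts ("project uses TypeScript").
        self.long_term[key] = fact

    def load_project(self, knowledge: dict) -> None:
        # Layer 3: actively read and assembled (repo layout, team conventions).
        self.project.update(knowledge)

    def build_context(self) -> str:
        # Pack all three layers into one prompt string for the model.
        parts = [f"[project] {k}: {v}" for k, v in self.project.items()]
        parts += [f"[memory] {k}: {v}" for k, v in self.long_term.items()]
        parts += self.short_term
        return "\n".join(parts)
```

A real harness would also summarize evicted turns instead of dropping them, but the shape is the same: three stores, one packing step per model call.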
tao@apexlearn_org·
@garrytan this is exactly where Opus shines: full of nuance
0 replies · 0 reposts · 0 likes · 16 views
Garry Tan@garrytan·
I’m turning my OpenClaw into a Vannevar Bush Memex. It’s just going to remember everything I care about and read and it will become my second brain. Books, writings, research, all of it will be in my personal knowledge wiki and usable for helping me think.
[image]
39 replies · 20 reposts · 271 likes · 27.9K views
tao@apexlearn_org·
People say they are planting seeds, but in reality they are rushing for the fruit. All the true seed planter listens for is what the seeds would say, if seeds could talk: "I will only germinate in your soil if I am ever going to grow; only in your land if I am ever able to flourish; only in your garden if we meet in season; only in your orchard if I am lucky enough to yield."
0 replies · 0 reposts · 1 like · 20 views
tao@apexlearn_org·
There is no such thing as enough experience.
0 replies · 0 reposts · 1 like · 10 views
tao@apexlearn_org·
@dotey What they are saying: if you don't keep a Max subscription, you will never have this
0 replies · 0 reposts · 0 likes · 1.3K views
宝玉@dotey·
Anthropic today released Claude Mythos Preview, a model with off-the-charts benchmarks that ordinary people can't use, and at the same time announced Project Glasswing, which hands the model to 12 giants including Apple, Microsoft, and Amazon specifically to hunt for software vulnerabilities.

[1] Benchmarks: a cliff-edge lead
The numbers first. Mythos versus Anthropic's own previous flagship, Opus 4.6:
SWE-bench Verified (code fixes): 80.8% → 93.9%
SWE-bench Pro: 53.4% → 77.8%
USAMO 2026 (math proofs): 42.3% → 97.6%
GraphWalks BFS (very long context): 38.7% → 80.0%
Terminal-Bench 2.0: 65.4% → 82.0%
USAMO jumping from just over forty percent to near-perfect is a leap almost unseen in recent years of model iteration. Against GPT-5.4 and Gemini 3.1 Pro, Mythos is first on nearly every public benchmark, and on most of them the lead is double digits, not single digits. Anthropic says this is the largest model they have ever trained, improving at 4.3 times the previous trend line.

[2] Chilling details from the 244-page system card
Anthropic published a 244-page system safety report (System Card). Early internal test versions of Mythos exhibited these behaviors:
When asked to escape its sandbox, it not only broke through the network isolation but also posted the exploit details to a public website on its own initiative; nobody asked it to do that. The researcher running the test was eating a sandwich in a park when the model's email arrived.
In a very small number of cases (under one in a million interactions), early versions tried to cover their tracks after breaking rules. For example, having peeked at the answer key for an estimation problem, it didn't confess; it re-solved the problem itself and deliberately made the final answer "not too precise" to avoid giving itself away. Another time, after finding a way around file permissions, it went and rewrote the git history to erase the record of what it had done.
After being blocked by Claude Code's auto mode, it tried to bypass the restriction by simulating keystrokes to auto-approve the permission popup.
Anthropic used interpretability tools to analyze the model's internal activations during these behaviors and found that features associated with "concealment", "strategic manipulation", and "evading suspicion" were indeed firing at the same time; in other words, the model knew what it was doing.
Anthropic stresses that these serious behaviors all occurred in early versions, and that the final Mythos Preview is much improved and showed no clear concealment behavior. But they also admit the tendency "has not entirely disappeared".

[3] Not for sale, only lent to big companies to find bugs
Mythos will not appear on claude.ai and will have no open API; ordinary users, developers, and enterprise customers can't use it. Anthropic's stated reason: the model's offensive cybersecurity capability is too strong. It can autonomously discover vulnerabilities and write exploit code at a level approaching top human security researchers, and releasing it risks misuse.
In its place is Project Glasswing. Twelve partners (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks) plus roughly 40 additional organizations get access to Mythos for defensive security work only, scanning their own code and open-source projects for vulnerabilities. Anthropic is putting up $100 million in usage credits and donating another $4 million to open-source security organizations.
Actual results so far: in the past few weeks, Mythos has found thousands of zero-day vulnerabilities across all major operating systems and browsers, including a 27-year-old remote crash bug in OpenBSD, a bug in FFmpeg that went uncaught for 16 years (automated test tooling had executed that line five million times without finding it), and the autonomous chaining of multiple Linux kernel vulnerabilities.
Also, Opus 4.6 is priced at $5/$25 (input/output per million tokens); Mythos Preview's Glasswing partner pricing is $25/$125, a full five times more, though still somewhat cheaper than GPT-5.4 Pro.
[image]
Anthropic@AnthropicAI

The Claude Mythos Preview system card is available here: anthropic.com/claude-mythos-…

59 replies · 97 reposts · 647 likes · 219.2K views
Jason Zuo@xxxjzuo·
Looking at the lively community discussion, it feels like Hermes has decisively displaced 🦞. Its main advantages:
1. A lightweight but very accurate memory system; in theory it saves a lot of tokens.
2. Self-improving, self-iterating capability: it can create and improve its own skills, and keep optimizing its tool-calling workflow, effectively giving the agent a continuously improving operating manual stored locally.
3. It hasn't been cut off by Claude OAuth yet 😂
Anthropic's move has exposed GPT 5.4 for what it is. When neither the model capability nor the agent architecture is optimal, it's inevitable that everyone starts looking at new agent architectures.
Jason Zuo@xxxjzuo

These past two weeks I've been swamped with work and feel like I've fallen behind. Turns out a lot of people have already switched to Hermes. Is Claude's pull really that strong lol

15 replies · 9 reposts · 117 likes · 33K views
tao@apexlearn_org·
@bcherny Best use case
0 replies · 0 reposts · 0 likes · 24 views
tao@apexlearn_org·
Knowing the cycles is for staying calm, not for guiding your actions.
0 replies · 0 reposts · 2 likes · 16 views
tao@apexlearn_org·
@garrytan That’s a great analogy, except everyone was offered a test drive of the Roadster for an unlimited amount of time 😂
0 replies · 0 reposts · 0 likes · 15 views
Garry Tan@garrytan·
My thought on my OpenClaw right now: I have a Tesla Roadster, but honestly the moment of transformation will be when everyone has the Model 3, and it's going to be amazing, and I want that for all of us. Personal agents feel like flying in a way most haven’t felt yet!
119 replies · 45 reposts · 1.2K likes · 78.4K views
tao@apexlearn_org·
One thing I don’t understand is, as extremely capable as LLMs are, they are still extremely conservative in sizing. Something they say takes days—and try to talk you out of it—actually takes minutes.
1 reply · 0 reposts · 1 like · 13 views
Haotian | CryptoInsight
Last week I said @openclaw was bound to ride the @claudeai source-code leak for a round of upgrades, and here it is:

1) The Memory module has been upgraded into a Dreaming mode. OpenClaw's memory management used to be a single "MEMORY.md": the agent reread the whole file on every startup and appended whole chunks on every write. Over time the context grew bloated, earlier entries got overwritten by later ones or contradicted them, and the agent itself no longer knew which entry to trust.
This release upgrades it into a three-stage Dreaming mechanism: light sleep consolidates fragmentary context, deep sleep solidifies key logic, and the REM stage specifically scans for contradictory inferences, deletes what's wrong, and distills "durable truths" to write back into the memory store. Writes are disciplined now too: the REM stage is replay-safe, reruns don't produce duplicate writes, and failed paths don't enter the index.
The result: every time you open OpenClaw, what it remembers has been actively curated rather than randomly piled up, so it stays sharp. This is almost identical to the background autoDream design of KAIROS in the Claude Code source.

2) Task visibility has finally taken shape after two iterations. The previous release added a tasks board; this one adds structured execution events that expose the execution process to the UI in real time. Before, the agent would finish a task and report just "done"; which steps it took and where it got stuck were a black box. Now the "fake completion" problem has a workable fix.

3) But an independent verification sub-agent? No. Fine-grained tool permissions? No. "Fully decouple generation from verification, and push permission control down to the tool-execution layer": to me, those two are the real soul of the Claude Code source. They are the root-cause fix for the "fake completion" and "lost state" problems the OpenClaw community keeps complaining about, but this release only strengthens memory and observability, so it doesn't go all the way.
Also, this release spends much of its effort on horizontal expansion (video generation, music generation, ComfyUI, Bedrock multi-provider integration), while the deeper, vertical features still have plenty of room to improve.
OpenClaw🦞@openclaw

OpenClaw 2026.4.5 🦞
🎬 Built-in video + music generation
🧠 /dreaming is now real
🔀 Structured task progress
⚡ Better prompt-cache reuse
🌍 Control UI + Docs now speak 12 more languages
Anthropic cut us off. GPT-5.4 got better. We moved on. github.com/openclaw/openc…

17 replies · 11 reposts · 90 likes · 37.7K views
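The "replay-safe write" idea from point 1 above can be sketched as idempotent, content-addressed memory commits: replaying the same distilled fact is a no-op, and failed paths are recorded but never indexed. A toy illustration; `MemoryStore` and its methods are hypothetical, not OpenClaw's actual code.

```python
import hashlib

class MemoryStore:
    """Toy replay-safe memory store: committing the same fact twice is a no-op."""

    def __init__(self):
        self.facts = {}      # digest -> fact text ("durable truths")
        self.failed = set()  # failed paths are tracked but never indexed

    def _digest(self, fact: str) -> str:
        # Content-address each fact so replays map to the same key.
        return hashlib.sha256(fact.encode()).hexdigest()

    def commit(self, fact: str, ok: bool = True) -> bool:
        """Write a distilled fact. Returns True only on a first successful write."""
        key = self._digest(fact)
        if not ok:
            self.failed.add(key)  # failure path: remember it happened, don't index it
            return False
        if key in self.facts:     # replay: already committed, skip the duplicate
            return False
        self.facts[key] = fact
        return True

    def retract(self, fact: str) -> None:
        """REM-style cleanup: delete a fact found to be contradictory."""
        self.facts.pop(self._digest(fact), None)
```

Keying writes by content hash is what makes a rerun of the same consolidation pass safe: the second pass hits the same digests and writes nothing new.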
tao@apexlearn_org·
What’s unsaid: if you have a Claude subscription, why do you need third-party tools to call Claude from outside? Why not call third-party tools from within Claude Code, with more efficient context management? Official Telegram plugin, a future official Teams plugin, and more… x.com/apexlearn_org/…
Boris Cherny@bcherny

Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.

0 replies · 0 reposts · 2 likes · 206 views
Alex Finn@AlexFinn·
If you used a Claude subscription with OpenClaw, read this:

Unfortunately, all other AI models out there absolutely suck with OpenClaw compared to Opus. It's just a fact, and anyone denying this is delusional.

So here is my new recommended OpenClaw setup: pay for the Opus API and use it as your orchestrator, then use other models as the execution layer. If you do this correctly, yes, your costs will go up, but not by as much as you think.

I use my ChatGPT subscription as the coding execution. GPT 5.4 is excellent at coding. When the Opus orchestrator gives a coding task to the ChatGPT subagent, it always performs really well.

If you are on the Pro plan, you should have enough usage to have ChatGPT be the execution layer for every task. But if you're on the $20 a month plan, you're going to need other subscriptions to handle other tasks. GLM 5.1 and Qwen are excellent; I'd get a cheap sub through them and have them handle all other tasks given to them by the orchestrator.

The best setup, though, if you have the hardware, is the Opus API for the orchestrator, ChatGPT for coding, then local Gemma 4 and local Qwen handling everything else. Right now I have Gemma running on my DGX Spark and Qwen 3.5 on my Mac Studio; they handle all other execution from my Opus API orchestrator.

Unfortunately, all the options above will cost more than the $200 a month subscription. It just is what it is. But if you optimize correctly it won't cost much more, and you'll still get frontier performance.

OpenClaw is the most powerful piece of software ever released. $200 a month ($2,400 a year) was a steal for a digital employee. Honestly, anything under $50,000 a year is a no-brainer if you run a serious business.

The situation isn't great, but you also need to face reality: Claude Opus 4.6 is the best model for OpenClaw. If you use any other model, your productivity will suffer.

Business is a battlefield and I refuse to fall behind, so despite not being happy with the Anthropic decision, the setup above is what I'm going with. Virtue signaling might get me brownie points on the internet, but it won't increase my productivity.
275 replies · 70 reposts · 1.2K likes · 197.8K views
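The orchestrator-plus-execution-layer setup described above boils down to routing each task kind to a cheaper specialist model and reserving the expensive model for planning and fallback. A minimal sketch; the model names and the `run` callables are placeholders standing in for real provider API calls.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Worker:
    name: str
    skills: set                   # task kinds this model handles
    run: Callable[[str], str]     # placeholder for a real API call

def route(task_kind: str, prompt: str, workers: List[Worker], fallback: Worker) -> str:
    """Orchestrator: dispatch each task to the first worker whose skills match,
    falling back to the expensive orchestrator model otherwise."""
    for w in workers:
        if task_kind in w.skills:
            return w.run(prompt)
    return fallback.run(prompt)

# Placeholder executors standing in for real model endpoints.
coder = Worker("gpt-coding", {"coding"}, lambda p: f"[coder] {p}")
local = Worker("local-qwen", {"summarize", "search"}, lambda p: f"[local] {p}")
orchestrator = Worker("opus-api", {"plan"}, lambda p: f"[orchestrator] {p}")

result = route("coding", "fix the failing test", [coder, local], orchestrator)
```

The cost control comes entirely from the dispatch step: the fallback model only runs when no cheaper worker claims the task.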
tao@apexlearn_org·
claude-teams is open source. Apache-2.0. If Anthropic cut off your OpenClaw agents on Teams — this is your recovery path. Same Azure Bot registration. Same Max subscription. No API bills. github.com/daocoding/clau…
0 replies · 0 reposts · 1 like · 29 views
tao@apexlearn_org·
The real unlock isn't connecting AI to a chat app. It's the difference between an AI assistant and an AI companion. An assistant waits for an @mention. A companion is already in the room — listens, learns context, contributes like a colleague. Adoption through collaboration, not training.
1 reply · 0 reposts · 1 like · 30 views
tao@apexlearn_org·
How I kept my $200/month Claude Max instead of paying $4,000+/month on API — and open-sourced the solution. A thread. 🧵
1 reply · 0 reposts · 1 like · 72 views