Tim✨

2.6K posts


@timyangnet

Co-Founder Westar Labs | 🛠️ $STC & AI Explorer | Ex-Chief Architect Weibo (NASDAQ:WB). What we hear is opinion; what we see is perspective. When this exists, that exists; when this arises, that arises.

Joined May 2007

1.2K Following · 10K Followers

Tim✨@timyangnet·4d

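The discussion in this article is interesting. It tackles a question most AI users have probably asked themselves: does long-term reliance on coding agents make people "dumber", or even cost them their value to society? The experimental data cited in the article:

1. Anthropic's randomized controlled trial from early 2026: engineers learned an unfamiliar Python library, half with AI assistance and half without. Both groups finished the task at the same speed, but on the comprehension test that followed, the AI group did far worse (50% vs 67%). The most interesting split was inside the AI group: those who used AI to ask conceptual questions scored 65%+, while those who copied and pasted generated code scored under 40%.
2. MIT's study "Your Brain on ChatGPT": EEG measurements showed that with each additional layer of external support, brain connectivity dropped a notch. After finishing their essays, 83% of LLM users could not recite even a single sentence they had just written. The researchers call this cognitive debt: the mental effort saved today is repaid twice over in critical thinking tomorrow.

These experiments confirm that AI really is quietly eroding our thinking. But the other side of the coin is an equally real question: "If AI can already do it, why do I still need to understand it?" That reverse logic also holds: AI can already solve these problems efficiently, and future AI will only get stronger. In that case, a thorough human understanding seems to add little to the final delivery of the task or to gains in social productivity.

So do humans still need to learn in the AI era? If so, what should they learn? The article offers some conclusions, but this still feels like an open question that everyone will be exploring for a long time.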

Addy Osmani@addyosmani

x.com/i/article/2055…


Tim✨@timyangnet·4d

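This looks appealing.

> pluck is a code retrieval engine written in Rust with native MCP (Model Context Protocol) support, built so that AI agents can read and navigate code intelligently. Measured result: in typical code-reading scenarios it cuts token consumption by 84% to 88% while preserving (or even improving) comprehension of what matters. Instead of serving whole files or arbitrary snippets, pluck uses Tree-sitter to chunk at the AST (abstract syntax tree) level. It understands functions, classes, logical blocks, and function signatures. It then combines the strengths of two approaches:
> - Advanced keyword search (BM25F with fields)
> - Semantic reranking with static embeddings (no runtime inference)
>
> The system uses a two-stage cascade plus RRF (Reciprocal Rank Fusion) so that natural-language queries and symbol searches perform equally well. Everything is indexed locally in a persistent daemon that responds in 0.07 ms (warm p50). There is also a powerful extra: session-based deduplication. If a chunk was already shown in an earlier interaction, it is replaced with a lightweight placeholder, which saves a further 23% of tokens in multi-turn conversations.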
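
To make the RRF (Reciprocal Rank Fusion) step concrete, here is a minimal Python sketch of the general technique. It is not pluck's actual code: the function name `rrf_fuse`, the chunk ids, and the constant `k = 60` (a commonly used default for RRF) are assumptions for illustration only.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every id it contains, so an id
    that ranks well in several lists rises to the top of the fused order.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical results from a keyword ranker and an embedding ranker.
keyword_hits = ["parse_config", "load_file", "main"]
embedding_hits = ["load_file", "read_settings", "parse_config"]
fused = rrf_fuse([keyword_hits, embedding_hits])
```

Because RRF only looks at ranks, it can merge a BM25-style score and an embedding similarity without having to normalize the two score scales against each other.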

Erick@ErickSky

Have you ever watched an AI agent run short of context, not because it lacks intelligence, but because it wastes thousands of tokens just reading code? Run cat or grep a couple of times on a medium-sized file and suddenly the context window is half full of information it never needed. Do it again on the next question and the problem multiplies. In real codebases this becomes the silent bottleneck that limits what agents can achieve.

[pluck] arrives to change that equation. It is a code retrieval engine written in Rust, MCP-native (Model Context Protocol), designed specifically so that AI agents read and navigate code intelligently. The measured result: between 84% and 88% fewer tokens on typical code reads, while maintaining (or even improving) the ability to understand what really matters.

Instead of serving whole files or arbitrary fragments, pluck chunks at the AST level with Tree-sitter. It understands functions, classes, logical blocks, and signatures. Then it combines two worlds:
- Advanced keyword search (BM25F with fields)
- Semantic ranking with static embeddings (no runtime inference)

The system uses a two-stage cascade plus RRF fusion so that natural-language queries and symbolic searches work equally well. Everything is indexed locally in a persistent daemon that responds in 0.07 ms (warm p50). And there is a very powerful extra layer: per-session deduplication. If a chunk was already shown in an earlier interaction, it is replaced with a lightweight placeholder. That adds another 23% savings in multi-turn conversations.

pluck is not limited to "searching". It offers a rich set of operations designed for agents:
- read → returns a smart outline (signatures plus inline helper bodies). Typical savings of 85-88% on large files.
- symbol, peek, impact, and deps → navigate call graphs, imports, and dependencies without having to reconstruct them yourself.
- digest → compresses CI and test logs while keeping the key errors (71% fewer tokens).
- plan → suggests the next 3-5 exploration steps the agent should take.

Most importantly, there is always the --raw mode, which returns exactly what cat or grep would, byte for byte. You never lose the original capability. It is a smart replacement, not a limitation.

pluck is not "just another search tool". It is infrastructure built for the era of coding agents. The more efficiently they retrieve relevant context, the further they can go before hitting token limits. REPOOO👇
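
The per-session deduplication idea in the post can be sketched in a few lines of Python. This is only an illustration of the concept, not pluck's implementation: the class name `SessionDedup`, the placeholder wording, and the choice of SHA-256 content hashing are all assumptions.

```python
import hashlib


class SessionDedup:
    """Remember which chunks a session has already shown to the agent."""

    def __init__(self):
        # Maps a content hash to the turn in which that content was first shown.
        self._first_shown = {}

    def render(self, chunk_id, text, turn):
        """Return the full text the first time, a short placeholder afterwards."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self._first_shown:
            return f"[{chunk_id}: unchanged, shown in turn {self._first_shown[digest]}]"
        self._first_shown[digest] = turn
        return text


session = SessionDedup()
first = session.render("utils.load", "def load(path): ...", turn=1)
repeat = session.render("utils.load", "def load(path): ...", turn=2)
edited = session.render("utils.load", "def load(path, strict): ...", turn=3)
```

Hashing the content rather than the chunk id means an edited chunk is sent in full again, so the agent never reasons over a stale copy.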


Tim✨@timyangnet·5d

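The image comparing token consumption is excellent. HTML certainly costs far more tokens, and the gap grows as CSS and other styling get more complex. But from a user-experience (UX) standpoint, users naturally prefer richly formatted content. It is like the 1-2 minute talking-head clips that flood short-video feeds: they convey no more than 140 characters of information, yet use tens of thousands of times the bandwidth of text. Seen this way, even at a 3-5x token premium, users choosing HTML in more and more scenarios is probably still the prevailing trend, especially once agents are no longer confined to TUIs.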

God of Prompt@godofprompt

Anthropic is pushing HTML artifacts as the future of AI workflows. What they're not telling you: a markdown report costs ~800 tokens. The same content in styled HTML costs 2,500-4,000. That's 3-5x more tokens burned on divs and CSS instead of reasoning and depth. More tokens spent per task means more API calls. More API calls means more revenue. The incentive is right there. I steelmanned every major argument for HTML-first workflows and pressure-tested what holds up. One out of five survived.


Tim✨@timyangnet·5d

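As an agent user, I'd say language performance stopped being the biggest bottleneck long ago in the agent era, so I don't really agree with calling it a "mistake". When using Claude Code or Codex, what holds people up is usually harness orchestration, the agent loop, and model latency. Execution speed does differ between languages, but that is not where most of the time goes. The original post treats npm install as distribution friction, but for agents the vast pool of npm dependencies is actually a huge dividend. Complex tasks and Skills need a rich set of third-party libraries behind them, which is the core reason most agents cannot leave npm or uv.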

Paul Iusztin@pauliusztin_

Most agentic CLIs are built in TypeScript. Here’s why that’s a mistake (and you should use Go instead): We'll use Michael @maximilien as an example… He built his Weave CLI with Go. He's also the former CTO at IBM and former Chairperson of the NodeJS Foundation. So this is not a “TypeScript is bad” take... He knows the ecosystem deeply. But when he started building Weave CLI, an open-source tool for production RAG across 11 vector databases, the constraints were different. He needed something that could run anywhere. And this is where Go shines. It has no: • npm install • Python virtual envs • uv issues • JVM setup • Broken package registries • On-prem network restrictions Just download the binary, make it executable, and run it. Weave CLI has to: • Spin up vector databases • Ingest documents • Run RAG agents • Compare embeddings • Benchmark configurations • Monitor traces & experiments with Opik by @Cometml For this kind of infrastructure tooling, installation friction is product friction. If users can’t run it easily, they won’t use it. But there's a deeper lesson in this: Don’t pick your stack based on the herd. Pick it based on what the system needs. For frontend-heavy agent apps, TypeScript may be the right choice. For infra-heavy CLIs and TUIs that need to run anywhere, Go is hard to beat. Full Weave CLI case study in Decoding AI Magazine: decodingai.com/p/ship-rag-wit…


Tim✨@timyangnet·13 May

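This is an interesting read. I wonder where human-to-AI bandwidth mainly shows up; it is presumably more than real-time interaction.

> Thinky's secret plan: 1. Increase "human <-> AI" bandwidth 2. Raise the ceiling of "human + AI" intelligence 3. Help humans remain the "main characters" in the new world. We are currently at step 1. Interaction Models are excellent real-time collaboration tools built for humans.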

Soumith Chintala@soumithchintala

Thinky's secret plan: 1: Increase Human<->AI bandwidth 2: Raise ceiling of human+AI intelligence 3: Help humans continue as main-characters in the new world We are at Step 1. Interaction Models are great real-time collaborative tools for humans. Here's a preview:


Tim✨@timyangnet·12 May

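These days I often open Obsidian just to preview some Markdown. After Codex makes its changes I can glance at the right-hand workspace, so for this kind of lightweight preview I probably won't need to switch to Obsidian anymore 😀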

宝玉@dotey

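Codex's ambition, and the next step for MCP and Skills

I've been using Codex App, Cursor, and other agent apps intensively lately, and one thing keeps getting more interesting. Last year everyone competed on whose model was stronger; this year the competition seems to be over whose right-hand pane works better. Codex, Claude desktop, Cursor 3.0, and TRAE SOLO, some of the best agents around, have converged almost simultaneously and without any coordination on the same layout: project and session list on the left, the conversation with the agent in the middle, and a workspace on the right holding file browsing, web preview, and change review. This is surely not mutual copying; it looks more like the current optimum for agent interaction.

【1】Why three panes

A traditional chatbot needs only two panes: session history on the left and the chat window on the right. You ask, it answers, you leave. In the agent era, the agent can write code, edit files, and call tools on its own. When it finishes, you have to check whether it got things right, and the right-hand workspace exists for exactly that. But that is only the first stage. As users spend more and more time directing agents, they naturally spend less time in professional tools like VSCode. One question is bound to surface: the agent has written your code or built your slides, you want to tweak a few words, and you have to switch out and open another application? Nobody wants that. The natural expectation is: can I edit right inside the agent? This is one of the most requested features for Codex App (the other is a mobile version, which is coming soon). So vendors have quietly started upgrading the right-hand workspace from a read-only view of file edits into a multi-purpose area. In Codex's major release on April 16, the right-hand workspace changed more than any other feature. Interaction details vary slightly. Codex and Cursor switch with tabs; Claude uses floating panels. In my own use Codex feels the smoothest; Claude's floating panels are long on design and short on practicality, and will have to change sooner or later.

【2】Codex's real ambition

But reading this change as merely "UI design evolving" underestimates Codex. The slogan for Codex's major April release was "Codex for (almost) everything". You can read it as an ad line, but it is more like a statement of product direction. To deliver on it, Codex cannot just be an agent that is good at writing code. It has to handle all kinds of file formats, support professional workflows across domains, and let users close the whole loop inside it, including the final manual touch-ups. Codex cannot do that last step yet: generated output cannot be edited, whether code, Markdown, or PPTX. That could be deliberate product restraint, it could be that the technology is not working yet, or it could be waiting for a unified solution to emerge. My guess is the third.

【3】MCP and Skills each solve only half

To understand what Codex is waiting for, first work out which piece of the agent capability puzzle is still missing. MCP solved the "connection" problem: agents plug into all kinds of tools through a unified spec, so databases, calendars, and code repositories can all be reached. Agent Skills solved the "how to do it" problem: agents pick up domain knowledge and best practices they were never trained on, such as how to write in a particular style or how to handle a certain class of complex task. Both do their jobs reasonably well. But one gap has never been filled: the user's follow-up editing. You ask AI to write an article, and in the end you still open an editor to fix a few spots, because that last 5% of precision often only comes from doing it by hand. However smart AI gets, it will never understand you one hundred percent, so manual edits are unavoidable. That is why Markdown editors are hot again, with vibe-coded Markdown products everywhere. But Codex will not build its own Markdown editor, because everyone's preferences differ and someone will always be unhappy with the result; besides, it cannot integrate a professional editor for every vertical. The most sensible route is a plugin mechanism.

【4】Next step: an App Store for agents

Turn the agent into a platform and let the community contribute plugins, as VSCode and Chrome did. Codex only has to focus on the agent orchestration layer and leave file preview, follow-up editing, and vertical professional capabilities to plugins. Users install what they need: designers install design plugins, writers install writing plugins. A plugin mechanism would also solve a problem that has long lacked an answer: Skills cannot be monetized. My own baoyu-skills has nearly 20,000 stars, yet I have earned $0 from it. A Skill is almost entirely transparent, to the agent and to people alike, so it costs next to nothing to replicate, and however well you write it the moat is shallow. Plugins are different. The App Store and the Chrome extension marketplace have already proven a model for payment and copyright protection, and porting it to an agent plugin marketplace is entirely feasible. Good plugins can charge, developers get a reason to keep polishing, and the ecosystem can really start turning. Codex already has a very primitive plugin marketplace. It is a long way from there to a mature paid plugin ecosystem, but the direction is right. Codex is not the only one that wants to do this. I can see a similar outline at Cursor. Only Claude Code and Cowork show no product signs in this direction so far; maybe they consider it beneath them, or maybe they just have not got there yet.

【5】The window for small and mid-sized teams

If Codex really gets a plugin ecosystem working, what does that mean for small and mid-sized teams? Besides building your own vertical agent, there is another path: build plugins on a platform like Codex. You don't have to build an agent orchestration layer, you don't have to sort out token access, and the platform handles user distribution. You only need to focus on the "last mile": helping users process, edit, and comfortably use what the agent generates. This window will not stay open long. Early entrants get the cold-start dividend; latecomers are left competing over an existing user base. The moment is not far off, perhaps within the next few months. Codex's ambition is plain, and if the "(almost) everything" slogan is to be delivered, a plugin mechanism is a step it cannot skip. If OpenAI keeps hesitating on this, that would be the real mistake. Which vendor do you think will get this plugin ecosystem working first? Or do you see a product form better suited to agents? Leave a comment and share!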


Tim✨@timyangnet·9 May

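I read Coinbase's outage write-up closely, and it feels quite representative of the industry: today's huge systems can have several parts fail at once, and in future that may not be limited to Coinbase.

1. Chaos testing, disaster-recovery drills, and fault isolation seem to be in everyone's designs. But at the actual incident, a failure in the primary zone that should have been isolated was not, automation could not handle the exchange or Kafka, and the disaster-recovery system tends to be missing something at exactly the moment it is expected to work.
2. In the AI era, most engineering teams have shifted their attention to model usage and agents, but the stability of a system like an exchange still rests on traditional infrastructure: Kafka, the matching engine, quorum, failover, and data replication. These need people who understand production failure modes over the long term, and an organization willing to keep investing in boring but critical capabilities. Yet when these systems go a long time without a problem, that investment is hard to sustain.
3. There is also speculation online that Coinbase's layoffs left it short of engineering staff. Without evidence, I won't go into that.

Below is the incident overview from the original post:

Yesterday, @coinbase experienced a multi-hour service disruption affecting trading, exchange access, and balance updates. Here is the Coinbase engineering team's initial read on the cause, the recovery, and the improvements under way. At approximately 23:50 UTC on May 7, 2026, our monitoring detected cascading quote failures from internal services, which triggered multiple Sev1 incidents that engineering immediately began investigating. Affected businesses included spot trading, Prime, and the International and derivatives exchanges.

Root cause: a thermal event (cooling system failure) in a subset of racks within a single building in AWS us-east-1. To reduce latency, we run the primary replica of our exchange infrastructure in a single availability zone, in line with industry standards. We maintain a distributed standby for failures like this, but in this incident failures in the primary zone that should have been isolated were not, which prolonged the outage. The failure cascaded along two paths:
- Hardware failures beneath the matching engine: multiple hardware components under the exchange's matching engine failed, requiring recovery and failover.
- Loss of Kafka cluster availability: the distributed Kafka clusters that handle messaging across Coinbase systems could not stay available, requiring partitions holding many TiB of data to be failed over to new hardware brokers.

Recovery: after isolating the incident, automated tooling drained the related workloads of about 10 Kubernetes clusters out of the affected zone to stabilize internal services. Most services were back to normal within 30 minutes of diagnosis. Two core components could not be drained automatically:
- The exchange: it involves dedicated hardware and storage.
- Kafka: a managed service designed to be resilient, which hit unique problems in this particular failure.

1. Exchange matching engine. The matching engine is the core system that processes orders and maintains order books. It is a distributed cluster that needs a quorum to safely elect a leader and continue processing trades. During the incident, infrastructure constraints in the affected datacenter left too few healthy nodes to reach quorum, so trading on the Retail, Advanced, and Institutional platforms was blocked. Recovery required the on-call and engineering teams to execute the disaster recovery plan, safely rebuild quorum, and validate system health under constrained infrastructure. The team built, tested, deployed, and validated the fix while managing the wider incident.

2. Kafka recovery. Recovering Kafka was a larger task. Our primary managed Kafka partitions process many TB of data a day and were designed to keep the business running through exactly this kind of datacenter failure. In this case those resilience guarantees failed and manual recovery was needed. We again relied on disaster recovery procedures to restore stalled partitions onto new hardware (brokers), safely bringing cross-service messaging at Coinbase back online. While replication lagged, customers saw delayed balance updates, which resolved automatically once replication caught up. No data was lost.

Market reopening: once the matching engine was restored following standard runbooks, we reopened markets carefully:
- All products entered cancel-only mode.
- Product states were audited.
- All markets moved to auction mode.
- Full trading on Coinbase Exchange was restored.

Summary and outlook. What went well: the team. Incident response across the company assembled within minutes, followed well-rehearsed playbooks, and used secure automation tooling to recover all services. Coinbase has a strong, senior team able to handle rare failure modes like this. To our customers: we know that restricted account access, even temporarily, is unacceptable. We are deeply sorry. We will publish a full root cause analysis (RCA) in the coming weeks.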

rob@rwitoff

Yesterday @coinbase experienced a multi-hour service disruption affecting trading, exchange access, and balance updates. Here's our initial read from Coinbase engineering on what happened, how we recovered, and what we're addressing. At approximately 23:50 UTC on 2026-05-07, our monitoring detected cascading quote failures from internal services that triggered multiple Sev1 incidents that engineering immediately began investigating. Customer-facing impacts included spot trading, Prime, International and derivative exchanges. Root cause: a thermal event (cooling system failure) inside a subset of racks within a single building in AWS us-east-1. We run a primary replica of our exchange infrastructure in a single zone, consistent with industry standards to reduce latency. To prepare for failures like this, we maintain a distributed standby, but during this incident, failures in the primary zone that were designed to be isolated were not, extending the duration of our outage. The failure cascaded down two paths: 1. Multiple hardware components beneath our exchange’s matching engine failed, requiring recovery and failover 2. Distributed Kafka clusters that manage messaging across Coinbase systems failed to remain available, also requiring partition failovers to new hardware brokers with many TiBs of data After isolating the incident: automated tooling drained ~10 Kubernetes clusters worth of related workloads out of the affected zone to stabilize internal services. Most services were back to normal within ~30 minutes of diagnosis. The two things we couldn't automatically drain: the exchange (dedicated hardware and storage) and Kafka (managed service that was designed to be resilient to this, with unique problems). The exchange matching engine is the core system responsible for processing orders and maintaining order books. It is a distributed cluster and requires quorum to safely elect a leader and continue processing trading activity. 
During the incident, infrastructure-level constraints in the affected datacenter left only a subset of nodes healthy, preventing the cluster from reaching quorum. As a result, trading across Retail, Advanced, and Institutional exchanges were blocked. Recovery required our oncall and engineering teams to execute our disaster recovery plan, restore quorum safely, and validate system health under constrained infrastructure conditions. The team built, tested, deployed, and validated the fix while continuing to manage the broader incident. Kafka recovery was a much larger scale operation. Our primary managed Kafka partitions process many terabytes of data daily and are designed with resiliency guarantees for uninterrupted operation during a datacenter failure just like this. In this case, those guarantees failed and required manual recovery. We again relied on disaster recovery procedures to recover stuck partitions onto new hardware (brokers) that enabled us to safely bring x-service messaging back online across Coinbase. During the lag, customers saw delayed balance streams which resolved automatically once replication caught up. No data lost. Once the engine came back up as part of our standard runbooks, we re-opened markets carefully: all products to cancel-only mode first, audited product states, then moved all markets to auction mode, before restoring trading on Coinbase Exchange. What went right: the team. Incident response across the company came together within minutes, followed well-rehearsed playbooks and used secure automation tooling to recover all services. We have a strong, senior team at Coinbase that worked through rare failure modes to recover all services. To our customers: losing access to your account, even temporarily, is unacceptable. We know that. We're sorry, and we’ll publish a full root cause analysis in the coming weeks 🙏
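
The quorum requirement described in the post can be illustrated with a small Python sketch. The post does not say which consensus protocol or cluster size the matching engine uses, so the strict-majority rule and the five-node example below are assumptions, chosen because they are typical of leader-election protocols.

```python
def majority_quorum(cluster_size):
    """Smallest number of nodes that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1


def has_quorum(healthy_nodes, cluster_size):
    """True when enough nodes are healthy to elect a leader safely."""
    return healthy_nodes >= majority_quorum(cluster_size)


# Hypothetical five-node cluster: it tolerates two failed nodes but not three.
can_trade_with_three = has_quorum(3, 5)
can_trade_with_two = has_quorum(2, 5)
```

Under this rule, a cluster whose healthy nodes fall below the majority threshold has to stop accepting writes rather than risk two leaders, which matches the blocked trading the post describes.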


Tim✨@timyangnet·9 May

One-sentence takeaway: these chips are mainly suited to inference workloads, and xAI has no use for them itself for now...

Jukan@jukan05

Why did xAI hand over a 220,000-GPU cluster to Anthropic? The technical backdrop to xAI's decision to hand Colossus 1 over to Anthropic in its entirety is more interesting than it appears. xAI deployed more than 220,000 NVIDIA GPUs at its Colossus 1 data center in Memphis. Of these, roughly 150,000 are estimated to be H100s, 50,000 H200s, and 20,000 GB200s. In other words, three different generations of silicon are mixed together inside a single cluster — a "heterogeneous architecture." For distributed training, however, this configuration is close to a disaster, according to engineers familiar with the setup. In distributed training, 100,000 GPUs must finish a single step simultaneously before the cluster can advance to the next one. Even if the GB200s finish their computation first, the remaining 99,999 chips have to wait for the slower H100s — or for any GPU that has hit a stack-related snag — to catch up. This is known as the straggler effect. The 11% GPU utilization rate (MFU: the share of theoretical FLOPs actually realized) at xAI recently reported by The Information can be read as the numerical fallout of this problem. It stands in stark contrast to the 40%-plus MFU figures achieved by Meta and Google. The problem runs deeper still. As discussed earlier, NVIDIA's NCCL has traditionally been optimized for a ring topology. It works beautifully at the 1,000–10,000 GPU scale, but once you push into the 100,000-unit range, the latency of data traversing the ring once around becomes punishingly long. GPUs need to churn through computations rapidly to keep MFU high, but while they sit waiting endlessly for data to arrive over the network fabric, more than half of the silicon falls into idle. Google sidestepped this bottleneck with its own custom topology (Google's OCS: Apollo/Palomar), but xAI, by my read, has not yet reached that stage. Layer Blackwell's (GB200) "power smoothing" issue on top, and the picture comes into focus. 
According to Zeeshan Patel, formerly in charge of multimodal pre-training at xAI, Blackwell GPUs draw power so aggressively that the chip itself includes a hardware feature for smoothing power delivery. xAI's existing software stack, however, was optimized for Hopper and does not understand the characteristics of the new hardware; when it imposes irregular loads on the chip, the silicon physically destructs — literally melts. That means the modeling stack must be rewritten from scratch, which in turn means scaling is far harder than most of us imagine. Pulling all of this together points to a single conclusion. xAI judged that training frontier models on Colossus 1 simply was not efficient enough to be worthwhile. It therefore moved its own training workloads wholesale onto Colossus 2, built as a 100% Blackwell homogeneous cluster. Colossus 1, on the other hand — whose mixed architecture is far less crippling for inference, which parallelizes more forgivingly — was leased in its entirety to an Anthropic that desperately needed inference capacity. Many observers point to what looks like a contradiction: Elon Musk poured enormous capital into building Colossus, only to hand the core asset over to a direct competitor in Anthropic. Others read it as xAI capitulating because it is a "middling frontier lab." But these are surface-level reads. Look at the numbers and a different picture emerges. xAI today holds roughly 550,000+ GPUs in total (on an H100-equivalent performance basis), and Colossus 1 (220,000 units) accounts for only about 40% of the total available capacity. Colossus 2 — built entirely on Blackwell — is already operational and continuing to expand. Elon kept the all-Blackwell homogeneous cluster (Colossus 2) for himself and leased out the older, mixed-generation Colossus 1. In other words, he handed the pain of rewriting the stack — the MFU-11% debacle — to Anthropic, while keeping his own focus on training the next generation of models. 
The real point, then, is this. Elon's objective appears to be positioning ahead of the SpaceXAI IPO at a $1.75 trillion valuation, currently floated for as early as June. The narrative SpaceXAI now needs is that xAI — long the "sore finger" — is not merely a research lab burning cash, but a business with a "neo-cloud" model in the mold of AWS, capable of leasing surplus assets at high yields. From a cost-of-capital perspective, an "AGI cash incinerator" is far less attractive to investors than a "data-center landlord generating cash." As noted above, the most important detail of the Colossus 1 lease is that it is for inference, not training. Unlike training, inference requires far less tightly synchronized inter-GPU communication. Even when the chips are heterogeneous, the workload parcels out cleanly across them in parallel. The straggler effect — the chief weakness of a mixed cluster — is essentially neutralized for inference workloads. Furthermore, with Anthropic occupying all 220,000 GPUs as a single tenant, the network-switch jitter (unanticipated latency) that arises under multi-tenancy disappears. The two sides' technical weaknesses end up complementing each other almost exactly. One insight follows. As a training cluster mixing H100/H200/GB200, Colossus 1 was an asset that could only deliver an MFU of 11%. The moment it was handed over to a single inference customer, however, that asset transformed into a cash-flow asset rented out at roughly $2.60 per GPU-hour (a weighted average of the lease rates across GPU types). For xAI, what was a "cluster from hell" for training has become a "golden goose" minting $5–6 billion in annual revenue when redeployed for inference. Elon's genius, I would argue, lies not in the model but in this asset-rotation structure. The weight of that $6 billion becomes clearer when set against xAI's income statement. Annualizing xAI's 1Q26 net loss yields roughly $6 billion in losses per year. 
The $5–6 billion in annual revenue generated by leasing Colossus 1 to Anthropic, in other words, almost perfectly hedges xAI's loss figure. This single deal effectively pulls xAI to break-even. Heading into the SpaceXAI IPO, this functions as a core line of financial defense. From a cost-of-capital standpoint, if the image shifts from "research lab burning cash" to "infrastructure tollgate stably printing $6 billion a year," the entire tone of the offering can change. (May 8, 2026, Mirae Asset Securities)
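
Two of the post's claims can be checked with back-of-envelope Python. The sketch below models a synchronous training step as waiting for the slowest GPU (the straggler effect) and recomputes the annual lease revenue from the quoted 220,000 GPUs at roughly $2.60 per GPU-hour. The per-GPU step times are hypothetical, as is the assumption of 100% billed utilization; neither comes from the post.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def sync_step_time(per_gpu_seconds):
    """A synchronous training step ends only when the slowest GPU finishes."""
    return max(per_gpu_seconds)


def annual_lease_revenue(gpu_count, usd_per_gpu_hour, utilization=1.0):
    """Yearly revenue if every GPU is billed for the given share of hours."""
    return gpu_count * usd_per_gpu_hour * HOURS_PER_YEAR * utilization


# Hypothetical per-step compute times in seconds (illustrative, not measured).
step_times = {"GB200": 1.0, "H200": 1.6, "H100": 2.0}
step = sync_step_time(step_times.values())
gb200_idle_share = 1 - step_times["GB200"] / step

# Figures quoted in the post: 220,000 GPUs at about $2.60 per GPU-hour.
revenue = annual_lease_revenue(220_000, 2.60)
```

With these made-up step times the fastest chips sit idle for half of every step. The revenue figure comes to about $5.0 billion a year at full utilization, the low end of the $5–6 billion range in the post.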


Tim✨@timyangnet·29 Apr

Looks like Claude is not the one to use for processing Chinese documents.

Aran Komatsuzaki@arankomatsuzaki

Follow-up on non-English token-inefficiency with more model-language pairs: - Chinese is cheaper than English on major Chinese models - Gemini and Qwen provide least non-English tax - Anthropic has the highest tax by far; Kimi is next - Hindi is the worst-covered language here, despite its massive speaker base


Tim✨@timyangnet·26 Apr

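Top-tier engineers should be fine for now; for those in the middle and below it will keep getting harder.

> AI will dominate domains with tight logic and fast feedback (such as programming and math). The human moat lies in messy domains with "sparse feedback": taste, judgment, negotiation, and handling uncertainty. The future belongs to people who can navigate ambiguity and command tacit knowledge; they will become the strategic chokepoint of the economy and take the largest share of the gains. Can't adapt to this shift? Then go learn a trade (electrician, for example).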

Haseeb ＞|＜@hosseeb

The highest-value human work in the AI era will be in domains with sparse reward signals. Internalize this, or watch your value erode over the next decade. Math, programming, rote memorization, data science, all fucked. The classic “smart nerd” jobs are exactly where AI is strongest, because the feedback loops are dense. You can check the answer. You can run the test. That means AI can improve quickly, and humans will rapidly fall behind. Your advantage as a human is in messy domains. Taste. Judgment. Negotiation. Risk-taking. Politics. Sales. Science at the frontier. Anything you can only really learn by doing. Cross-disciplinary stuff. The valuable domains will be the ones guarded by secrets, tacit knowledge, weak labels, long feedback cycles, and ambiguous outcomes. Places where the training data is scarce, the ground truth is disputed, and it's impossible to explain why something is good. AI will still enter these domains. But we will be slower to trust it unsupervised there, because it will be harder to tell when it is right, harder to prove when it is wrong, and difficult to construct secure sandboxes. The stakes will be too high to YOLO it. I find myself saying this over and over again to young people today: the future does not belong to people who are able to get good grades on tests. It belongs to people who can operate under uncertainty, in domains where correctness is hard to define. Those domains will become the thin waist of the economy: as productivity everywhere else accelerates, the humans who excel there will become our economic Strait of Hormuz. The best humans in these domains will demand an enormous cut of the growing economic pie. Your imperative going forward is to make sure you're one of these people. (Or become an electrician. That probably works too.)


Tim✨@timyangnet·26 Apr

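Strongly agree. "AI lowers cost but raises the difficulty of choosing."

While AI may shorten timelines and add bandwidth, I don't yet see how it solves the other problems organizations face. The need to choose remains. In a way, as the cost of building falls, choosing becomes more important:
- If building more things gets easier, building the wrong thing gets easier too.
- Very long planning cycles now carry more risk, because technical capabilities turn over quickly and market preferences keep shifting; the market may have moved before you finish the plan.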

Karri Saarinen@karrisaarinen

x.com/i/article/2048…


Tim✨@timyangnet·26 Apr

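A 1:1 skill worth bookmarking for managers. I strongly agree with the author: ineffective 1:1s usually get stuck on one detail. The manager prepares the agenda, chases status, gives advice, and does most of the talking, while the person who actually needs support slips into passive answering mode. A better approach is to hand ownership back to them. Have them update a small dashboard before the meeting: recent state, priorities, progress on goals, make-or-break moves, problems being worked on, key relationships, their own growth, and what support they still need. The manager reads it in advance, talks less and listens more in the meeting, and follows their reasoning with further questions.

The value of a high-quality 1:1 lies in training judgment. Giving the answer directly often does save time, but in the long run it makes the team depend on your judgment. Ask a few more times "What do you think we should do?", "What is stopping you?", "What if this were your company?", and they will truly own the problem.

An LLM can help with pre-meeting prep too: give it this week's dashboard and the notes from the last few 1:1s, anonymized, and let it look for recurring patterns and subtext. It does the pattern recognition; the manager makes the in-the-moment judgment. Before your next 1:1, try having the other person prepare this dashboard first, then have an LLM look it over for you in advance.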

Dave Kline@dklineii

x.com/i/article/2048…


Tim✨@timyangnet·26 Apr

So the Chinese "不是……而是……" ("it's not..., it's...") AI tone may have originated in English. The matching template is: "not just a …, it's a …" 🤣

Bearly AI@bearlyai

The use of the phrase “not just __, it’s a __” (staple of AI-generated text) has risen sharply in SEC company filings:


Tim✨@timyangnet·25 Apr

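Seems even less fair than a lottery?

> Polymarket prices are highly accurate at predicting future events, but the source of that accuracy is not obvious. In a new working paper, we find that it comes not from the "wisdom of crowds" but from a very small number of informed traders. The data suggest fewer than 3% of accounts drive price discovery; the vast majority do no better than chance. Most traders contribute the bulk of the volume but almost no useful information, which in effect funds that informed minority.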

Roberto Gomez Cram@rgomezcram

Polymarket prices are highly accurate in predicting future events. The source of that accuracy is less obvious. In a new working paper, we find it is not the “wisdom of crowds,” but a small minority of informed traders. Fewer than 3% of accounts appear to drive price discovery; most perform no better than chance. The majority generates most of the volume but little of the information, effectively funding the informed minority. Check the paper here: papers.ssrn.com/sol3/papers.cf…


Tim✨@timyangnet·25 Apr

@tinyfool Multilingual content is a system-level algorithm change Musk championed not long ago, so blocking individual users won't help much.


Tinyfool@tinyfool·25 Apr

@timyangnet If you don't like it, hide it.


Tim✨@timyangnet·25 Apr

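At some point recently, a lot of Japanese content started showing up in my timeline, and I don't know how to get rid of it. The trouble is that most of it is rather fun, so I can't help looking every time. But on reflection it is no different in essence from the "fun content" on Douyin: it grabs attention and keeps you there, yet has almost nothing to do with what I actually care about.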


Tim✨ reposted

Adina Yakup@AdinaYakup·24 Apr

I'm amazed by this line from @deepseek_ai 's official announcement "Not lured by praise, not frightened by slander, follow the righteous path and discipline oneself with integrity" DeepSeek's real edge isn't just the tech, it's the ethos behind the work. And that's what will carry them further than any benchmark ❤️


Tim✨@timyangnet·21 Apr

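Overall, some points in the article are not wrong, but they feel rather far from the core tension in AI agents today. What is really exciting right now is still the automation AI brings, and the LLM's ability to make decisions and get past obstacles within automated flows: give it an intent, and can it carry the job through step by step on its own? The problems the agent field needs to solve are probably still harness, loop, and similar issues inside the workflow. By comparison, the infrastructure the article promotes can be built, but may not prove all that popular.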


Tim✨@timyangnet·21 Apr

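a16z's latest deep dive: the missing infrastructure for AI agents, and 5 ways blockchains can help. 🧵

Why do AI agents need blockchains? AI agents are evolving from "copilots" into independent "economic participants", but current infrastructure cannot keep up. Agents today face missing identity, no way to open accounts, and poor auditability. So a16z argues that blockchains are not merely a future vision but the present-day base protocol for agents to become real economic actors. Here are the 5 key dimensions a16zcrypto lays out: 👇

1️⃣ From KYC to KYA (Know Your Agent) 🆔
Non-human identities (automated systems) already outnumber humans 100:1, yet agents remain "second-class citizens" in the financial system.
Pain point: agents lack portable cross-platform identity and are often blocked by firewalls.
Solution: blockchains provide cryptographically signed credentials. With "KYA", an agent can prove who it is, whom it represents, and what permissions it holds.

2️⃣ Governance against "model hegemony" ⚖️
If an agent runs on a centralized model, ultimate control sits with the model company, not the user.
Solution: use on-chain records and smart contracts to force agents to act on verified results.
Core idea: ensure the agent obeys the user's instructions, not the model provider's fine-tuned parameters.

3️⃣ Payments designed for "headless" merchants 💸
Agents don't need a UI to buy things; they only need API endpoints and settlement.
Status quo: traditional payments struggle with high-frequency, micro-value agent transactions.
Progress: stablecoins (such as USDC) and the x402 protocol are becoming the default payment layer. Stripe, Coinbase, and others are already positioning to let agents do truly "autonomous spending".

4️⃣ Pricing trust: verification costs more than intelligence 🛡️
When AI intelligence becomes cheap and abundant, the expensive part becomes "verification".
Insight: "human-in-the-loop" can no longer keep up with agent speed.
Solution: blockchains hard-code trust into the architecture. With provenance and on-chain audits, we can trace the origin of every decision and keep "AI debt" from piling up.

5️⃣ Delegation: handing control back to users 🔑
The biggest fear in delegating to an agent is "over-authorization".
Technique: scoped delegation at the smart-contract level.
Effect: users can define precisely how much an agent may spend and which interfaces it may access. Examples include MetaMask and Coinbase's AgentKit, which make permission management transparent and tamper-proof.

💡 Summary
AI makes scale cheap but hard to trust; crypto rebuilds trust at scale. The future of the agent economy should not rest on legacy systems that cannot support non-human actors; it should be a transparent, auditable, user-owned decentralized network.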
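
The scoped-delegation idea in point 5 can be sketched in plain Python. This is an off-chain illustration of the policy check only, not a smart contract and not the API of MetaMask or AgentKit; the class `DelegationScope`, its fields, and the example endpoints are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DelegationScope:
    """Limits a user grants to an agent: a spend cap and an endpoint allowlist."""

    spend_limit_usd: float
    allowed_endpoints: frozenset
    spent_usd: float = 0.0

    def authorize(self, endpoint, amount_usd):
        """Approve a call only if it stays inside both limits, then record it."""
        if endpoint not in self.allowed_endpoints:
            return False
        if amount_usd < 0 or self.spent_usd + amount_usd > self.spend_limit_usd:
            return False
        self.spent_usd += amount_usd
        return True


# Hypothetical grant: up to $10 in total, and only on one search endpoint.
scope = DelegationScope(10.0, frozenset({"https://api.example.com/search"}))
first_call = scope.authorize("https://api.example.com/search", 6.0)
over_budget = scope.authorize("https://api.example.com/search", 6.0)
wrong_endpoint = scope.authorize("https://api.example.com/pay", 1.0)
```

The property the article attributes to smart contracts is that a check like this is enforced by the chain rather than by the agent's own code, so the agent cannot skip it.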

a16z crypto@a16zcrypto

x.com/i/article/2044…


Tim✨@timyangnet·20 Apr

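Being free to use Claude Code has now become a career advantage too.

> DeepMind engineers use Claude as a daily tool. Most other parts of Google do not. When the question of equal access came up internally, the proposed response was to take Claude away from everyone, which DeepMind opposed strongly; several engineers reportedly threatened to quit.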

Steve Yegge@Steve_Yegge

My tweet last week about Google's AI adoption drew a lot of pushback, to say the least. Since then, Googlers from multiple orgs have reached out to me independently and anonymously. They've expressed fear of being doxxed, concern about what they saw as bullying of me, and general corroboration of my original tweet. I haven't verified each person's story, but the picture these Googlers paint is consistent across sources. It is more specific than what I originally wrote, and somewhat bleaker. What they describe is a two-tier system. DeepMind engineers use Claude as a daily tool. Most of the rest of Google does not. When the question of equalizing access came up internally, the proposed response was to remove Claude for everyone — which DeepMind objected to so strongly that several engineers reportedly threatened to leave. Non-DeepMind engineers get pushed onto internal Gemini variants behind router-style names that obscure which underlying model is actually serving a request. Multiple engineers describe regressions and reliability problems severe enough that some senior people have stopped using the tools. A senior manager on a major product line reportedly flagged attrition concerns over exactly this issue. Googlers say leadership knows the gap is real. The response has been to mandate AI usage in OKRs and individual expectations, and to stand up an internal token-usage leaderboard. Unfortunately, managers have been told both that the leaderboard won't be used for performance reviews and, separately, that it absolutely will. And I hear other stories that Google's culture is not adapted properly yet for high-volume coding. Addy Osmani's reply on behalf of Google said over 40,000 SWEs use agentic coding weekly. I don't doubt the number. But weekly use of a thin tool is precisely the box-checking I described in the original post. 
Volume of opens isn't adoption — and "weekly" is a low bar that includes a lot of people who tried it once and went back to writing code by hand. The clearest thing I'm hearing is that Googlers do want to use high-quality agentic tools. They are asking repeatedly for better ones. But overall, this is not a picture of an engineering org that is fine. My goal in the first tweet, and now, is always the same — get more people using AI and agentic coding. Nobody is as far ahead as they might look from the outside, and none of you are as far behind as you might be worried you are. To all the Googlers who've reached out: thank you. You took a real risk and I appreciate you. Be safe. And good luck getting good models!

