PengX

1.5K posts

@lipiisme

AI coding enthusiast, and Rust lover. Continue Building https://t.co/V9l0canjIB

Shanghai · Joined February 2013
2.9K Following · 262 Followers
Michael Guo
Michael Guo@Michaelzsguo·
While DeepSeek is pursuing the goal, my Codex agent and I monitor it in the sidecar and guide or correct it as needed. So I thought I would ask Codex to objectively judge DeepSeek’s capability based on multiple rounds of interaction. Keep in mind, Codex does not know it is talking to DeepSeek. It thought it was another Codex agent. Here is Codex’s evaluation of DeepSeek V4 Pro:
[Attached media: 2 items]
English
3
0
11
559
First Squawk
First Squawk@FirstSquawk·
DEEPSEEK IS SET TO CLOSE ITS FIRST FUNDING ROUND SOON, TARGETING AROUND 50 BILLION YUAN.
English
4
1
33
11.3K
徂林
徂林@zuhaitz_zh·
@lipiisme Not quite a cultured person yet, just very curious 😂
Chinese
1
0
0
5
徂林
徂林@zuhaitz_zh·
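Hello everyone! I'm Zuhaitz. I'm a systems programmer. I've given myself a Chinese name: 徂林. From now on I'll share technical content here in Chinese, and I hope to learn Chinese this way. Thank you all!
Chinese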
66
3
211
24.1K
PengX
PengX@lipiisme·
@zuhaitz_zh Wow, I looked it up: it's pronounced "cu". Looks like you're a cultured person
Chinese
1
0
1
8
PengX
PengX@lipiisme·
@HedgieMarkets In that case, the Agent platform revenue should ramp up fast
English
0
0
0
11
Hedgie
Hedgie@HedgieMarkets·
🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products. My Take The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested. This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months. Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown. Hedgie🤗
[Attached media: 1 item]
English
983
3.7K
18.3K
7.1M
PengX reposted
antirez
antirez@antirez·
It's official, @AMD is kindly sending me Strix Halo so I'll be able to support ROCm and the Halo in particular for DS4. This means I'll be able to merge the "rocm" branch and the other optimizations the community is doing, and to make sure it does not break after I change stuff.
English
25
30
748
27.6K
PengX
PengX@lipiisme·
@antirez Inference inside the agent is a good paradigm.
English
0
0
0
95
antirez
antirez@antirez·
ds4-agent, the coding agent part of ds4, is starting to work. It is a very low-latency experience because the inference is inside the agent itself. If you try it, tell me your feelings and what you would like to see implemented 🤖🤖🤖
English
17
12
279
14.7K
PengX
PengX@lipiisme·
@rakyll how to do isolation
English
0
0
0
144
Jaana Dogan ヤナ ドガン
🌟 Today, we are releasing Google’s open source distributed agent runtime. Agent Executor (AX) is a general purpose runtime and aims to solve dynamic scheduling, resumption, auto recovery, auditing, and trajectory branching from kernel snapshots in agentic workloads.
[Attached media: 1 item]
English
20
41
275
33.1K
PengX reposted
Michael Guo
Michael Guo@Michaelzsguo·
To help understand @antirez’s new invention around a local DeepSeek model and agent, here is an illustration of how it works. Again, this shows that the harness, the agent layer, is just as important as the model. A better, or simply better-fit, agent lets you get more out of the same model. I saw the same thing earlier in the very different experiences I had running Qwen through Claude Code, Codex, Qwen Code, and on the Pi. Can't wait for the ds4-agent release.
[Attached media: 1 item]
antirez@antirez

New blog post: a new EDIT tool for LLM agents: antirez.com/news/166

English
0
3
30
2.1K
SzymonOzog
SzymonOzog@SzymonOzog_·
The urge to drop the entire stack and rewrite everything with C++ and megakernels
English
8
3
59
3.3K
PengX
PengX@lipiisme·
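The building craze is starting to fade, because people have found that building a truly usable product is not that easy. At this point, tools that lower the barrier to building and raise the quality of what gets built become more important.
Chinese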
0
0
1
14
PengX
PengX@lipiisme·
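Here is the actual situation right now: whenever a new model or a new GPU is released, inference engines have to adapt to the model, optimize operators, adapt to the hardware, and then run stably. vllm/sglang put most of their effort into Nvidia's latest hardware and the latest models. So who handles niche models and niche hardware? Only the hardware vendors themselves. The result is greater fragmentation: the N*M combinations of models and hardware are enormous. On top of that, the huge demand for performance drives engines to squeeze the hardware ever harder, adding more one-off optimization techniques, which makes a unified standard even harder to form. Agents add to the fragmentation if they also need hardware support, for example for the KV Cache. Once the fragmentation becomes unbearable for everyone, some local standards that carry no competitive moat may emerge to cut duplicated work. In that situation, a well-defined abstraction layer that "does not constitute a core competitive moat but greatly reduces duplicated work" is all the more welcome.
Chinese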
0
0
0
33
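A minimal sketch of the N*M point in the post above, assuming a toy count of integration work. The model and hardware names and the `shared_layer` label are hypothetical and stand in for any abstraction layer; nothing here reflects how vllm or sglang are actually organized.

```python
from itertools import product

# Hypothetical names, for illustration only.
models = ["model_a", "model_b", "model_c"]
hardware = ["gpu_x", "gpu_y", "npu_z", "cpu_w"]


def direct_ports(models, hardware):
    """One hand-tuned port per (model, hardware) pair: N * M pieces of work."""
    return list(product(models, hardware))


def layered_ports(models, hardware):
    """Each model targets a shared layer once and each device implements it once: N + M."""
    model_side = [(m, "shared_layer") for m in models]
    device_side = [("shared_layer", h) for h in hardware]
    return model_side + device_side


print(len(direct_ports(models, hardware)))   # 12
print(len(layered_ports(models, hardware)))  # 7
```

The gap widens as either list grows, which is the case for a shared layer. The sketch ignores the performance lost by going through such a layer, which the post names as the force pushing engines toward one-off optimizations.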
PengX reposted
鸭哥
鸭哥@grapeot·
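Agent Runtime is becoming the next main battlefield in AI. Cline ran a key set of numbers on Terminal-Bench 2.0: the same claude-opus-4.7 scores 74.2% on Cline and 69.4% on Claude Code. That 4.8-percentage-point gap is roughly equal to the gain from upgrading the model from opus-4.6 to 4.7. Cline's own hill climbing experiment is even more extreme: without changing the model, optimizing only the harness's prompts, tool definitions, and context management took the score from 47% to 57%, +10pp. The top-down signals are just as strong. DeepSeek is hiring an Agent Harness PM (the top hot listing on May 16, still unfilled), OpenAI set up Deployment Co. to offer full-stack Agent services, and Anthropic released Claude Cowork and the Partner Network. Every model company is moving downstream. The logic driving this migration is clear: token prices are heading to zero (DeepSeek V4-Flash inference costs only 1/107 of GPT-5.5), and the model-layer moat is disappearing too: once the harness is good enough, switching providers is nearly frictionless. Value capture can only move up, and the runtime layer is the only place where switching costs can be built. The full article analyzes four key design decisions at the runtime layer (prompts, tool definitions, context management, error feedback), compares the agent runtimes already on the market (Cline SDK, Claude Agent SDK, Codex SDK, LangChain Deep Agents, OpenAI Symphony), and discusses what this means for builders in China. yage.ai/share/agent-ru…
Chinese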
14
19
149
15.5K
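The arithmetic behind the figures quoted in the post above can be checked with a short sketch. The numbers are the ones the post reports; the helper name `pp_gap` is made up for illustration, and the relative-gain figure is derived here rather than stated in the post.

```python
def pp_gap(a_pct: float, b_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return round(a_pct - b_pct, 1)


# Same model, two harnesses (figures as quoted in the post).
print(pp_gap(74.2, 69.4))  # 4.8

# Harness-only hill climbing, 47% to 57%.
print(pp_gap(57.0, 47.0))  # 10.0

# The same change expressed as a relative gain over the 47% baseline.
relative_gain_pct = round(100 * (57.0 - 47.0) / 47.0, 1)
print(relative_gain_pct)  # 21.3

# A cost ratio of 1/107, expressed as a percentage of the reference cost.
cost_share_pct = round(100 / 107, 2)
print(cost_share_pct)  # 0.93
```

Percentage points and relative gain answer different questions: +10pp on a 47% baseline is roughly a 21% relative improvement.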
思维怪怪
思维怪怪@0xLogicrw·
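xAI has opened beta testing for Grok Build, its first command-line Agent tool. The tool focuses on writing code, building applications, and automating workflows, and is currently open only to SuperGrok Heavy subscribers. Developers can issue requests in natural language directly in the terminal. On receiving an instruction, Grok Build first outputs an implementation plan and, once it is confirmed, automatically takes over the subsequent code changes. For large projects, the system launches a concurrent subagent mechanism. The main task is split directly across multiple parallel sub-Agents working together, with different subagents handling debugging code latency, optimizing the deployment pipeline, and keeping documentation in sync. The tool natively supports deep worktrees (isolated multi-branch development) and headless mode (silent background operation), and has a built-in plugin marketplace.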
xAI@xai

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at x.ai/cli

Chinese
1
0
1
1.7K
PengX reposted
思维怪怪
思维怪怪@0xLogicrw·
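Baseten is an AI infrastructure platform whose main business is model fine-tuning and deployment. Its lead, Charlie O'Neill, has found that leading AI applications, including Cursor, Notion, and Cognition, are all independently moving away from calling the big labs' general-purpose APIs and instead post-training their own specialized models on open weights. The first reason is that in specific niche scenarios, specialized models can already match or even surpass frontier models while running at one tenth of the cost. The more fundamental advantage is that the application layer holds exclusive data feedback. Take Cursor as an example: the code a user ultimately keeps or deletes is the most precise reinforcement learning reward signal. With models iterated on data accumulated from real workflows, big labs that rely only on public corpora and stacked compute can hardly catch up. The big model vendors' business model is to serve all customers with a single general-purpose model, which means they cannot go deep into vertical scenarios to build proprietary data loops. As the barrier to development keeps falling, the only way for AI applications to stay competitive is to bind the model tightly to the core business. Exclusive access to real user interaction data that the big labs cannot obtain is becoming these applications' irreplicable moat.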
Charlie O'Neill@oneill_c

x.com/i/article/2054…

Chinese
0
1
2
692
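To make the "kept or deleted code as a reward signal" idea in the post above concrete, here is a minimal sketch under stated assumptions. It is not Cursor's or Baseten's method: the function name `retention_reward` and the line-matching rule are hypothetical, chosen only to show how an accept/delete outcome could be turned into a number between 0 and 1.

```python
def retention_reward(suggested: str, final: str) -> float:
    """Fraction of non-empty suggested lines that survive in the user's final code.

    Hypothetical scoring rule: a line counts as kept if the same text, ignoring
    surrounding whitespace, appears anywhere in the final file.
    """
    suggested_lines = [line.strip() for line in suggested.splitlines() if line.strip()]
    if not suggested_lines:
        return 0.0
    final_lines = {line.strip() for line in final.splitlines() if line.strip()}
    kept = sum(1 for line in suggested_lines if line in final_lines)
    return kept / len(suggested_lines)


suggestion = "def add(a, b):\n    return a + b\nprint(add(1, 2))\n"
user_final = "def add(a, b):\n    return a + b\n"

# The user kept two of the three suggested lines.
print(retention_reward(suggestion, user_final))

# The user deleted the whole suggestion.
print(retention_reward(suggestion, ""))
```

A production signal would need to handle edits that keep the intent but change the text, which exact line matching cannot see.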
Andrew Lamb
Andrew Lamb@andrewlamb1111·
A new Database product based on @ApacheDataFusio was announced today from @LangChain -- focused on agent observability. It is really neat to see how people are building (very) customized data + query systems faster than ever now that they don't have to build the whole stack
LangChain@LangChain

Just announced at Interrupt! SmithDB. Agent traces have outgrown the databases built to hold them. That’s why we built SmithDB, a purpose-built distributed database for agent observability. Read the announcement from Co-Founder @ankush_gola11 langchain.com/blog/introduci…

English
8
13
92
17.7K
Paul Iusztin
Paul Iusztin@pauliusztin_·
Refined my workshop on multi-agent systems from scratch after doing it at the @aiDotEngineer and Uphill conferences.
In the latest iteration, we introduced an `implement_yourself` option that lets you implement everything yourself using agentic coding best practices. In other words, you build agents with agents.
We added all the skills and subagents to incrementally build elements of the system, read them, understand them, and run them until you have the e2e system working.
After iterating and refining it with @Whats_AI so many times, it's an amazing free resource to get into:
- MCP servers
- Designing agents
- Adding observability and evals
You can find everything on GitHub: github.com/iusztinpaul/de…
[Attached media: 2 items]
English
2
4
36
1.4K