breezedeus
533 posts

@breezedeus
Serial AI entrepreneur (not yet successful), but still firmly convinced that benevolent AI can produce happiness.

Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need to share the specific code/app: you just share the idea, then the other person's agent customizes & builds it for their specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion, which is cool.


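🧠💥 The English-language tech scene completely blew up today. @eonsys released a demo that marks a real historical milestone: using the wiring of 125,000 real neurons (the connectome), they "revived" a fruit fly in a virtual world. A completely different path to intelligence is taking shape. The video is not an animated simulation 👇
1️⃣ What actually happened? Researchers connected the complete brain connectivity map of a real fruit fly to a virtual body built on the MuJoCo physics engine. As a result, this "digital fruit fly" began to walk on its own, groom itself, and even feed. That forms a true closed loop: sensory input -> neural network firing -> motor output.
2️⃣ Why is this more striking than large models? Over the past few years, mainstream AI's core path has been "fitting behavioral output" (learning what it looks like / what it does), which essentially means training a fruit-fly-like policy network on massive amounts of data. But this time the logic has changed. The researchers started from the organism itself (reconstructing what it is). No reinforcement learning on behavioral data was involved. Once the real biological brain structure was run precisely in code, those complex biological behaviors emerged naturally, directly from the neural wiring.
3️⃣ Its ultimate significance: the moment brain emulation truly crosses the threshold, the "technological singularity" is no longer exclusive to deep learning. It proves something extremely important: once the neural structure is complete enough, behavior itself is computable, reproducible, and embeddable in a body.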




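Anthropic and the Pentagon had a complete falling-out this week. The Pentagon demanded that Claude be opened to the military for all lawful uses, with no restrictions. Anthropic refused, insisting that AI must not be used for mass surveillance of US citizens or for fully autonomous lethal weapons. The Pentagon issued an ultimatum: respond by 5:01 p.m. today, or the $200 million contract would be cancelled. It also threatened to invoke the Defense Production Act and to designate Anthropic a supply chain risk. Dario Amodei responded publicly: in good conscience, we cannot agree to their demands. Trump then posted on Truth Social, ordering all federal agencies to stop using Anthropic technology immediately. Sam Altman also weighed in today. He sent an internal memo to all OpenAI staff: however things got to this point, this is no longer just an issue between Anthropic and the Pentagon; it is an issue for the whole industry. We have always believed AI should not be used for mass surveillance or autonomous lethal weapons. These are our main red lines. Employees of OpenAI and Google also jointly signed an open letter in support of Anthropic. Ilya Sutskever posted a tweet about this, roughly as follows: It is very good that Anthropic did not back down. It is significant that OpenAI took a similar stance. There will be more challenges of this nature in the future, and they will be more severe. When that time comes, the relevant leaders must step up, and fierce competitors must set aside their differences. Glad to see what happened today.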




AI phones - large models or small models? We recently open-sourced OpenPhone📱, a 3B parameter mobile agent foundation model! After a year of trial and error, here's what we learned about AI phones ✨
Open-Sourced AI Phone Agents: github.com/HKUDS/OpenPhone
🤔 How do AI phones actually work? Simple: AI helps you operate your phone. But how does AI communicate with different apps?
Option 1: API Calls 🔌 Ideally, we'd just call app APIs directly. Reality check: there are basically none! Big tech won't open their APIs because apps ARE their traffic moat. Building individual MCPs for each app? Engineering nightmare 💥
Option 2: GUI Interaction 🖱️ Since there are no APIs, let's do what humans do: look at screens and tap stuff. This approach is super generalizable and should work with any app. That's why most AI phones go the GUI Agent route now.
GUI Agents are basically multi-modal models:
- Input: screenshot + task description
- Output: coordinates for next tap
- Capability: screen understanding + task reasoning
📱 Three technical approaches for Phone Agents
- Pure cloud ☁️ What most AI phones do currently: rely heavily on cloud-based large models. Performance is definitely better than small models, but privacy🔒 and cost💰 concerns are real.
- Pure on-device models 📱 This is the direction OpenPhone is exploring. 3B parameters strikes a good balance: runs on phones, fast, private, and cost-effective. The trade-off is limited performance on complex tasks, given it's only 3B parameters.
- Hybrid edge-cloud 🤝 Probably the most practical route. Simple stuff and anything privacy-sensitive stays on-device, complex reasoning hits the cloud. The trick is the routing strategy: when to make the switch? The interesting part is teaching the on-device model to recognize its own capability boundaries.
🔮 Some random thoughts
1. GUI Agents still have plenty of issues: slow, error-prone, multi-app accuracy sucks. A rich MCP ecosystem would make life easier, but don't hold your breath.
2. Right now everyone's just collecting data, then doing SFT+RL to optimize models. Basically throwing data at the problem; hopefully we get smarter ways to do this.
3. The AI phone ceiling isn't just tech, it's ecosystem. Future apps might go dual mode: APIs for agents, GUI for humans🚀
4. Computer-Use Agents are shifting toward coding: writing code instead of just clicking around💻, because code execution is way more accurate and efficient. Works great on desktop, mobile's still challenging.
5. Future Digital Agents might need to pack everything into one model: coding + multimodal + tool-use.
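A minimal sketch of the hybrid edge-cloud routing idea described above, not OpenPhone's actual implementation. It assumes a GUI agent is a function from (screenshot, task description) to a tap coordinate, and that the on-device model reports a self-estimated confidence. The names `Step`, `Model`, `route`, and the `threshold` value are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Step:
    """One GUI-agent step: where to tap next (hypothetical structure)."""
    x: int
    y: int
    confidence: float  # assumed self-reported score in [0.0, 1.0]


# Assumed model interface: (screenshot bytes, task description) -> Step.
Model = Callable[[bytes, str], Step]


def route(screenshot: bytes, task: str, on_device: Model, cloud: Model,
          privacy_sensitive: bool, threshold: float = 0.7) -> Tuple[str, Step]:
    """Try the on-device model first; escalate to the cloud only if allowed and needed."""
    step = on_device(screenshot, task)
    # Privacy-sensitive tasks never leave the device, whatever the confidence.
    if privacy_sensitive or step.confidence >= threshold:
        return "on-device", step
    # Low confidence stands in for "outside the small model's capability boundary".
    return "cloud", cloud(screenshot, task)


# Stand-in models for demonstration only; real ones would run inference.
def small_model(screenshot: bytes, task: str) -> Step:
    return Step(x=10, y=20, confidence=0.4)


def large_model(screenshot: bytes, task: str) -> Step:
    return Step(x=30, y=40, confidence=0.9)
```

With these stand-ins, `route(b"", "open settings", small_model, large_model, privacy_sensitive=False)` escalates to the cloud, while the same call with `privacy_sensitive=True` stays on-device. The confidence threshold is the simplest possible routing signal; how a small model learns to estimate its own limits is the open question the post points at.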

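Strongly recommend the original transcripts from Lenny @lennysan's Silicon Valley product podcast!!! Extremely high value: 320 .txt subtitle transcripts in total. They include interviews with many notable figures in AI. For example: ✨ Dr. Fei-Fei Li - The godmother of AI's next bet: spatial intelligence ✨ Alexander Embiricos - An inside view of OpenAI ✨ Anton Osika - Europe's fastest-growing AI company, practical strategies (I put the links to these episodes in the thread below) I used my own local AI learning tool to analyze the transcripts and generate detailed research reports (including fact-checking and adding related material). The file package contains: 1. Guest content analysis reports (.md .pdf) 2. Bilingual Chinese-English interview transcripts (.md .pdf) The Markdown version works well with AI, while the PDF is easier to read. Both formats are on GitHub 😎 (how thoughtful of me) GitHub link: github.com/Penny777btc/le…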









