roadzhang

1.8K posts

@roadzhang

keep calm, keep simple.

Beijing, People's Republic of China · Joined September 2007

2.2K Following · 111 Followers

roadzhang reposted

Andrej Karpathy@karpathy·19 May

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

7.9K

11.2K

149.3K

27.2M

roadzhang reposted

Akshay 🚀@akshay_pachaar·6 Apr

x.com/i/article/2040…

452

2.8K

1.5M

roadzhang reposted

Thariq@trq212·8 May

HTML is the new markdown. I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.

Thariq@trq212

x.com/i/article/2052…

897

12.1K

4.4M

roadzhang reposted

Steve Ruiz@steveruizok·10 Apr

Here's how this works btw

Steve Ruiz@steveruizok

a little bit of <canvas> in my <Canvas/>

218

34.7K

roadzhang reposted

Steve Ruiz@steveruizok·23 Apr

making stuff with claude + tldraw desktop app

4.1K

roadzhang reposted

Pamela Fox@pamelafox·23 Apr

Great write-up from @dbreunig: "Learnings from a No-Code Library: Keeping the Spec Driven Development Triangle in Sync" dbreunig.com/2026/03/04/the… specs and tests are *not* enough!

roadzhang reposted

宝玉@dotey·16 Apr

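Recommended reading: a blog post by 若石, "The model isn't dumb; the harness isn't configured properly".

When an AI agent falls apart after ten steps, many people's first reaction is that the model is too dumb. This post offers another view: the horse is fine, the reins just aren't tied properly. The Harness Engineering it proposes can be understood as the engineering practice of fitting an AI model with seat belts and airbags.

Over the past two years we have gone through two phases: Prompt Engineering (how to ask) and Context Engineering (what to feed it). Neither can cope with the surprises that come up when a model executes many steps autonomously.

The post gives a vivid example: an agent is asked to write a market analysis report. The first three steps go smoothly, but at step seven it suddenly starts making things up, because the search results exceeded the context window and were silently truncated. At step ten it outputs a piece of broken JSON, the whole chain dies, and everything has to start over.

To solve this kind of problem, Harness Engineering offers four simple, practical principles:

1. Anything that can be constrained by code should not be left to the model's self-discipline. Take JSON formatting: don't plead in the prompt for valid output; use a schema validator and send invalid output straight back for a redo.
2. Critical state must be externalized rather than held in the model's head. Just as you wouldn't keep code only in memory, record which step the model has reached, which tasks are done, and which are not in an external state.json file, so that after a mid-run crash it can resume where it left off.
3. The model must not praise its own output; a third party must accept it. Never let a model grade its own homework, because it always thinks it did great. Use an independent Evaluator model that does not see the original reasoning and judges only the result. Ideally it actually executes something (runs the compiler, opens the page to look at the UI) rather than evaluating by imagination.
4. Failures must stay local; one error must not take everything down. If a tool call fails, retry that step instead of letting the whole flow go down with it.

The second half of the post covers several counterintuitive pitfalls.

First, "context anxiety". Once the context is more than 70% full, the model gets restless and starts skipping steps and wrapping up hastily, as if it were in a hurry to clock out. The fix is straightforward: don't cling to a polluted context; save the state, clear it, and restart a clean instance to continue.

Second, the "self-evaluation scam". A model will praise terrible code as "clearly structured and highly readable", which cannot be trusted. Real acceptance criteria must be independent and involve actual execution, or you will crash sooner or later.

Finally, the "memory consolidation cycle". The logs of a long-running agent are like a messy notepad: old and new information conflict and waste tokens. Consolidate them periodically, compressing messy logs into a clean state file. One team used this technique to compress a 32K-token log to 7K without losing any key information.

Of course, building this kind of seven-story tower from the start is hard. The post describes a minimal version that can be put in place within a day:

- a state.json that stores task state;
- try/catch around tool calls, with exponential-backoff retries on failure;
- schema validation on all model output;
- uniform truncation of data returned by tools, so it never blows the token budget.

Doing these things can greatly improve an agent's task success rate. The original post is recommended reading.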

若石@iceboundrock

Agents don't fail because models are weak. They fail because systems are undefined. blog.ltbase.dev/posts/agents/h…
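The one-day minimal version listed in the post above (external state file, retried tool calls, validated output, truncated tool results) could look roughly like the sketch below. It is an illustration under assumptions, not code from the linked blog post: every function name, the `state.json` layout, the character-based budget, and the hand-rolled key/type check (standing in for a real schema validator) are hypothetical.

```python
import json
import time
from pathlib import Path

MAX_TOOL_CHARS = 2000  # assumed budget; a real harness would count tokens


def truncate(text: str, limit: int = MAX_TOOL_CHARS) -> str:
    """Cap tool output and say so explicitly instead of cutting silently."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[truncated {len(text) - limit} chars]"


def call_with_retry(tool, *args, attempts: int = 4, base_delay: float = 0.5,
                    sleep=time.sleep):
    """Retry one failing step with exponential backoff; only this step reruns."""
    for attempt in range(attempts):
        try:
            return tool(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


def validate(raw: str, required: dict) -> dict:
    """Parse model output as JSON and check required keys and their types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if broken
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in required.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} missing or not {expected_type.__name__}")
    return data


def load_state(path: Path) -> dict:
    """Read the external state file, or start fresh if it does not exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "results": {}}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file first, then swap it in, so a crash leaves no half-file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)


def run_steps(steps, path: Path, sleep=time.sleep) -> dict:
    """steps: list of (name, callable) pairs; finished steps are skipped on restart."""
    state = load_state(path)
    for name, step in steps:
        if name in state["done"]:
            continue
        state["results"][name] = truncate(call_with_retry(step, sleep=sleep))
        state["done"].append(name)
        save_state(path, state)  # persist after every step
    return state
```

A model reply would then go through something like `validate(reply, {"title": str, "sections": list})`, with a `ValueError` treated as "send it back for a redo". The `sleep` parameter exists only so the backoff can be exercised without waiting.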

211

897

121.9K

roadzhang reposted

Andrej Karpathy@karpathy·4 Apr

Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.

Andrej Karpathy@karpathy

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
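The "small and naive search engine over the wiki" mentioned above is not shown in the post. As a rough illustration of what such a tool might look like, here is a minimal term-count search over a directory of .md files. The function names, the tokenizer, and the scoring rule are all assumptions made for this sketch and are not taken from the post or the linked gist.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric terms (deliberately naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(wiki_dir: Path) -> dict[str, Counter]:
    """Map each .md file (path relative to wiki_dir) to its term counts."""
    return {
        p.relative_to(wiki_dir).as_posix(): Counter(
            tokenize(p.read_text(encoding="utf-8"))
        )
        for p in sorted(wiki_dir.rglob("*.md"))
    }


def search(index: dict[str, Counter], query: str, top_k: int = 5):
    """Score each file by how often the query terms occur in it."""
    terms = tokenize(query)
    scored = []
    for name, counts in index.items():
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties broken by file name for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:top_k]]
```

Wrapping `search` in a small command-line entry point would let an agent call it as a tool, which is the use the post describes. A raw term count is the crudest scoring that works; something like TF-IDF would be the natural next step.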

1.1K

2.8K

26.7K

7.1M

roadzhang reposted

GitHubDaily@GitHub_Daily·31 Dec

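When building AI agents, the biggest headache is not model capability but getting the agent to work reliably in real-world scenarios. Tutorials only show toy demos, while real products hide the key details. I happened to come across Awesome Agentic Patterns on GitHub, an open-source collection that systematically organizes practical patterns and architectural techniques for AI agent development. It is organized by dimensions such as orchestration and control, context and memory, feedback loops, tool use, and reliability and evaluation, and collects over a hundred reusable agent design patterns. GitHub: github.com/nibzard/awesom… Each pattern is backed by real-world cases, such as task decomposition, sub-agent spawning, tool routing, sliding-window management, self-healing retries, and human-in-the-loop collaboration, and it also covers security safeguards and evaluation methods. The project stems from hands-on experience building an AI coding agent at Sourcegraph and is actively updated. Worth bookmarking for anyone building AI agent products or exploring how to ship them.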

304

28K

roadzhang reposted

tldraw@tldraw·25 Mar

make real real

979

83.9K

roadzhang@roadzhang·18 Dec

LLM-based agents for Agent-Based Modeling github.com/SakanaAI/shachi

roadzhang@roadzhang·16 Dec

“......LLM generation may not be achieved by generally learning rule sets and strategies, but rather by implicitly learning a class of underlying potential functions that may transcend different LLM architectures and prompt templates.” arxiv.org/html/2512.1004…

roadzhang reposted

DAIR.AI@dair_ai·12 Dec

Training of Physical Neural Networks

Could we train AI models 1000x larger than today's? Could we run them privately on edge devices like smartphones? The answer might be yes, but not with GPUs. This paper suggests that the path forward may require physical neural networks.

Physical Neural Networks (PNNs) use properties of physical systems to perform computation: optical systems, photonics, analog electronics, and even mechanical substrates. Physics can compute certain operations far more efficiently than digital transistors.

The problem isn't inference. The problem is training. Backpropagation has powered deep learning's success, but implementing it in physical hardware faces fundamental challenges: weight transport, gradient communication across layers, and precise knowledge of activation functions.

This review maps the landscape of PNN training methods:

1) In-silico training: Create digital twins of physical systems, optimize them computationally, then deploy to hardware. Fast iteration but limited by model fidelity. Fabrication imperfections, misalignments, and detection noise break the digital-physical correspondence.

2) Physics-aware training: Physical system performs forward pass, digital model handles backpropagation. A hybrid approach that mitigates experimental noise while maintaining gradient-based optimization. Successfully demonstrated across optical, mechanical, and electronic systems.

3) Equilibrium Propagation: For energy-based systems that naturally minimize a Lyapunov function. Weight updates use local contrastive rules comparing equilibrium states. Implemented on memristor crossbar arrays with potential energy gains of 4 orders of magnitude versus GPUs.

4) Local learning methods: Avoid global gradient communication entirely. Physical Local Learning uses forward-mode differentiation through physical perturbations. No digital model required. Demonstrated on multimode optical fibers with 10,000+ trainable parameters.

The emerging hardware spans optical correlators, photonic integrated circuits, spintronic devices, memristor crossbars, exciton-polariton condensates, and quantum circuits.

No method yet scales to backpropagation's performance on digital hardware. But the trajectory is clear: diverse training techniques are converging on practical PNN implementations. As AI scaling hits GPU limits, physical computing offers a path to models orders of magnitude larger and more energy-efficient than what's currently possible.

Paper: hal.science/hal-05294738v1…

Learn to build with LLMs and AI Agents in our academy: dair-ai.thinkific.com
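To make the "physical system performs forward pass, digital model handles backpropagation" idea from item 2 concrete, here is a toy one-parameter sketch. Everything in it is an assumption for illustration: the "hardware" is simulated as a tanh unit with a 10% gain mismatch and small Gaussian noise, the digital model is the idealized tanh unit, and none of the names or numbers come from the paper.

```python
import math
import random


def physical_forward(w, x, rng):
    """Stand-in for the hardware: gain mismatch plus detection noise (assumed)."""
    return math.tanh(1.1 * w * x) + rng.gauss(0.0, 0.01)


def digital_grad(w, x):
    """d/dw of the idealized digital model tanh(w * x)."""
    return (1.0 - math.tanh(w * x) ** 2) * x


def train(steps=300, lr=0.5, seed=0):
    """Fit w so the 'physical' output matches tanh(1.5 * x) on a fixed grid."""
    rng = random.Random(seed)
    xs = [i / 10 - 1.0 for i in range(21)]
    targets = [math.tanh(1.5 * x) for x in xs]
    w = 0.1
    losses = []
    for _ in range(steps):
        grad, loss = 0.0, 0.0
        for x, t in zip(xs, targets):
            y = physical_forward(w, x, rng)         # forward pass on the "hardware"
            err = y - t
            loss += err * err
            grad += 2.0 * err * digital_grad(w, x)  # backward pass via digital model
        w -= lr * grad / len(xs)
        losses.append(loss / len(xs))
    return w, losses
```

Because the error is measured on the mismatched system while only the gradient comes from the digital model, `w` settles near the value that makes the physical output correct (about 1.5 / 1.1 here) rather than the value the digital model alone would prefer.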

484

30K

roadzhang reposted

LangChain@LangChain·13 Dec

📞 Phone Calling Agents Course Made by the LangChain Community Learn to build production AI call centers from scratch. Create voice agents that handle real phone calls through Twilio with real-time conversations and property search capabilities. 📚 Course repo: github.com/neural-maze/re…

227

1.4K

80.2K

roadzhang reposted

Peter Steinberger 🦞@steipete·12 Dec

People bragging that some harnesses can do multi-agent handoff. Yes, this can be built but folks don't realize the costs: your thinking tokens are likely gone, output of each model will be worse. Ofc that's not something multi-agent harnesses will tell you, but just study the APIs. mariozechner.at/posts/2025-11-…

111

32K

roadzhang reposted

Simon Willison@simonw·11 Dec

Here's a collection of useful patterns I've found after vibe-coding 150 different single-file HTML tools over the past couple of years simonwillison.net/2025/Dec/10/ht…

140

1.4K

293K

roadzhang reposted

Surya Ganguli@SuryaGanguli·15 Nov

We have 14 survey lectures for our @SimonsFdn Collaboration on the Physics of Learning and Neural Computation! All videos available at: physicsoflearning.org/webinar-series

Here is the list:

@zdeborova: Attention-based models and how to solve them using tools from quadratic networks and matrix denoising
@KempeLab: Recent lessons from LLM reasoning
@MBarkeshli: Sharpness dynamics in neural network training
@KrzakalaF: How Do Neural Networks Learn Simple Functions with Gradient Descent?
Michael Douglas: Mathematics, Economics and AI
Yuhai Tu: Towards a Physics-based Theoretical Foundation for Deep Learning: Stochastic Learning Dynamics and Generalization
@SuryaGanguli: An analytic theory of creativity for convolutional diffusion models
Eva Silverstein: Hamiltonian dynamics for stabilizing neural simulation-based inference
@adnarim066: Generation with Unified Diffusion
Bernd Rosenow: Random matrix analysis of neural networks: distinguishing noise from learned information
@jhhalverson: Neural networks and conformal field theory
@KempeLab: Synthetic data: friend or foe in the age of scaling
@WyartMatthieu: Learning hierarchical representations with deep architectures
@CPehlevan: Mean-field theory of deep network learning dynamics and applications to neural scaling laws

249

22.2K

roadzhang reposted

Sebastian Raschka@rasbt·25 Nov

@karpathy Yes, I guess what would make the most sense is a) Learn at home, use AI as a teaching tool to ask questions about the material, have it prep and test you, etc. b) Come to the classroom to do your homework, exams, etc.

178

17.1K

roadzhang reposted

Andrej Karpathy@karpathy·18 Nov

I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs. Usually pass 1 is manual, then pass 2 “explain/summarize”, pass 3 Q&A. I usually end up with a better/deeper understanding than if I moved on. This is growing into one of the top use cases. On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.

596

1.1K

13.4K

2.9M

roadzhang@roadzhang·17 Nov

@aastha_mhaske agent
