Edward Z. Yang

8.8K posts


@ezyang

I work on PyTorch at Meta. Chatty alt at @difficultyang.

Edison, NJ · Joined May 2008
1.4K Following · 16.2K Followers
Edward Z. Yang@ezyang·
@fasttosmile In this style of AI programming, you are in fact reading all the code. But you are "reading" the code, in the same way that when you write code you are implicitly reading it. Fast typist, not autonomous software engineer.
Rudolf A. Braun@fasttosmile·
@ezyang How does one steer without reading the code? Is the idea to focus more on outputs and tests? Or are you saying one should focus more on the higher-level abstractions rather than the line-by-line stuff?
Edward Z. Yang@ezyang·
@deepestbrew It's cool for me too, when a tool I use that traditionally had terrible error messages gets better messages, not because I was complaining, but because they're trying to get the agents banging on their API to do the right thing.
Edward Z. Yang@ezyang·
One of the cool things about the AI revolution is that spending a lot of effort on error messages is cool again, because the agents actually read the error messages!
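A minimal sketch (my illustration, not from the thread) of what an "agent-legible" error message might look like: the exception states what failed, the likely cause, and a concrete next step, so an agent reading the traceback can attempt the fix without extra context. The names `ConfigError` and `load_port` are hypothetical.

```python
class ConfigError(ValueError):
    """Configuration error whose message includes a suggested fix."""


def load_port(config: dict) -> int:
    # Spell out the failure, the likely cause, and a concrete remedy,
    # so that an agent (or a human) can act on the traceback alone.
    raw = config.get("port")
    if raw is None:
        raise ConfigError(
            "Missing required key 'port' in config. "
            "Fix: add an integer entry, e.g. {'port': 8080}."
        )
    try:
        return int(raw)
    except (TypeError, ValueError):
        raise ConfigError(
            f"Config key 'port' must be an integer, got {raw!r} "
            f"({type(raw).__name__}). "
            "Fix: change the value to a plain integer like 8080."
        )


print(load_port({"port": "8080"}))  # coercible string is accepted: 8080
```

The point of the "Fix: ..." suffix is exactly the iteration loop described in this thread: if the agent keeps failing on the same error, you reword the remedy text until it reliably one-shots the repair.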
Edward Z. Yang@ezyang·
Similarly, the inability of LLMs to have memories is a blessing, because you can keep tweaking the error message until the LLM one-shots the fix.
Edward Z. Yang@ezyang·
@Jon85Ma My general attitude is that LLMs haven't changed any of the old problems with formal verification. If the project was amenable to formal verification before, LLMs can help you do it faster and cheaper, pushing it into feasibility. If it wasn't, it still isn't.
Jon@Jon85Ma·
@ezyang What about writing proofs with LLM, then having some machine learning for the compilation?
Edward Z. Yang@ezyang·
A question of intense interest to me is how compilers (and more specifically, compilers for deep learning) should evolve in the era of LLM coding. 🧵
Edward Z. Yang@ezyang·
@jamonholmgren I see that the parent post is about day shift / night shift, though, so there probably won't actually be an opportunity to review plans in this case.
Edward Z. Yang@ezyang·
@jamonholmgren The point of reviewing the plan is not to /review review/ it (that's what the code review at the end is for), it's to make sure the agent didn't misinterpret your intention or go completely off the rails.
Jamon@jamonholmgren·
To reiterate a few things so they don't get lost:
1. I never want to review another agent-produced plan again. Waste of my time, overwhelming, not worth it. It's valuable *to the agent*, but not to me.
2. I will burn all the tokens, run all the tests, do all the validations to make sure that when the work product lands on my desk, it's as good as the agents can make it. My time and energy are the most important thing here.
3. The feedback loop is critical: I'll work on the process, docs, and specs as much as I need to, in order to reap the benefits in future sessions. No more manual guidance via interactive sessions (with the exception of exploratory hacking).
Jamon@jamonholmgren

My current agentic workflow is about 5x faster, better quality, I understand the system better, and I’m having fun again. My previous workflows have left me exhausted, overwhelmed, and feeling out of touch with the systems I was building. They also degraded quality too much. This is way better. I’m not ready to describe in detail. It’s still evolving a bit. But I’ll give you a high level here. I call this the Night Shift workflow.

Edward Z. Yang@ezyang·
@boyuan_chen I am extremely simulator-pilled but also for some reason no one uses them so there must be some sort of Chesterton's fence effect going on here
Boyuan (Nemo) Chen@boyuan_chen·
The Triton vs cuteDSL framing in (1) is sharp. Curious where you think the escape hatch should be for graph-level optimizations though - fusion, memory planning, operator scheduling. Those feel harder to verify cheaply than kernel-level perf, which makes the reward signal problem from (2) even worse at higher IR levels.
Edward Z. Yang@ezyang·
@spillai I too am puzzled what to do about distribution of libraries of bespoke kernels. Humans hate having to interface with this sort of thing. Perhaps it is best used as grist for the next LLM kernel gen...
Sudeep Pillai @ GTC2026
There’s at least another 20% more we can squeeze from this approach before we see diminishing returns. The crazy bit here is that we’re really folding all the known optimizations for a specific model architecture which vllm might not be willing to upstream because it becomes too bespoke for general LLM inference.
Edward Z. Yang@ezyang·
@spillai Richard was telling me about the kernel bench rms norm: cool stuff, and also a bit scary!
Sudeep Pillai @ GTC2026
@ezyang Such an insightful thread - we’ve been adding a handful of LLM-driven optimisations to vllm especially for VLM inference, and so far we’re seeing 30-40% improvement in throughput/latency.
Edward Z. Yang@ezyang·
Claude's opinion on this thread (Claude doesn't have a personality -- said nobody, ever)
[image attached]