John Lam

7.2K posts


@john_lam

I work on building our AI coding experiences @Microsoft. Accidentally co-created GitHub spec-kit a few months ago (https://t.co/B2UXf4w6Cs).

Redmond, WA · Joined March 2008
590 Following · 5.4K Followers
Pinned Tweet
John Lam@john_lam·
2026 will be the year of tools for agentic thought
John Lam@john_lam·
i used autoresearch yesterday to refactor a moderately large codebase (90kloc) to reduce complexity. it's not the perfect approach but it's surprising how optimizing for "loc in largest module" helped to drive behavior. the key thing is to reason over all the experiments afterwards - it was good to see that gpt-5.4/xhigh did a good job at reflecting and seeing when it was benchmark/goal maxxing.
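The "loc in largest module" objective can be made concrete in a few lines. This is a hypothetical sketch, not autoresearch's actual metric: the source root and the non-blank-line counting rule are my assumptions.

```python
# Hypothetical sketch of a "loc in largest module" metric: count non-blank
# lines per Python file under a source root and report the biggest offender.
# The root directory and the counting rule are assumptions for illustration.
from pathlib import Path


def loc(path):
    """Non-blank line count for one file."""
    return sum(1 for line in path.read_text(errors="ignore").splitlines()
               if line.strip())


def largest_module(root):
    """Return (path, loc) for the largest .py module under root."""
    sizes = {p: loc(p) for p in Path(root).rglob("*.py")}
    if not sizes:
        return None, 0
    biggest = max(sizes, key=sizes.get)
    return biggest, sizes[biggest]
```

A metric like this is attractive as an optimization target precisely because it is a single number the agent can try to drive down between experiments.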
John Lam retweeted
Dwarkesh Patel@dwarkesh_sp·
The Terence Tao episode.

We begin with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion. People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops. But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long. During this time, what we know today as the better theory can often actually make worse predictions (Copernicus's model of circular orbits around the sun was actually less accurate than Ptolemy's geocentric model). And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to actually articulate, much less codify into an RL loop. Hope you enjoy!

0:00:00 – Kepler was a high temperature LLM
0:11:44 – How would we know if there's a new unifying concept within heaps of AI slop?
0:26:10 – The deductive overhang
0:30:31 – Selection bias in reported AI discoveries
0:46:43 – AI makes papers richer and broader, but not deeper
0:53:00 – If AI solves a problem, can humans get understanding out of it?
0:59:20 – We need a semi-formal language for the way that scientists actually talk to each other
1:09:48 – How Terry uses his time
1:17:05 – Human-AI hybrids will dominate math for a lot longer

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify.
John Lam@john_lam·
@rseroter in their methodology they used o1 to evaluate user prompts. so this was likely from gpt-4 or earlier! wonder how much this changed recently?
Richard Seroter@rseroter·
5% of employees are doing sophisticated work with AI—treating it as a reasoning partner, delegating complex tasks, ambitious usage—and the best users are above manager level! Interesting data: hbr.org/2026/03/what-t…
Armin Ronacher ⇌@mitsuhiko·
The best upgrade by far on this new Macbook Pro is that I went to nanotexture display. So much better.
John Lam@john_lam·
there's a set of things that i don't think will change regardless of how much we simplify things or how much smarter the agents get - mainly getting the human and the agent on the same page. humans have shitty memory, tiny context windows, and are slow (amazing what we can do with our tiny context windows though!). agents have huge context windows, and can read very fast. acknowledging these differences and building tools that enhance collaboration between humans and agents will continue to be a thing.
Thorsten Ball@thorstenball·
Lately, whenever I open this app and see the latest tricks, and hacks, and notes, and workflows, and spec here and skill there, I can't help but think: All of this will be washed away by the models. Every Markdown file that's precious to you right now will be gone.
John Lam@john_lam·
the distinction isn't between "natural" and "artificial" or "fluorescent"; it is in the spectrum of light that you get, i.e., the distribution of wavelengths and relative intensities. sunlight is a continuous spectrum (all wavelengths represented smoothly) and there are absolutely artificial lights that mimic the solar spectrum. the light that @SahilBloom is using isn't a "fluorescent" light, which represents only a few wavelengths due to the way it is generated.
Sahil Bloom@SahilBloom·
@davidnimaesq Ok, so please suggest how I get natural light at 430am in Boston at any time of year, then.
David Nima@davidnimaesq·
Right idea, but bad methodology. Yes, your body wants light in the morning. But that desire is for natural sunlight, not some artificial fluorescent light purchased from Amazon. This is not a sustainable solution that you can keep up for 50 years. It is the equivalent of eating creatine powder versus natural proteins from eggs. Right idea, wrong methodology.
Sahil Bloom@SahilBloom

Random thing that improved my life: I got this ring light that I put next to my desk to shine bright light in my eyes early in the morning. I wake up at 430am and definitely saw an improvement in morning alertness and sleep quality. Also felt like it helped avoid winter lows.

John Lam@john_lam·
i really like this - it feels like you're getting very close to what @lateinteraction is talking about in the recursive language models paper. at some point i think you get to a place where prompts aren't the thing fed to an execution engine, but rather a program that itself can call agents deterministically so you can have flow control, token efficient storage of intermediate results etc. arxiv.org/abs/2512.24601.
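The "prompt as a program" idea can be illustrated with a toy sketch. Everything here is hypothetical (the `ask` callable, the prompts, the function name); the point is that flow control and intermediate results live in ordinary code, so only short, targeted prompts ever reach the model.

```python
# Toy sketch of a prompt-as-program: `ask` stands in for a call to an agent.
# Looping, branching, and storage of intermediate results happen in plain
# Python instead of inside one giant prompt, which keeps each model call small.
def complexity_report(ask, modules):
    findings = {}
    for mod in modules:  # deterministic flow control, not a prompt
        findings[mod] = ask(f"List complexity hotspots in {mod}, max 50 tokens")
    # pick the module with the longest (i.e. richest) findings to focus on
    worst = max(findings, key=lambda m: len(findings[m]))
    return ask(f"Write a refactoring plan for {worst}: {findings[worst]}")
```

In this shape, the program, not the model, decides how many agent calls happen and what context each one sees.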
Nico Bailon@nicopreme·
pi-prompt-template-model is a pi extension that lets you create slash commands that switch to the right model and config for the job, then auto switch back when it's done. New release adds `--loop` so you can re-run the same prompt multiple times and it automatically stops early when there's nothing left to change. pi install npm:pi-prompt-template-model github.com/nicobailon/pi-…
Nico Bailon@nicopreme

Just added a convenient way to chain prompt templates (slash commands) in Pi coding agent. Each step runs a different prompt template with its own model, skill, and thinking level. pi install npm:pi-prompt-template-model github.com/nicobailon/pi-…

John Lam@john_lam·
this was a fun side project this morning. this is a font that lets me type 6502 opcodes like this 3007 C95B B002 0920 60 297F 60 live and watch the opcodes appear - all in textedit on my mac. 100% inspired by github.com/nevesnunes/z80… for z80 and 100% of code written by gpt-5.4/xhigh the gh repo for this is here: github.com/jflam/6502-sans complete with a built ttf in the releases dir that you can try yourself: github.com/jflam/6502-san…
John Lam@john_lam·
I love this. Thank you for sharing! I've been thinking a lot about this recently as well, and describing it as "no plan survives contact with the enemy". ChatGPT explained the origin of the phrase which led me to this great Eisenhower quote: "Plans are useless, but planning is indispensable.".
Drew Breunig@dbreunig·
Spec driven development isn't a linear process; it's a feedback loop. The act of writing code improves the spec. Just as software doesn't truly work until it meets the real world, a spec doesn't truly work until it's implemented. dbreunig.com/2026/03/04/the…
John Lam@john_lam·
agents really don't like removing code. it's a constant struggle to get them to remove code. literally just now when getting rid of a legacy codepath it did so by adding code in the interests of backward compat (not necessarily bad, but i explicitly told it to remove it).
Mario Zechner@badlogicgames·
the base thesis is that since we can shit out huge amounts of code now, there's no way to review it all via human brains. sure. but nobody seems to ask why we want to shit out huge amounts of code in the first place. mythical man month of the AI age.
Latent.Space@latentspacepod

🆕 How to Kill The Code Review latent.space/p/reviews-dead the volume and size of PRs is skyrocketing. @simonw called out StrongDM’s “Dark Factory” last month: no human code, but *also* no human review (!?) in this week’s guest post, @ankitxg makes a 5 step layered playbook for how this can come true.

John Lam@john_lam·
@mitsuhiko I wound up using local mcp via stdio which was a transparent proxy with retry logic to get around statefulness. But this needs to be solved at the protocol level.
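A minimal sketch of the kind of transparent retrying stdio proxy described here. The class, the one-message-per-line framing, and the retry policy are all my assumptions for illustration; real MCP framing and session resumption are more involved.

```python
# Minimal sketch of a transparent stdio proxy with retry: forward one line to
# a child server process, read one line back, and restart the child if the
# pipe broke. Line framing and the retry policy are assumptions.
import subprocess


class RetryingStdioProxy:
    def __init__(self, cmd, max_retries=3):
        self.cmd = cmd
        self.max_retries = max_retries
        self.child = None

    def _ensure_child(self):
        # (re)spawn the server if it has never started or has exited
        if self.child is None or self.child.poll() is not None:
            self.child = subprocess.Popen(
                self.cmd, stdin=subprocess.PIPE,
                stdout=subprocess.PIPE, text=True)

    def send(self, line):
        for _ in range(self.max_retries):
            self._ensure_child()
            try:
                self.child.stdin.write(line + "\n")
                self.child.stdin.flush()
                reply = self.child.stdout.readline()
                if reply:
                    return reply.rstrip("\n")
            except (BrokenPipeError, OSError):
                pass
            self.child = None  # dead server: restart on the next attempt
        raise RuntimeError("server kept failing after retries")
```

The limitation John points at is visible in the sketch: a restart silently drops whatever session state the old server held, which is why this really needs a protocol-level fix.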
Armin Ronacher ⇌@mitsuhiko·
No matter what I try I cannot get it to work nearly as well as skills. It feels like it was designed for agents that don’t write code, but increasingly it looks like even non coding agents will write and execute code.
Armin Ronacher ⇌@mitsuhiko·
The weirdest part about MCP is the actual architecture. It’s rather token inefficient for the harness but it’s also rather resource intensive for the server because it’s stateful. I think if someone were to take last year’s learnings, a new MCP would look dramatically different.
John Lam@john_lam·
@Wattenberger i've been saying to all who will listen to me that 2026 is the year of agentic ux! so many problems to solve here.
John Lam@john_lam·
@nummanali What about ACP access to claude code?
Numman Ali@nummanali·
Explicitly confirmed, no authorised usage of Claude subscription in: - OpenClaw - Pi Agent - OpenCode - Any 3rd party tool - Agents SDK No OAuth flow is allowed bar within Claude official tools If you do - you’re at high risk of a ban Was good while it lasted, be careful!
Rob Zolkos@robzolkos

Major Claude Code policy clear up from Anthropic: "Using OAuth tokens obtained through Claude Free, Pro, or Max accounts in any other product, tool, or service — including the Agent SDK — is not permitted"

John Lam@john_lam·
But what happens if your coding work spans multiple GH repos and other places? We need a source of truth that agents can analyze, e.g., "remind me about what decisions we made when we designed foo last week. i think we were designing for X but i need to add Y as a new invariant - write a plan for how we can close this gap".
Nicolas Bustamante@nicbstme·
@john_lam That’s exactly how I work too. Now my agents generate a lot of logs, MD files, etc., that I commit to GitHub. Git is becoming my source of truth and system of record for my agents’ work.
John Lam@john_lam·
I largely agree with the "end of arcane ui as a moat" part of this article (the rest is great too!) My use of git now looks like:

merge the feature-x pr and rebase our new branch against main after the merge

Instead of:

git checkout main
git pull
git merge feature-x
git push
git checkout my-new-branch
git rebase main
git push --force-with-lease
Nicolas Bustamante@nicbstme

x.com/i/article/2023…

John Lam@john_lam·
Something that I just started doing a couple of days ago that I couldn't before Opus 4.6 / Codex 5.3: I give my agents goals now. Instead of telling them what to do, I encourage them to exercise their agency. After watching some debugging sessions, I realized that the last thing you should be doing is micromanaging the agent. Instead say: "Hillclimb on doing X. Don't stop until you have reached Y. Do experiments along the way to validate ideas that you have. Write them down, and use what you learned from them to plan your next step. Then overcome each and every obstacle that you encounter until you reach Y. Only then should you come back and show me what you have done." For coding, this is the key to unlocking the alpha in the current frontier models. It's inverted from what I did before: give instructions, build clear plans, etc. Now, giving it goals, telling it to perform experiments, and investing in infrastructure / tooling in the project to improve its ability to perform experiments feels like the way.
John Lam@john_lam·
@bran_don_gell i call my single agent "chief of staff" - i think that's a good mental model for this
Brandon Gell@bran_don_gell·
I'm now 100% convinced: independent app UIs are dead. In 2-3 years, everyone will work directly with their personal agent in whatever app they already live in (for me: iMessages and Discord) and those agents will work with other services.

Right now it's clunky and slow. Agents use a mix of APIs (often incomplete), MCPs (not fully capable), or browser automation (slow, breaks constantly). But even with these limitations, what I've built with my personal AI agent has convinced me. This is the future.

Some predictions:

1. Apple will put Siri in the messages app. Short horizon tasks are great for voice. But long horizon tasks (book this reservation, add it to my cal, invite my wife and sister and brother in law) require chat.
2. Apple has the biggest opportunity here and will probably bungle it. If your agent can't do 100% of things, it can do 0% of things. For security and control reasons Apple won't go all in. A single paper cut will make people abandon it.
3. Single agent with subagents is the way to go. You'll never go to a specialized agent directly. Companies over-complicating this with "agent armies" are missing the point.
4. Someone independent will (hopefully) nail this for the masses — not Apple or Google or Anthropic or OpenAI, though they will all try. The winner needs to be platform-agnostic.
5. "I'll have your agent talk to my agent" will become completely normal. Especially at work.
6. Computer errands will be done entirely in messages. I'm already living this: sending money, adding items to my Whole Foods cart, booking reservations, searching email, managing inbox by phone call.

The @every team and I are living 3 years in the future right now but it feels like the rest of the world is catching up.
Dan Greenheck@dangreenheck·
I think this is my biggest issue with AI right now. I’ve switched over to 100% AI coding over the last few months. Overall, the experience has been great and I’m starting to get a handle on my new workflow. While my productivity has easily 5X’d and my brain is enjoying thinking at a higher level of abstraction, the mental fatigue is real. As someone who is self-employed, it has made it incredibly difficult to draw the line at the end of the day and close the laptop. Don’t get me wrong, I already worked too much and stayed up too late before AI, but now when a feature is potentially a few prompts and 5-10 minutes away from completion, it’s so easy to say “just one more prompt.” and boom it’s 2AM. Obviously, it’s a solvable problem and on me to address, but curious how others that aren’t tied to fixed schedules deal with this?
Rohan Paul@rohanpaul_ai

A super interesting new study from Harvard Business Review. An 8-month field study at a US tech company with about 200 employees found that AI use did not shrink work, it intensified it, and made employees busier.

Task expansion happened because AI filled in gaps in knowledge, so people started doing work that used to belong to other roles or would have been outsourced or deferred. That shift created extra coordination and review work for specialists, including fixing AI-assisted drafts and coaching colleagues whose work was only partly correct or complete.

Boundaries blurred because starting became as easy as writing a prompt, so work slipped into lunch, meetings, and the minutes right before stepping away. Multitasking rose because people ran multiple AI threads at once and kept checking outputs, which increased attention switching and mental load. Over time, this faster rhythm raised expectations for speed through what became visible and normal, even without explicit pressure from managers.

John Lam@john_lam·
not just tests that are quick to run, but tests that are token efficient in their outputs. for example "test xxx passed" just burns tokens - agents are only interested in failures. so something like:

300/302 tests passed.
failures:
#1 ...
#2 ...
#3 ...

where failures include detailed info about the failure that the agent can reproduce
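A failure-only reporter in that spirit might look like the following hypothetical sketch (the `(name, passed, detail)` result format is mine, just to make the shape concrete):

```python
# Hypothetical sketch of a token-efficient test reporter: passing tests
# collapse into one summary line; only failures carry detail that the agent
# can use to reproduce the problem.
def summarize(results):
    """results: iterable of (name, passed, detail) tuples."""
    results = list(results)
    failures = [(name, detail)
                for name, passed, detail in results if not passed]
    out = [f"{len(results) - len(failures)}/{len(results)} tests passed."]
    if failures:
        out.append("failures:")
        for i, (name, detail) in enumerate(failures, 1):
            out.append(f"#{i} {name}: {detail}")
    return "\n".join(out)
```

On an all-green run this emits a single line, so the happy path costs almost nothing in context.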
Greg Brockman@gdb·
Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do.

Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model.

Every company faces the same opportunity now, and navigating it well — just like with cloud computing or the Internet — requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now:

As a first step, by March 31st, we're aiming that: (1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal. (2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions.

In order to get there, here's what we recommended to the team a few weeks ago:

1. Take the time to try out the tools. The tools do sell themselves — many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying.
- Designate an "agents captain" for your team — the primary person responsible for thinking about how agents can be brought into the team's workflow.
- Share experiences or questions in a few designated internal channels.
- Take a day for a company-wide Codex hackathon.

2. Create skills and AGENTS[.md].
- Create and maintain an AGENTS[.md] for any project you work on; update the AGENTS[.md] whenever the agent does something wrong or struggles with a task.
- Write skills for anything that you get Codex to do, and commit them to the skills directory in a shared repository.

3. Inventory and make accessible any internal tools.
- Maintain a list of tools that your team relies on, and make sure someone takes point on making each one agent-accessible (such as via a CLI or MCP server).

4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration.
- Write tests which are quick to run, and create high-quality interfaces between components.

5. Say no to slop. Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high.
- Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting.

6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently goes around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to it, and central management of the tools that agents are able to use.

Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items — for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.