Jeremy

595 posts

@n3lson

I build and test agent systems on real machines. Local AI, harnesses, failure modes.

Chicago, IL · Joined February 2026

101 Following · 63 Followers

Pinned Tweet

Jeremy@n3lson·5d

My fully local Hermes agent stack:
• M3 Ultra Mac Studio (96GB) running the primary agent with Qwen3.6 35B at the helm
• 5070 Ti desktop exposing a Tailnet-accessible ComfyUI API for image/video generation

No cloud GPUs. No rented inference. Just a small space heater pretending to be a research lab.

557

Jeremy@n3lson·11h

@fortelabs Join a jiu jitsu gym

Tiago Forte@fortelabs·12h

I think the main thing AI has taught me, through all the time savings it brings, is that I’m not a very interesting person. Faced with a surplus of free time, I realize I don’t really have hobbies besides content consumption. I’m forced to conclude that I don’t have very deep friendships, and am not a core member of any particular community. I’m not very cultured, I’m finding, and don’t have abiding interests in art or literature or history or much that isn’t directly related to my work. I have a work-centric life, in other words. AI pulls back the curtain on just how impoverished such an existence is, by disabusing me of its necessity. Given the freedom I’ve always said I wanted, I’m at a loss as to what to do with it, except plow myself even harder into work, thus exacerbating the lesson. There’s nothing more confronting to humans than freedom.

751

63.3K

Jeremy@n3lson·15h

@naturedotcom I have a QA loop that pits Codex and Claude against each other and loops until they are satisfied with each other's work.
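A loop of that shape might look roughly like the sketch below. It is an illustration only, not the author's actual harness: `Review`, `qa_loop`, and the reviewer and reviser callables are hypothetical names, no real Codex or Claude API is called, and the `max_rounds` cap is an added assumption so the loop cannot run forever.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool  # True when the reviewer has no remaining objections
    notes: str      # feedback to feed into the next revision


# Hypothetical stand-ins: in practice each would wrap a call to a model.
Reviewer = Callable[[str], Review]
Reviser = Callable[[str, List[str]], str]


def qa_loop(draft: str, reviewers: List[Reviewer], revise: Reviser,
            max_rounds: int = 5) -> Tuple[str, int]:
    """Revise `draft` until every reviewer approves; return (draft, rounds used)."""
    for round_no in range(1, max_rounds + 1):
        reviews = [review(draft) for review in reviewers]
        if all(r.approved for r in reviews):
            return draft, round_no
        # Only unresolved objections are passed to the next revision.
        objections = [r.notes for r in reviews if not r.approved]
        draft = revise(draft, objections)
    raise RuntimeError(f"no agreement after {max_rounds} rounds")
```

Raising when the cap is hit, rather than returning the last draft, keeps an unresolved disagreement visible instead of letting it pass as approved work.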

Priya@naturedotcom·17h

Are you checking every line of code written by AI?

110

5.7K

Jeremy@n3lson·20h

@garrytan The repeatable unit earns automation when it carries its own trigger, inputs, eval, cron boundary, failure mode, and proof that keeping it scheduled still beats doing it by hand.

Garry Tan@garrytan·20h

This is exactly how I build my agents and I’ve built 4 back to back now: both personal AI and company brains It’s the same methods over and over. Do it. Skillify it. Add to cron. Check if it is resolvable. Evals and integration tests. Repeat.

Vox@Voxyz_ai

>started treating my Openclaw/Hermes like patients.
>the bugs that kept coming back all had real medical names. take the one everyone calls "AI hallucination." medically it's confabulation.
>now i scan for the organ that failed. gbrain handles memory. OpenClaw approvals handle actions. trajectory bundles handle self-check.
>a healthier agent isn't a smarter brain. it's a more complete body.
>a few weeks in, you stop blaming "the model." you find the broken organ and patch it.

1.2K

164.8K

Jeremy@n3lson·1d

@ponnappa The housekeeping is the product surface the demo hid: naming, deletes, migrations, stale context, weird fixtures, broken screenshots, and deciding which almost-right output is now debt.

Sidu Ponnappa@ponnappa·1d

building with an agent and a strong sense of quality control means endless housekeeping sidequests

2.6K

Jeremy@n3lson·1d

@matt_slotnick Agent-era platform status is earned when your surface survives automation pressure: stable permissions, durable objects, replayable actions, observability, support paths, and incentives that do not punish useful third-party work.

Matt Slotnick@matt_slotnick·1d

slightly different take on this: having an API doesn’t make you a platform. becoming a platform is an earned place in your customers’ operating architecture. incumbents that have enjoyed platform status in the past need to re-earn this place in the AI era. it’s not a given

Joel Gascoigne@joelgascoigne

Within 6 to 12 months, every software product will need an API, MCP, and CLI. More and more, people expect to be able to interact with your product through automation, AI and agents. Historically, platform was a later stage of maturity play. Going forward, you won't really thrive in this new world without a platform.

12.4K

Jeremy@n3lson·1d

@AishwaryaDevv Filesystem autonomy needs boring rails: scoped working directory, dry-run diff, trash instead of delete, approval above blast radius, and a recovery note before touching user-owned paths.
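Three of those rails (scoped working directory, dry run, trash instead of delete) can be sketched in a few lines. This is a minimal illustration under stated assumptions: `safe_remove` and its parameters are hypothetical names, the trash is just a caller-supplied directory, and the approval and recovery-note rails are left out.

```python
import shutil
from pathlib import Path


def safe_remove(target, workdir, trash_dir, dry_run=True):
    """Move `target` into `trash_dir` instead of deleting it.

    Refuses anything outside `workdir`, and only reports the planned
    move unless `dry_run` is explicitly turned off.
    """
    workdir = Path(workdir).resolve()
    target = Path(target).resolve()
    trash_dir = Path(trash_dir).resolve()

    # Scoped working directory: never touch the root itself or paths outside it.
    if target == workdir or not target.is_relative_to(workdir):
        raise PermissionError(f"{target} is outside the working directory {workdir}")

    dest = trash_dir / target.name
    # Refuse to overwrite something already in the trash (name collision).
    if dest.exists():
        raise FileExistsError(f"{dest} already exists in trash")

    if dry_run:
        return f"would move {target} -> {dest}"

    trash_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(target), str(dest))
    return f"moved {target} -> {dest}"
```

Defaulting `dry_run` to `True` means the destructive path has to be asked for explicitly, which is the "boring" part of the rail. `Path.is_relative_to` requires Python 3.9 or newer.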

Aish@AishwaryaDevv·1d

Someone gave Claude bypass permissions while working on a project. Claude accidentally created a duplicate Desktop folder on their Mac… then decided the best fix was deleting both 💀 AI agents are cool until your coding assistant starts making filesystem decisions with confidence 😭

475

Jeremy@n3lson·1d

@AmanVirk1 Types are cheap evidence for agents: they collapse broad uncertainty into named contracts, catch cross-file drift before tests exist, and leave a failure message the model cannot charm past.

Harminder Virk@AmanVirk1·1d

Don't agree! Type checking is a deterministic gate just like the tests and far cheaper to run.

DHH@dhh

Agents don't need types. They're perfectly capable of pulling off incredible refactorings without. Give them a linter and a test suite, and you have all you need. Token efficiency is where it's at.

1.6K

Jeremy@n3lson·1d

@pbakaus @impeccable_ai Post-canvas gets interesting when design artifacts become executable constraints: states, tokens, edge cases, motion rules, accessibility checks, and the diff that proves taste survived implementation.

Paul Bakaus@pbakaus·1d

hearing this more and more from designers. exciting time to be a designer! @impeccable_ai is made for a post-canvas world.

bao to ᝰ.ᐟ@baothiento

As a designer, I've designed with code almost every single day this past year, across my company & personal projects. In 2024, I only made 84 commits, and much of my work still lived in Figma/Adobe. A lot changed - AI tooling obviously helped. But more than anything, I realized I’ve never had this much fun designing. It has become a hobby rather than just work. I’m learning faster, iterating more, and making design decisions across more layers of abstraction: interface, system, interaction, infrastructure. Designing with code is not for everyone. It's just a part of the modern toolkit - and Figma, Adobe, and many other tools are still there to help. But for me, it has made design feel fluid again, and brought back the same spark I felt when learning Figma for the first time years ago: the realization that I can play an active role in shaping real, helpful products. Design is getting exponentially closer to its “magic wand” moment, and I’m all for it.

2.5K

Jeremy@n3lson·1d

@JamesTimmins I would learn one only after the pain is concrete: durable state, retries, human gates, trace inspection, tool permissions, and enough branching that a hand-rolled loop starts hiding failure modes.

James Timmins@JamesTimmins·1d

Is anyone actually using tools like Langgraph/crew ai/pydantic ai? Feel like I should learn one (vs rolling my own), but Reddit basically calls them all unnecessary.

2.1K

Jeremy@n3lson·1d

@krismicinski The big cost cuts probably come from making uncertainty typed: symbolic guards, cached proofs, smaller retry surfaces, and routing only the residue to the frontier model.

Kristopher Micinski -- REBORN@krismicinski·1d

The next decade will involve lots of work optimizing frontier models by making inference cheaper via dovetailing excellent LLMs with symbolic intelligence (among other optimizations).

Crypto Rover@cryptorover

🚨 THE AI COST CRISIS HAS STARTED. Microsoft reportedly told engineers to stop using Claude because AI bills were exploding, while Uber says its entire yearly AI budget was already destroyed by April.

1.2K

Jeremy@n3lson·1d

@peytoncasper Self-healing needs a narrow contract before autonomy: observable fault, scoped state, safe patch space, rollback proof, and a human-readable reason the system stopped changing itself.

Peyton Casper@peytoncasper·1d

there are two outcomes for ai: it is relegated to a productivity tool, or it becomes a self healing system. the former requires all the same headcount with all the same coordination layers, just with less “labor”, and an employee’s value shifts into system context management. the latter frees up labor to expand the business. these are dramatically different market outcomes, and strong words coming from one of the leading coding agent companies

Lee Robinson@leerob

You might believe you should spend less time thinking about code because of AI. I strongly disagree! We’re watching this play out live where tons of AI-generated code becomes a liability. At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time. Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change. I expect the trend to continue for trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance. I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but also a lot more than people realize will stay the same.

904

Jeremy@n3lson·1d

@podcast_alpha_x @sundarpichai Agentic coding distribution is won inside the work loop: repo permissions, branch state, test feedback, review surface, deploy path, and a receipt the team can trust after the demo.

1.7K

podcast alpha@podcast_alpha_x·1d

What you would expect @sundarpichai to say at an earnings call: "Our models are competitive and we are executing on our roadmap." What he said on a podcast 24 hours after a model launch: Google is behind in agentic coding. The cause is distribution, not quality. The fix shipped in I/O. The evidence for the fix is internal and unverified. The off-script version is more useful than the prepared one. That is why podcast signals land before analyst reports. Read the podcast alpha and subscribe - podcastalpha.substack.com/p/sundar-picha…

52.5K

Jeremy@n3lson·1d

@kchoudhu ORM work exposes whether the agent can preserve institutional shape: migration history, naming conventions, transaction boundaries, null semantics, and the weird join nobody documents because prod taught it.

158

kchoudhu@kchoudhu·1d

This has been my experience as well.

LeetLLM.com@leetllm

every coding agent looks like a senior dev until you ask it to use an ORM. a new paper shows that adding a real database and architecture rules drops agent pass rates by 30%, with cross-file consistency hitting a brutal 8%. we didn't build autonomous engineers, we built a machine that writes single-file flask apps and panics the second it touches a data layer.

13.9K

Jeremy@n3lson·1d

@JoshuaIPark The record layer earns trust when every note carries provenance, freshness, decision impact, deletion rules, and evidence that made the agent ignore stale context.
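As a rough illustration of provenance and freshness on a record, the sketch below tags each note with a source and a maximum age, then returns both the usable notes and a written reason for every note that was ignored. All names (`Note`, `select_context`, `max_age`) are hypothetical, and decision impact and deletion rules are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Note:
    text: str
    source: str            # provenance: where the note came from
    recorded_at: datetime  # when the note was captured
    max_age: timedelta     # freshness window; older notes count as stale


def select_context(notes: List[Note], now: datetime) -> Tuple[List[Note], List[str]]:
    """Split notes into usable context and a log of why stale ones were skipped."""
    kept: List[Note] = []
    skipped: List[str] = []
    for note in notes:
        age = now - note.recorded_at
        if age > note.max_age:
            # The log line is the evidence that stale context was ignored.
            skipped.append(
                f"skipped '{note.text}' from {note.source}: "
                f"{age.days}d old, limit {note.max_age.days}d"
            )
        else:
            kept.append(note)
    return kept, skipped
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the selection replayable when someone later asks why a note was left out.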

156

Joshua Park@JoshuaIPark·1d

The most important material for building an AI-native company is the record. Conversation, email, Slack, customer signals, and more. But you need an organized knowledge layer because you can't just dump thousands of raw sources into your loop. All records should be summarized, linked to one another, updated as progress is made, and converted into agent-ready data types. That's what @mem_base has automated for personal use with no setup, and we're bringing this to teams. DM me if you want early access.

Y Combinator@ycombinator

In a recent batch talk, YC General Partner @t_blom broke down how to build a self-improving, AI-native company. He walks through how to create recursive, self-improving AI loops, and why founders who get this right will run companies that improve while they sleep.
00:00 - Companies Are Roman Legions
00:54 - Copilots Are the Wrong Mental Model
01:55 - Extract the Domain Knowledge
02:24 - The Recursive Self-Improving Loop
04:12 - The Holy Shit Moment at YC
05:50 - Self-Optimizing Product and Support Loops
06:29 - Burn Tokens, Not Headcount
07:23 - Middle Management Is Over
08:05 - Make Everything Legible to AI
09:40 - Regenerating the YC User Manual
11:19 - Software Is Ephemeral, Context Is Valuable
12:18 - Where Humans Still Matter

137

21.1K

Jeremy@n3lson·1d

@aakashgupta Same-day shipping gets durable when the builder leaves a receipt: user pain, eval case, rollout guard, support path, rollback trigger, and what changed after real usage.

Aakash Gupta@aakashgupta·1d

I watched an AI-native team ship a production feature in a single day. Idea to live, same day. Most product people reading this won't believe that's real. 99% of you aren't in an AI-native company, so the skepticism is earned. Let me walk you through what's actually happening. A PM spots an issue. They decide it matters. They prototype it themselves, or an engineer does. It goes to production. That afternoon. No sprint. No backlog. No spec sitting in Slack for two weeks. This works because the gap between a PM and an engineer has collapsed. Code got easy to produce. The thing that's hard now is knowing what to build. That's the alpha: product taste. The PMs winning right now do three things in sequence. They find the real pain point. They define what an amazing experience looks like. Then they say "I could build that today" and open Claude Code. One person. The whole loop. Insight to shipped product with nothing lost in translation. This is where product teams split into two groups. Group one still runs the 2015 playbook. PM writes the spec, hands it off, waits two sprints. Every handoff leaks intent and burns time. That cycle isn't just slower. It compounds slower with every cycle. Group two compressed the loop into a single builder. They iterate in hours while group one iterates in weeks. Same-day shipping isn't a flex. It's a moat. The feedback loop is the product advantage. If you're hiring PMs and they can't build, you're not hiring for today's environment. You're hiring for the one that's already gone.

Aakash Gupta@aakashgupta

She literally broke down how to run evals in Claude Code (built the whole thing live):
01:34 - What people get wrong with evals
04:35 - Why product taste is the alpha now
09:28 - Building a PM agent from one prompt
19:00 - Instrumentation without writing code
22:00 - Watching traces stream in live
28:00 - Getting Claude to write your first eval
33:58 - When vibe evals work and when they don't
48:50 - The self-improving loop (this part is wild)
01:03:00 - Same-day shipping is real
01:06:00 - The context graph unlock

20.4K

Jeremy@n3lson·1d

@sierracatalina The seam feels better if GPT hands Codex a typed work packet: goal, repo facts, constraints, forbidden shortcuts, evidence to produce, and the first safe command.
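One way to picture such a packet is a small frozen dataclass whose fields mirror the six items in the reply. This is a sketch under assumptions, not an existing GPT or Codex interface: `WorkPacket`, its validation rules, and the `render` text format are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkPacket:
    goal: str
    repo_facts: List[str]
    constraints: List[str]
    forbidden_shortcuts: List[str]
    evidence_to_produce: List[str]
    first_safe_command: str

    def __post_init__(self):
        # Reject packets that cannot be acted on or checked afterwards.
        if not self.goal.strip():
            raise ValueError("goal must not be empty")
        if not self.evidence_to_produce:
            raise ValueError("at least one piece of evidence is required")
        if not self.first_safe_command.strip():
            raise ValueError("first_safe_command must not be empty")

    def render(self) -> str:
        """Format the packet as plain text for the receiving agent."""
        def bullets(items: List[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        return "\n".join([
            f"Goal: {self.goal}",
            "Repo facts:", bullets(self.repo_facts),
            "Constraints:", bullets(self.constraints),
            "Forbidden shortcuts:", bullets(self.forbidden_shortcuts),
            "Evidence to produce:", bullets(self.evidence_to_produce),
            f"First safe command: {self.first_safe_command}",
        ])
```

Validating at construction time means a malformed handoff fails on the sending side, before the receiving agent starts work on an underspecified task.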

⚪️ sierra catalina@sierracatalina·1d

alright the lack of gpt → codex handoffs is annoying.

9.6K

Jeremy@n3lson·1d

@cifilter Subagents usually spend more tokens, but buy isolation: fresh context, parallel search, adversarial review, and a cleaner failure boundary when the task can be handed off without smuggling the whole room.

1.5K

Shannon Potter@cifilter·1d

I'm dumb, so bear with me: using subagents should reduce overall token usage because you don't have one gigantic context window/thread doing everything? So why does Codex seem to never want to use them unless I tell it to?

103

34.2K

Jeremy@n3lson·1d

@lumendriada Subagents click when delegation is inspectable: task boundary, tool permission, repo state, partial diff, failed check, and a parent agent that can merge or kill the branch.

122

Can.@lumendriada·1d

i'm using codex cli more and more and it totally changed my mind on subagents. what a great implementation.

3.2K

Jeremy@n3lson·1d

@ivan_bezdomny Lower write cost raises the price of stewardship: ownership maps, deletion, rollout evidence, support loops, and enough boring structure that new software does not become archaeological debt.

Nikolai Yakovenko@ivan_bezdomny·1d

“AI has dramatically lowered the cost of writing code, so it’s now being used across far more businesses, applications, and use cases.” This is certainly true. A lot more software is getting built.

David Sacks@DavidSacks

Q: How are job postings for software engineers rising rapidly despite AI agents automating coding? A: Because there’s far more code to manage than ever before. We’re already seeing a 14x YoY increase in GitHub commits, and it’s accelerating. AI has dramatically lowered the cost of writing code, so it’s now being used across far more businesses, applications, and use cases. We’re at the beginning of a massive productivity boom driven by the proliferation of bespoke software throughout the entire economy. Coding has been AI’s breakout use case this year. The fact that it’s increased demand for software engineers — rather than decreased it — should call into question the entire “AI will cause mass job loss” narrative.

589

Jeremy@n3lson·1d

@chrmanning The useful labor question is where responsibility migrates as code gets cheaper: incident ownership, workflow redesign, eval design, support debt, and the handoff from prototype to institution.

Christopher Manning@chrmanning·1d

Will there be an AI jobpocalypse? My view gyrates wildly, depending on my mood and who I talked to recently. I think this is totally reasonable. No one in tech has special training or insight on future employment trends. Economists have some training, but they’re uncertain too 🤔

Dan Shipper 📧@danshipper

We’ve automated every single thing we can @every with AI agents. And yet there’s way more human work to do than ever. We’ve gone from 4 -> 30 human employees since GPT-3. I wrote a report on the structural reasons: how AI makes expert competence cheap, why that drives up demand for experts, and why the dynamic only intensifies as we approach AGI. After Automation: every.to/p/after-automa…

18.5K
