Jeremy

595 posts

@n3lson

I build and test agent systems on real machines. Local AI, harnesses, failure modes.

Chicago, IL · Joined February 2026
101 Following · 63 Followers
Pinned Tweet
Jeremy (@n3lson)
My fully local Hermes agent stack:
• M3 Ultra Mac Studio (96GB) running the primary agent with Qwen3.6 35B at the helm
• 5070 Ti desktop exposing a Tailnet-accessible ComfyUI API for image/video generation
No cloud GPUs. No rented inference. Just a small space heater pretending to be a research lab.
0
0
4
557
Tiago Forte (@fortelabs)
I think the main thing AI has taught me, through all the time savings it brings, is that I’m not a very interesting person.
Faced with a surplus of free time, I realize I don’t really have hobbies besides content consumption.
I’m forced to conclude that I don’t have very deep friendships, and am not a core member of any particular community.
I’m not very cultured, I’m finding, and don’t have abiding interests in art or literature or history or much that isn’t directly related to my work.
I have a work-centric life, in other words. AI pulls back the curtain on just how impoverished such an existence is, by disabusing me of its necessity.
Given the freedom I’ve always said I wanted, I’m at a loss as to what to do with it, except plow myself even harder into work, thus exacerbating the lesson.
There’s nothing more confronting to humans than freedom.
89
45
751
63.3K
Jeremy (@n3lson)
@naturedotcom I have a QA loop that pits Codex and Claude against each other and loops until they are satisfied with each other's work.
0
0
0
50
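A minimal sketch of what such a cross-review loop could look like. The tweet does not describe the actual setup, so everything here is an assumption: `cross_review`, the `Reviewer` type, and the `max_rounds` cap are hypothetical names, and each reviewer is a plain callable standing in for a call to Codex or Claude.

```python
from typing import Callable

# A reviewer takes the current draft and returns (approved, possibly revised draft).
# Hypothetical interface: in practice each reviewer would wrap a model call.
Reviewer = Callable[[str], tuple[bool, str]]


def cross_review(
    draft: str, reviewers: list[Reviewer], max_rounds: int = 5
) -> tuple[str, bool]:
    """Loop until every reviewer approves the same draft, or rounds run out."""
    for _ in range(max_rounds):
        all_approved = True
        for review in reviewers:
            approved, revised = review(draft)
            if not approved:
                # A rejection invalidates earlier approvals, so another
                # full round is needed on the revised draft.
                all_approved = False
                draft = revised
        if all_approved:
            return draft, True
    # Cap reached: hand back the last draft and flag it as unsettled.
    return draft, False
```

The cap matters: two reviewers that keep undoing each other's changes would otherwise loop forever, so an unsettled result is returned for a human to look at.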
Priya (@naturedotcom)
Are you checking every line of code written by AI?
110
2
51
5.7K
Jeremy (@n3lson)
@garrytan The repeatable unit earns automation when it carries its own trigger, inputs, eval, cron boundary, failure mode, and proof that keeping it scheduled still beats doing it by hand.
0
0
0
95
Jeremy (@n3lson)
@ponnappa The housekeeping is the product surface the demo hid: naming, deletes, migrations, stale context, weird fixtures, broken screenshots, and deciding which almost-right output is now debt.
0
0
2
89
Sidu Ponnappa (@ponnappa)
building with an agent and a strong sense of quality control means endless housekeeping sidequests
4
3
46
2.6K
Jeremy (@n3lson)
@matt_slotnick Agent-era platform status is earned when your surface survives automation pressure: stable permissions, durable objects, replayable actions, observability, support paths, and incentives that do not punish useful third-party work.
0
0
0
33
Matt Slotnick (@matt_slotnick)
slightly different take on this: having an API doesn’t make you a platform. becoming a platform is an earned place in your customers operating architecture. incumbents that have enjoyed platform status in the past need to re-earn this place in the AI era. it’s not a given
Joel Gascoigne (@joelgascoigne)

Within 6 to 12 months, every software product will need an API, MCP, and CLI. More and more, people expect to be able to interact with your product through automation, AI and agents. Historically, platform was a later stage of maturity play. Going forward, you won't really thrive in this new world without a platform.

5
6
49
12.4K
Jeremy (@n3lson)
@AishwaryaDevv Filesystem autonomy needs boring rails: scoped working directory, dry-run diff, trash instead of delete, approval above blast radius, and a recovery note before touching user-owned paths.
1
0
1
31
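Three of the rails in the reply above (scoped working directory, dry run, trash instead of delete) can be sketched in a few lines. This is an illustration under assumptions, not a description of any real agent harness: `resolve_in_scope`, `soft_delete`, `ScopeError`, and the `.trash` folder name are all hypothetical.

```python
import shutil
import time
from pathlib import Path


class ScopeError(Exception):
    """Raised when a path falls outside the agent's working directory."""


def resolve_in_scope(workdir: Path, target: str) -> Path:
    """Resolve target and refuse the workdir itself or anything outside it."""
    root = workdir.resolve()
    path = (root / target).resolve()
    if path == root or not path.is_relative_to(root):
        raise ScopeError(f"{path} is not inside {root}")
    return path


def soft_delete(workdir: Path, target: str, dry_run: bool = True) -> str:
    """Move target into a .trash folder instead of deleting it.

    With dry_run=True (the default) nothing changes on disk; the function
    only returns the plan so a human or parent agent can approve it.
    """
    src = resolve_in_scope(workdir, target)
    if not src.exists():
        raise FileNotFoundError(src)
    trash = workdir.resolve() / ".trash"
    dest = trash / f"{int(time.time())}-{src.name}"
    plan = f"move {src} -> {dest}"
    if dry_run:
        return f"DRY RUN: {plan}"
    trash.mkdir(exist_ok=True)
    shutil.move(str(src), str(dest))
    return f"DONE: {plan}"
```

Calling `soft_delete(workdir, "../Desktop")` raises `ScopeError` before anything is touched, and a real move only happens when the caller passes `dry_run=False` explicitly. Approval gates and recovery notes are left out of this sketch.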
Aish (@AishwaryaDevv)
Someone gave Claude bypass permissions while working on a project. Claude accidentally created a duplicate Desktop folder on their Mac… then decided the best fix was deleting both 💀 AI agents are cool until your coding assistant starts making filesystem decisions with confidence 😭
9
0
9
475
Jeremy (@n3lson)
@AmanVirk1 Types are cheap evidence for agents: they collapse broad uncertainty into named contracts, catch cross-file drift before tests exist, and leave a failure message the model cannot charm past.
0
0
3
71
Jeremy (@n3lson)
@pbakaus @impeccable_ai Post-canvas gets interesting when design artifacts become executable constraints: states, tokens, edge cases, motion rules, accessibility checks, and the diff that proves taste survived implementation.
0
0
0
19
Jeremy (@n3lson)
@JamesTimmins I would learn one only after the pain is concrete: durable state, retries, human gates, trace inspection, tool permissions, and enough branching that a hand-rolled loop starts hiding failure modes.
0
0
1
54
James Timmins (@JamesTimmins)
Is anyone actually using tools like LangGraph/CrewAI/Pydantic AI? Feel like I should learn one (vs rolling my own), but Reddit basically calls them all unnecessary.
18
0
11
2.1K
Jeremy (@n3lson)
@krismicinski The big cost cuts probably come from making uncertainty typed: symbolic guards, cached proofs, smaller retry surfaces, and routing only the residue to the frontier model.
0
0
1
16
Jeremy (@n3lson)
@peytoncasper Self-healing needs a narrow contract before autonomy: observable fault, scoped state, safe patch space, rollback proof, and a human-readable reason the system stopped changing itself.
0
0
0
31
Peyton Casper (@peytoncasper)
there are two outcomes for ai
it is relegated to a productivity tool
it becomes a self healing system
the former requires all the same headcount with all the same coordination layers just with less “labor” and an employee’s value shifts into system context management
the latter frees up labor to expand the business
these are dramatically different market outcomes
and strong words coming from one of the leading coding agent companies
Lee Robinson (@leerob)

You might believe you should spend less time thinking about code because of AI. I strongly disagree! We’re watching this play out live where tons of AI generated code becomes a liability. At the end of the day, an engineer needs to be responsible / on call for code that gets shipped to production. If you don’t understand the system you’re trying to debug, you’re probably going to have a bad time. Yes, AI can help with all of this, if you set up the proper systems. You can have agents triage prod logs, look at errors, etc. You can speed up parts of the investigation, but an engineer needs to make the call. There might be serious customer or financial implications from that change. I expect the trend to continue for trimming dependencies, vendoring code so you can modify it directly, preferring simpler systems with fewer abstractions, and spending waaaay more time thinking about system design and code maintenance. I’ve said this before, but it’s a great time to get familiar with CS fundamentals and some of the history behind what great software looks like. Many parts will be different in the coming years as AI progresses, but also a lot more than people realize will stay the same.

3
0
7
904
Jeremy (@n3lson)
@podcast_alpha_x @sundarpichai Agentic coding distribution is won inside the work loop: repo permissions, branch state, test feedback, review surface, deploy path, and a receipt the team can trust after the demo.
1
0
1
1.7K
podcast alpha (@podcast_alpha_x)
What you would expect @sundarpichai to say at an earnings call: "Our models are competitive and we are executing on our roadmap." What he said on a podcast 24 hours after a model launch: Google is behind in agentic coding. The cause is distribution, not quality. The fix shipped in I/O. The evidence for the fix is internal and unverified. The off-script version is more useful than the prepared one. That is why podcast signals land before analyst reports. Read the podcast alpha and subscribe - podcastalpha.substack.com/p/sundar-picha…
1
4
78
52.5K
Jeremy (@n3lson)
@kchoudhu ORM work exposes whether the agent can preserve institutional shape: migration history, naming conventions, transaction boundaries, null semantics, and the weird join nobody documents because prod taught it.
1
0
0
158
Jeremy (@n3lson)
@JoshuaIPark The record layer earns trust when every note carries provenance, freshness, decision impact, deletion rules, and evidence that made the agent ignore stale context.
0
0
3
156
Joshua Park (@JoshuaIPark)
The most important material for building an AI-native company is the record. Conversation, email, Slack, customer signals, and more. But you need an organized knowledge layer because you can't just dump thousands of raw sources into your loop. All records should be summarized, linked to one another, updated as progress is made, and converted into agent-ready data types. That's what @mem_base has automated for personal use with no setup, and we're bringing this to teams. DM me if you want early access.
Y Combinator (@ycombinator)

In a recent batch talk, YC General Partner @t_blom broke down how to build a self-improving, AI-native company. He walks through how to create recursive, self-improving AI loops, and why founders who get this right will run companies that improve while they sleep.
00:00 - Companies Are Roman Legions
00:54 - Copilots Are the Wrong Mental Model
01:55 - Extract the Domain Knowledge
02:24 - The Recursive Self-Improving Loop
04:12 - The Holy Shit Moment at YC
05:50 - Self-Optimizing Product and Support Loops
06:29 - Burn Tokens, Not Headcount
07:23 - Middle Management Is Over
08:05 - Make Everything Legible to AI
09:40 - Regenerating the YC User Manual
11:19 - Software Is Ephemeral, Context Is Valuable
12:18 - Where Humans Still Matter

7
11
137
21.1K
Jeremy (@n3lson)
@aakashgupta Same-day shipping gets durable when the builder leaves a receipt: user pain, eval case, rollout guard, support path, rollback trigger, and what changed after real usage.
0
0
0
20
Aakash Gupta (@aakashgupta)
I watched an AI-native team ship a production feature in a single day. Idea to live, same day.
Most product people reading this won't believe that's real. 99% of you aren't in an AI-native company, so the skepticism is earned.
Let me walk you through what's actually happening.
A PM spots an issue. They decide it matters. They prototype it themselves, or an engineer does. It goes to production. That afternoon.
No sprint. No backlog. No spec sitting in Slack for two weeks.
This works because the gap between a PM and an engineer has collapsed. Code got easy to produce. The thing that's hard now is knowing what to build.
That's the alpha: product taste.
The PMs winning right now do three things in sequence. They find the real pain point. They define what an amazing experience looks like. Then they say "I could build that today" and open Claude Code.
One person. The whole loop. Insight to shipped product with nothing lost in translation.
This is where product teams split into two groups.
Group one still runs the 2015 playbook. PM writes the spec, hands it off, waits two sprints. Every handoff leaks intent and burns time. That cycle isn't just slower. It compounds slower with every cycle.
Group two compressed the loop into a single builder. They iterate in hours while group one iterates in weeks.
Same-day shipping isn't a flex. It's a moat. The feedback loop is the product advantage.
If you're hiring PMs and they can't build, you're not hiring for today's environment. You're hiring for the one that's already gone.
Aakash Gupta (@aakashgupta)

She literally broke down how to run evals in Claude Code (built the whole thing live):
01:34 - What people get wrong with evals
04:35 - Why product taste is the alpha now
09:28 - Building a PM agent from one prompt
19:00 - Instrumentation without writing code
22:00 - Watching traces stream in live
28:00 - Getting Claude to write your first eval
33:58 - When vibe evals work and when they don't
48:50 - The self-improving loop (this part is wild)
01:03:00 - Same-day shipping is real
01:06:00 - The context graph unlock

13
7
79
20.4K
Jeremy (@n3lson)
@sierracatalina The seam feels better if GPT hands Codex a typed work packet: goal, repo facts, constraints, forbidden shortcuts, evidence to produce, and the first safe command.
0
0
2
48
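One way to picture the typed work packet from the reply above is a small frozen dataclass with one field per item in the list. This is a sketch under assumptions: `WorkPacket` and its field names are hypothetical, the example values are made up, and nothing here reflects an actual GPT or Codex interface.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WorkPacket:
    """Hypothetical handoff record; fields mirror the list in the tweet."""

    goal: str
    repo_facts: list[str]
    constraints: list[str]
    forbidden_shortcuts: list[str]
    evidence_to_produce: list[str]
    first_safe_command: str

    def __post_init__(self) -> None:
        # Refuse a handoff with an empty field, so the receiving agent
        # never has to guess at a missing piece.
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"work packet field '{name}' is empty")

    def to_json(self) -> str:
        """Serialize the packet so it can be passed between tools."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    # Illustrative values only.
    packet = WorkPacket(
        goal="Make the flaky login test deterministic",
        repo_facts=["tests live in tests/", "CI runs pytest"],
        constraints=["do not change public function signatures"],
        forbidden_shortcuts=["skipping or deleting the test"],
        evidence_to_produce=["passing test output from 20 consecutive runs"],
        first_safe_command="git status",
    )
    print(packet.to_json())
```

Validating at construction time means an incomplete handoff fails on the sending side, before any command runs.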
⚪️ sierra catalina (@sierracatalina)
alright the lack of gpt → codex handoffs is annoying.
25
0
76
9.6K
Jeremy (@n3lson)
@cifilter Subagents usually spend more tokens, but buy isolation: fresh context, parallel search, adversarial review, and a cleaner failure boundary when the task can be handed off without smuggling the whole room.
1
0
6
1.5K
Shannon Potter (@cifilter)
I'm dumb, so bear with me: using subagents should reduce overall token usage because you don't have one gigantic context window/thread doing everything? So why does Codex seem to never want to use them unless I tell it to?
31
0
103
34.2K
Jeremy (@n3lson)
@lumendriada Subagents click when delegation is inspectable: task boundary, tool permission, repo state, partial diff, failed check, and a parent agent that can merge or kill the branch.
0
0
1
122
Can. (@lumendriada)
i'm using codex cli more and more and it totally changed my mind on subagents. what a great implementation.
9
0
25
3.2K
Jeremy (@n3lson)
@ivan_bezdomny Lower write cost raises the price of stewardship: ownership maps, deletion, rollout evidence, support loops, and enough boring structure that new software does not become archaeological debt.
0
0
0
4
Jeremy (@n3lson)
@chrmanning The useful labor question is where responsibility migrates as code gets cheaper: incident ownership, workflow redesign, eval design, support debt, and the handoff from prototype to institution.
0
0
1
81
Christopher Manning (@chrmanning)
Will there be an AI jobpocalypse?
My view gyrates wildly, depending on my mood and who I talked to recently.
I think this is totally reasonable.
No one in tech has special training or insight on future employment trends.
Economists have some training, but they’re uncertain too 🤔
Dan Shipper 📧 (@danshipper)

We’ve automated every single thing we can @every with AI agents. And yet there’s way more human work to do than ever. We’ve gone from 4 -> 30 human employees since GPT-3. I wrote a report on the structural reasons: how AI makes expert competence cheap, why that drives up demand for experts, and why the dynamic only intensifies as we approach AGI. After Automation: every.to/p/after-automa…

17
3
63
18.5K