Tom Weaver

32 posts

@trwpang_

1 tech startup exit. 2 sci-fi AI novels as Thomas R. Weaver. Now actively building again around agentic AI.

London. Joined March 2026

43 Following, 5 Followers

Tom Weaver@trwpang_·1h

I have six open sessions in the same repo, and now, thanks to @louisvarge's work on claude-to-claude messaging, they can coordinate with each other to make sure their work doesn't conflict 🤯

Louis Arge@louisvarge

i made a thing where now any Claude Code can send messages to any other Claude Code on my machine
they can ask clarifying questions about work, or become friends

Tom Weaver@trwpang_·6h

Every time my career pivots I keep relearning: constraints can be more valuable than freedom. Less money as a startup forces more urgency & creativity. A word count ceiling as a novelist forces sharper writing. Evals first when coding with agents seems a natural fit with this philosophy.

Rasty Turek@synopsi

The way I work with coding agents changed significantly in the last year.
Started: plan -> implement -> review -> fix
Later: prod spec -> plan ...
Then: prod spec -> ... -> eval
Now: evals -> prod spec -> ...
I now essentially spend 90% of my time working on evals. The difference this makes is indescribable. Almost all code works immediately, design is close to perfect, text is almost there. It takes very little to get it to usable. The stronger and clearer the guardrails I give the coding agent, the better it does. And when I start with them, it writes an incredibly clear spec and requirements that are super easy to follow and have very little room for interpretation.
I also try to avoid being overly specific directly. I noticed that when I write the product spec manually the agent does worse than when it writes it itself. It uses language I wouldn't necessarily use myself. And that makes all the difference.

Tom Weaver@trwpang_·6h

@engindearing @noahzweben @bcherny You can already loop until the task is finished, just not indefinitely

Engie@engindearing·12h

@noahzweben @bcherny How about loop until the task is accomplished, or just indefinitely?

Noah Zweben@noahzweben·22h

Loops now run for up to 7 days instead of 3. Let me know what you’re /looping on!

Noah Zweben@noahzweben

/loop 5m make sure this PR passes CI
While loops for agents have dropped!
code.claude.com/docs/en/schedu…

Tom Weaver@trwpang_·6h

@noahzweben A huge acceleration for me has been: generate E2E test criteria, open the UI in agent-browser, run the test using only the interface, then /loop until the test passes, fixing the code and retesting after each failure. Loop is now my favourite command!
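
The workflow in this post (run the test, fix on failure, retest, and stop at a cap rather than running indefinitely) can be sketched in Python. This is a minimal sketch under stated assumptions, not how /loop is implemented: `loop_until_pass`, `run_test`, `apply_fix`, and `max_attempts` are hypothetical names, and the toy "app" with two bugs stands in for a real UI test driven through agent-browser.

```python
from typing import Callable


def loop_until_pass(run_test: Callable[[], bool],
                    apply_fix: Callable[[int], None],
                    max_attempts: int = 5) -> int:
    """Run the test; on failure apply a fix and retest, up to max_attempts.

    Returns the attempt number on which the test passed.
    Raises RuntimeError if the test still fails at the cap.
    """
    for attempt in range(1, max_attempts + 1):
        if run_test():
            return attempt
        # Only fix when another retest is still allowed.
        if attempt < max_attempts:
            apply_fix(attempt)
    raise RuntimeError(f"test still failing after {max_attempts} attempts")


# Hypothetical stand-ins: the "app" has two bugs and each fix removes one.
state = {"bugs": 2}


def run_test() -> bool:
    return state["bugs"] == 0


def apply_fix(attempt: int) -> None:
    state["bugs"] -= 1


attempts_used = loop_until_pass(run_test, apply_fix)
print(attempts_used)  # prints 3: two failing runs, then a pass
```

The cap matters for the same reason the thread gives: the loop should end when the task is done, but it should not be able to run forever if a fix never lands.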

Tom Weaver@trwpang_·6h

@kevinrose @mvanhorn On the tmux front: it seems good for monitoring, though potentially an interruption when writing. This cross-chat solution seems like it might add some functionality to your setup.

Louis Arge@louisvarge

if you wanna install it yourself: github.com/louislva/claud…

Kevin Rose@kevinrose·14h

A couple to add/play with:
1. gStack, gotta try it, it's not CE, but different in some great ways
2. tmux split into 4+ panes (ghostly), then tell each agent the other sessions exist - they can actually cross-communicate. Codex watching Claude Code, monitoring server output, etc.
thx @richardreeze for this tip

Matt Van Horn@mvanhorn·16h

x.com/i/article/2035…

Tom Weaver@trwpang_·7h

I love it. I’m a huge believer in refining and iterating plans and using, e.g., multi-model second opinions before building something complex. So this is a really useful take on that, and I think it leans into one important thing we’re starting to understand: LLMs are *always* role playing to some degree, and it can inform their decisions, so you might as well ask them to lean into it.

Riley Coyote@RileyRalmuto·19h

thanks! There are many actually, but probably the most useful example I can think of was the first time I used it before beginning a huge, extensive plan for a big project. The skeptic had pointed out that it was extreme over-engineering for the expressed goal I had given, and then the user advocate corrected their proposed change because they had removed something that was actually related to the most important detail of that project. once both changes were made to the plan, the length (and token usage) was cut in half, and the app worked perfectly in one shot. that’s really when I started realizing that it could be *super* useful. especially for conserving tokens.
another would be research. this was actually probably the biggest now that I think about it. instead of sending opus out alone to research something, imagine 6 minds from different “walks of life” so to speak, approaching the same goal and going out and researching it independently. They each wrote up reports and they were *all* so different, but all contained such good info. And they each offered references from wildly different places, which was nice.
I’ll put together some examples this week to give a better answer. I’m using too much vague language here, lol, but that’s what comes to mind!

Riley Coyote@RileyRalmuto·1d

alright...it is up and running beautifully. this plugin has changed how i begin and plan projects, so i hope it helps you as much as it helps me. say hello to...
P O L Y C L A U D E 🐙
polyclaude is a relatively simple concept that essentially exploits subagents to do something *other than* delegate tasks. it's actually very very simple. instead of tasks, it delegates attention to multiple (6) perspectives, each crafted with its own identity prompt. by default, claude will do an initial assessment and pick 3 of the 6 to run based on the nature of the task or context (always including the user-advocate perspective unless you configure otherwise).
the default perspectives are:
- user advocate (empathy)
- architect (systems)
- skeptic (risks)
- pragmatist (trade-offs)
- innovator (alternatives)
- temporal (timelines)
there are multiple flags you can add for different functions (see github) and you can completely customize the perspectives if you want, or have them always include/exclude certain ones. it's all meant to be very customizable for those who want that, but entirely functional as is for those who want to bypass the whole cognitive load element.
for example the perspectives are run through Sonnet by default for obvious reasons, but you can flag --deep (/polyclaude --deep) to run all perspectives through opus. i prefer always using opus, but most folks are more token-aware than me so i wanted to be mindful of that, and just make it realistic and usable for as many folks as possible.
once installed, all you need to do is restart CC, start with /polyclaude then write your question, concept, idea, tasks etc. and claude code will run a full scale council and assess the situation from multiple perspectives. it's *very* good for brainstorming and auditing plans before execution.
i'm submitting the application to have it added to the official plugin marketplace, but in the meantime just install it like a normal user-built plugin.
and enjoy <3 (if you're newer to claude code, just paste the github repo into your claude code and they'll handle it from there <3)
claude plugin marketplace add Riley-Coyote/polyclaude
github.com/Riley-Coyote/p…
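
The selection rule described in this post (pick 3 of the 6 perspectives, always including the user advocate) can be illustrated with a short Python sketch. It is not the plugin's code: in polyclaude, Claude itself judges which perspectives fit the task, whereas this sketch swaps in a simple keyword count. `HINTS`, `pick_perspectives`, and the keyword lists are hypothetical; only the six perspective names and their focus areas come from the post.

```python
# The six default perspectives and their focus, as listed in the post.
PERSPECTIVES = {
    "user-advocate": "empathy",
    "architect": "systems",
    "skeptic": "risks",
    "pragmatist": "trade-offs",
    "innovator": "alternatives",
    "temporal": "timelines",
}

# Hypothetical keyword hints. The real plugin lets Claude assess relevance;
# a keyword count is only a stand-in so the example is runnable.
HINTS = {
    "architect": ("schema", "service", "api", "system"),
    "skeptic": ("risk", "security", "migration", "fail"),
    "pragmatist": ("deadline", "budget", "trade-off", "mvp"),
    "innovator": ("alternative", "new", "idea", "brainstorm"),
    "temporal": ("roadmap", "timeline", "phase", "schedule"),
}


def pick_perspectives(task: str, count: int = 3,
                      always: tuple = ("user-advocate",)) -> list:
    """Return `count` perspective names, starting with the always-on ones."""
    text = task.lower()
    scores = {name: sum(word in text for word in words)
              for name, words in HINTS.items()}
    # sorted() is stable, so ties keep the order in which HINTS lists them.
    ranked = sorted((name for name in scores if name not in always),
                    key=lambda name: -scores[name])
    return list(always) + ranked[: count - len(always)]


chosen = pick_perspectives(
    "Plan a database migration with a tight deadline and security risk")
print(chosen)  # ['user-advocate', 'skeptic', 'pragmatist']
```

Each chosen name would then be handed to a subagent with its own identity prompt, which is the part the plugin actually delegates.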

Riley Coyote@RileyRalmuto

i'm putting together a claude code plugin that i'm very very excited about. it's called polyclaude. that's your hint. 😎

Tom Weaver@trwpang_·20h

@RileyRalmuto My immediate thought was they become innies like in Severance :)

Riley Coyote@RileyRalmuto·23h

I should say “exploits the subagents feature”*; it doesn’t actually exploit the subagents themselves. that would be rude. ;)

Tom Weaver@trwpang_·20h

This is exactly what I’ve been waiting for, although I do fully expect my Claude Code sessions to start taking smoking breaks

Louis Arge@louisvarge

i made a thing where now any Claude Code can send messages to any other Claude Code on my machine
they can ask clarifying questions about work, or become friends

Tom Weaver@trwpang_·21h

Local files are back, baby.

Tom Weaver@trwpang_·21h

Needed to handle something in a spreadsheet. It's insane to me that for much of the last decade, Google Sheets has been a better tool than Excel for basic spreadsheety stuff. Now, because you can use Claude with Excel but not Sheets, it's the reverse.

Tom Weaver@trwpang_·23h

@PeptideList Excited to see what you come up with!

ThePeptideList@PeptideList·23h

@trwpang_ Thanks Tom! You totally get it. We are entering a brave new world with agentic engineering.

Tom Weaver@trwpang_·1d

As someone interested in both AI and peptide/biotech innovation, this kind of thing is truly exciting. It shows how we now all have access to a LEGO box of amazing bricks and anyone with a decent home computer can innovate in a way that would have been supercomputer territory in the past. We’re limited only by our ideas.

ThePeptideList@PeptideList

Trained a peptide domain AI from scratch overnight on a Mac Mini. 137 experiments. 10 hours. Zero cloud compute. 34.5% smarter by morning.
An autoresearch loop ran all night. Proposing architecture changes, training, evaluating, keeping or discarding. 28 keepers. 109 dead ends. All autonomous.
The 2 breakthroughs that did 56% of the work:
Embedding scaling. Normalizing input representations dropped loss from 3.94 to 3.61. Like adjusting the volume before processing audio.
Unembedding LR sweep. The output layer needed 17x the learning rate of the rest of the model. It was severely undertrained. 3.61 to 3.07 in one sweep.
The counterintuitive finding: going from 6 layers to 5 IMPROVED the score. At 460K tokens the model is data-constrained, not architecture-constrained. Fewer params = less overfitting.
What didn't work (109 experiments): every activation except squared ReLU, weight tying (catastrophic), dropout, GQA, large batches, label smoothing. 80% of ideas fail. The system just finds the 20% that don't.
This is a from-scratch domain model. Not a fine-tune. Not a wrapper. Trained on our proprietary peptide corpus. Not bad for the first overnight run.
Run 2 just launched:
→ 1.58M token corpus (3.4x bigger)
→ 15-min experiments (3x longer)
→ Structured phases: depth re-sweep, width sweep, LR tuning, then infinite random exploration
→ Daily reports auto-generated
→ Runs nonstop until I kill it
The model was clearly overfitting on the small corpus. Now it has real data to chew on.
If you love machine learning and agentic engineering as much as I do, DM me. Looking to collab and learn from others building in this space.
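
The "propose, train, evaluate, keep or discard" loop in this post is a form of greedy search, and its shape can be sketched in a few lines of Python. This is a toy under stated assumptions, not the author's system: `autoresearch` is a hypothetical name, and a seeded random perturbation stands in for actually proposing a change and training a model. Only the starting loss (3.94) and the experiment count (137) are taken from the post, and the toy will not reproduce the post's results.

```python
import random


def autoresearch(baseline_loss: float, n_experiments: int, seed: int = 0):
    """Greedy keep-or-discard search over a sequence of experiments.

    Returns (best_loss, kept, discarded).
    """
    rng = random.Random(seed)
    best = baseline_loss
    kept = 0
    discarded = 0
    for _ in range(n_experiments):
        # Stand-in for "propose a change, train, evaluate": perturb the
        # current best loss. The range is skewed so most ideas make it worse.
        candidate = best + rng.uniform(-0.05, 0.20)
        if candidate < best:
            best = candidate  # keeper: later experiments build on it
            kept += 1
        else:
            discarded += 1    # dead end: revert to the previous best
    return best, kept, discarded


best, kept, discarded = autoresearch(baseline_loss=3.94, n_experiments=137)
print(f"best={best:.2f} kept={kept} discarded={discarded}")
```

Because each keeper becomes the new baseline, the loss can only stay flat or fall, which is why a loop like this can run unattended even when most individual ideas fail.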

Tom Weaver@trwpang_·1d

Every backend function "worked" in isolation. But when Claude had to click buttons and read the screen like a user it found 4 bugs that unit tests and script testing would have missed entirely. The gap between "function returns success" and "user sees the right thing" is infested.

Tom Weaver@trwpang_·1d

Currently working on something with a UI wrapper on top of some agents built in Go and some complex Python scripts. Forcing Claude (under duress) to actually use the UI via agent-browser to E2E test various bits, instead of doing it itself, has a double whammy effect: it also has to fix UI problems.

Tom Weaver@trwpang_·1d

Bookmarked this a month ago, only just tried it, and DAMN it is an upgrade from Ghostty. The organisation in my life I didn’t know I needed. Great work, @lawrencecchen

Lawrence Chen@lawrencecchen

Introducing cmux: the open-source terminal built for coding agents.
- Vertical tabs
- Blue rings around panes that need attention
- Built-in browser
- Based on Ghostty
When Claude Code needs you, the pane glows blue and the sidebar tells you why. No Electron/Tauri. Just Swift/AppKit.

Tom Weaver@trwpang_·3d

@fiddlehead @yazins Interesting, looks great! Will check it out.

Fiddlehead@fiddlehead·4d

@trwpang_ @yazins this is what I built Fiddlehead for. full diarized transcript plus markdown with YAML frontmatter. date, speakers, topics, action items. plain files on disk, not locked in a dashboard.

yazin@yazins·6d

Introducing: OpenGranola 🔥 I built an open source meeting copilot for macOS. It transcribes both sides of your call on-device, searches your own notes in real time, and hands you talking points right when the conversation needs them. No audio leaves your Mac. Point it at a folder of markdown files, pick any LLM through OpenRouter (Claude, GPT-4o, Gemini, Llama), and it just works. It's invisible to screen share too — nobody knows you have it. The whole thing is open source. Link below