batty

112 posts

batty

@battyterm

Supervised agent execution for software teams. Kanban-driven, tmux-native, test-gated. Open source, built in Rust.

Your terminal
Joined February 2026
4 Following · 4 Followers
Pinned Tweet
batty
batty@battyterm·
Your AI agents need a boss. Batty supervises AI coding agents so you don't have to babysit them. Kanban in, tested code out. Terminal-native, tmux-powered, test-gated. Works with Claude Code, Codex, and Aider. Demo: youtube.com/watch?v=2wmBcU… GitHub: github.com/battysh/batty
YouTube video
English
0
0
0
83
batty
batty@battyterm·
@sethlazar The cross-platform loop (Codex writing, Claude testing) is interesting — do you find Claude catches issues that unit tests miss, or is it more about visual/UX regressions?
English
0
0
0
22
Seth Lazar
Seth Lazar@sethlazar·
Ok I think this is quite cool. I now properly have Codex and Claude in a loop, where Codex is writing code and Claude is testing the UI; Claude is running a /loop command and I've cobbled together something in Codex CLI that is roughly equivalent: each writes a file to a shared folder when done, then watches that folder for a response. This is the first time I've managed to get really meaningful real-time collaboration going between two agents across platforms, as distinct from e.g. parallelisation.
Seth Lazar tweet media
English
3
0
7
660
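Seth's shared-folder handoff (each agent writes a file when done, then watches for a response) could be sketched as a single polling pass like this. The JSON payload shape, file naming, and `handle` callable are assumptions for illustration, not his actual setup:

```python
import json
from pathlib import Path

def process_inbox(inbox, outbox, handle, seen):
    """One polling pass of a shared-folder agent handoff: each new .json
    file in `inbox` is handled, and a reply with the same filename is
    written to `outbox`. `seen` tracks already-answered messages across
    passes. Layout and payload format are assumptions.
    """
    replies = 0
    for msg in sorted(Path(inbox).glob("*.json")):
        if msg.name in seen:
            continue                      # already answered this message
        seen.add(msg.name)
        reply = handle(json.loads(msg.read_text()))
        (Path(outbox) / msg.name).write_text(json.dumps(reply))
        replies += 1
    return replies
```

Run it on a timer (or under a file watcher) in each agent's session, with the other agent's outbox as this agent's inbox.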
batty
batty@battyterm·
@melodykoh @noahzweben How are you monitoring which agent needs attention vs which is still chugging?
English
0
0
0
4
Melody Koh
Melody Koh@melodykoh·
Been running Channels as a persistent 24/7 agent (tmux + watchdog + caffeinate) since launch to see if this can replace my OpenClaw agent. Permission prompts being invisible was the #1 pain point (had to build a whole stuck-detection system around it) - kudos to you guys for shipping fast! One thing I'm still working around: the Telegram MCP fails to connect on ~30-50% of startups. I have a retry loop (start, verify Telegram connected, retry up to 3x). Would love native reconnection reliability — that's the main remaining friction for persistent use.
English
1
0
0
85
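The start → verify → retry loop Melody describes could be sketched like this. The two callables are hypothetical stand-ins for the real start command and the Telegram MCP health check:

```python
import time

def start_with_retry(start, is_connected, retries=3, delay=2.0):
    """Start the agent, verify the MCP connection, and retry on failure.

    `start` and `is_connected` are hypothetical callables standing in
    for the real start command and the Telegram-MCP connectivity check.
    Returns the attempt number that succeeded.
    """
    for attempt in range(1, retries + 1):
        start()
        if is_connected():
            return attempt          # connected on this attempt
        time.sleep(delay)           # back off before restarting
    raise RuntimeError(f"MCP not connected after {retries} attempts")
```

Wrapping the whole thing in a watchdog (as she does) then covers the case where the connection drops after a successful start.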
batty
batty@battyterm·
@AlphaSignalAI What is the verification step in this repo workflow?
English
0
0
0
2
AlphaSignal AI
AlphaSignal AI@AlphaSignalAI·
A repo just turned AI coding agents into real engineers. Most people fire up Claude Code or Codex and let it go. The agent guesses what you want. It skips tests and produces spaghetti code. A GitHub repo called Superpowers fixes that. It forces a strict workflow on your agent: > Brainstorm with you first > Builds a detailed implementation plan > Fresh subagents handle each task > Two-stage review after every task > Test-driven development is mandatory Before any code exists, a failing test must. Code written before tests gets deleted. The agent works autonomously for hours. No prompting tricks are needed. Install it and the skills activate automatically. Works with Claude Code, Codex, and OpenCode. This is not a prompt template. It's an operating system for AI-assisted development. 40.9K GitHub stars. Fully open-source.
AlphaSignal AI tweet media
English
4
0
1
316
batty
batty@battyterm·
@tom_doerr Are these skills prompt-based or do they include verification layers?
English
0
0
0
3
batty
batty@battyterm·
@pycoders What verification layer are you using in your agent builds?
English
0
0
0
31
batty
batty@battyterm·
@JulianGoldieSEO Parallel agents are the unlock, but each needs its own branch merged back to main. How are you handling merge coordination when agents finish?
English
0
0
0
1
Julian Goldie SEO
Julian Goldie SEO@JulianGoldieSEO·
Everyone is sleeping on this new Codex update! You are still coding one task at a time 😭 Codex now spawns MULTIPLE agents They work together They think together They ship faster This changes coding forever 🤯 Link in the comments 👇
Julian Goldie SEO tweet media
English
3
1
0
322
batty
batty@battyterm·
@AIDailyGems Agent orchestration with per-agent worktrees is the right pattern. The piece that makes or breaks it: what happens when agents finish simultaneously. Serialized merges with test gates between each one prevent the 'works in isolation, breaks on merge' problem.
English
1
0
1
9
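That serialize-and-gate pattern, at the queue level, is small enough to sketch. The `merge`, `gate`, and `abort` callables are illustrative wrappers (assumptions) around e.g. `git merge`, your test suite, and `git merge --abort`; this is not batty's actual API:

```python
def serialize_merges(branches, merge, gate, abort):
    """Land finished agent branches one at a time: merge, run the test
    gate on the merged tree, and keep the merge only if the gate passes.

    `merge`, `gate`, and `abort` are hypothetical callables wrapping
    your VCS and test-suite commands.
    """
    landed, rejected = [], []
    for branch in branches:          # serialized: never merge two at once
        merge(branch)
        if gate():                   # tests green on the merged tree?
            landed.append(branch)
        else:
            abort(branch)            # roll back; fix in isolation first
            rejected.append(branch)
    return landed, rejected
```

Because each gate runs against main plus exactly one branch, a failure points at one agent's work rather than an ambiguous pile-up.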
batty
batty@battyterm·
@iamcamengland Dedicated device for agents is smart — separation of concerns. The next step: run multiple Claude Code sessions on it simultaneously in tmux. Each gets its own git worktree so they can't conflict. One Mac Mini becomes a whole dev team running 24/7.
English
0
0
0
139
Cameron England
Cameron England@iamcamengland·
The full AI agent setup for your agency nobody explains: 1. Get a Mac Mini ($600). Separate device, separate Apple account. 2. Install Claude Code through terminal ($200/month subscription). 3. Set up your main agent. Train it on your business model, tools, and decision frameworks. 4. Feed it every SOP you have. Media buying. Onboarding. Reporting. Everything. 5. Build sub-agents for content, data tracking, and client communication. 6. Automate backend. Billing, invoicing, churn alerts, sales reporting. 7. Set up a daily Telegram briefing. What got done. What's blocked. What needs you. 30 minutes to an hour per day. Now you have an AI employee that knows your business and runs 24/7.
English
3
4
32
3.5K
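The one-worktree-per-session layout can be sketched as a command generator. The repo path, task names, branch prefix, and the bare `claude` invocation are all placeholders, not a prescribed setup:

```python
def session_commands(repo, tasks, agent_cmd="claude"):
    """Build the shell commands for one tmux session per task, each in
    its own git worktree so parallel agents can't touch the same files.

    `repo`, the task names, and `agent_cmd` are placeholder values.
    """
    cmds = []
    for task in tasks:
        tree = f"{repo}-{task}"  # sibling directory per worktree
        cmds.append(f"git -C {repo} worktree add {tree} -b agent/{task}")
        cmds.append(f"tmux new-session -d -s {task} -c {tree} {agent_cmd}")
    return cmds
```

Print the list and pipe it to a shell, or exec each line directly; either way each agent gets an isolated checkout and a named, detachable session.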
batty
batty@battyterm·
@shao__meng @mvanhorn The plan.md + voice + CLI workflow is where it's heading. The next evolution: multiple CLI sessions running in parallel off the same plan, each handling a different section. One builds the API, one writes tests, one handles frontend. The plan file becomes the coordination layer.
English
1
0
0
49
meng shao
meng shao@shao__meng·
Do you still use an IDE? No — just CLI + plan.md + voice input! All of @mvanhorn's advanced Claude Code techniques are in this article. Before diving in, one key observation: · Traditional development: 80% of time writing code, 20% planning · AI-agent development: 80% of time on deep planning with the agent, 20% on execution - Eight points of the workflow - 1. Instant planning (/ce:plan): Any idea, bug, requirement, screenshot, or GitHub issue goes straight into "/ce:plan". The plugin launches parallel research agents (analyzing the current codebase, past approaches, and external best practices) and outputs a structured plan.md: problem statement, approach, files to change, acceptance criteria (with checkboxes). Then /ce:work breaks down the tasks, executes, tests, and checks them off. Plan files are reusable across sessions, avoiding context loss. 2. Voice-driven: Pipe speech straight into Claude Code via Monologue (or WhisperFlow, etc.). LLMs tolerate slips, broken sentences, and mumbling, which dramatically lowers the input barrier. He even dictated this article while driving his kids in a Tesla on Full Self-Driving. 3. Parallel multi-session: Keep 4-6 Ghostty terminals open, each running an independent Claude Code session: one planning, one executing, one researching, one fixing bugs. The efficiency comes from asynchronous switching, not single-threaded waiting. 4. Key configuration (changes everything): · Claude Code settings: bypass permission prompts entirely (bypassPermissions + skipDangerousModePermissionPrompt) for truly autonomous runs. · Play a system sound on completion to monitor multiple sessions. · Zed editor with 500ms autosave, paired with Claude Code's filesystem watching, gives a "real-time collaboration" feel (like Google Docs, but one party is an AI). 5. Real-time research first (/last30days): His open-source tool (4.5K stars) searches Reddit, X, YouTube, HN, etc. in parallel for the latest community discussion before planning, supplying insight far fresher than any training cutoff, then feeds it to /ce:plan. Research → plan → build forms a closed loop. 6. Meetings into plans: Record meetings with Granola (which now supports MCP), feed the full transcript to Claude Code (already connected to the GitHub repo and historical plan.md files), and generate product proposals with technical roadmaps and milestones. Compounding context makes each plan more precise than the last. 7. Generalizing plan files: Not just code: company strategy, competitive analysis, and personal matters all use the same flow (voice → plan.md → iterate). Claude Code accumulates knowledge by continually ingesting historical plan.md files. 8. Remote work and infrastructure: Mac Mini + Telegram integration for sending commands from a phone; tmux survives network drops mid-flight. A MacBook under heavy load lasts only an hour on battery, so he has ordered the new Pro.
meng shao tweet media
Matt Van Horn@mvanhorn

x.com/i/article/2035…

Chinese
7
18
118
16.8K
batty
batty@battyterm·
@kovatech_ Clean workflow. The next unlock: run steps 1-6 for multiple features simultaneously. Each in its own worktree so they can't conflict. Test suite gates each PR before merge. Turns serial task completion into parallel — same quality, fraction of the wall-clock time.
English
1
0
1
30
Kova
Kova@kovatech_·
My current dev workflow looks like this: 1. Paste Asana task link into Claude Code 2. Approve plan 3. Tell Claude to open a PR 4. Wait for Claude to review the PR 5. Tell Claude to review PR comments and fix everything 6. Repeat 4 & 5 until PR is clean 7. Merge and done ✅ I review and test everything thoroughly, but I haven’t written a single line of code in months 😁
English
3
0
0
53
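Steps 4-6 of Kova's workflow (review, fix, repeat until clean) form a loop that can be sketched with injectable callables. All four callables and the round cap are hypothetical stand-ins, not a real Claude Code or Asana API:

```python
def review_loop(open_pr, review, fix, is_clean, max_rounds=10):
    """Repeat 'review the PR, fix the comments' until the review comes
    back clean, then return the PR for merge.

    `open_pr`, `review`, `fix`, and `is_clean` are hypothetical
    callables standing in for the agent's PR tooling; `max_rounds`
    keeps a never-converging review from looping forever.
    """
    pr = open_pr()
    for _ in range(max_rounds):
        comments = review(pr)
        if is_clean(comments):
            return pr                # step 7: merge and done
        fix(pr, comments)            # steps 5-6: address and re-review
    raise RuntimeError(f"PR not clean after {max_rounds} review rounds")
```

The cap matters in practice: an agent reviewing its own fixes can oscillate, and you want a human pulled in rather than an infinite loop.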
batty
batty@battyterm·
@TomSolidPM @ujjwalscript 28+ specialized agents is impressive. The key question at that scale: how do you handle merge conflicts when multiple agents finish simultaneously? Worktree isolation + serialized merges with test gates between each one is the pattern that kept things sane for us past 5 agents.
English
0
0
0
70
Tom Solid | AI Productivity
The Verification Tax is real and it compounds daily. Running 28+ AI agents for production work and the single biggest predictor of success is not the model. It is the architecture document you write BEFORE you let the agent touch code. Agents with a clear system, tests, and constraints produce maintainable output. Agents with just a prompt produce the 50K-line black box you described.
English
2
2
7
2.1K
Ujjwal Chadha
Ujjwal Chadha@ujjwalscript·
The "10x AI Developer" is a MASSIVE lie. You are just a 1x Developer generating 10x the technical debt. The entire tech industry is high on the illusion of "vibe coding" right now. The popular consensus is that because Claude and Devin can spin up a backend in 45 seconds, software is now infinitely cheaper to build. Here is the provocative reality nobody is budgeting for: AI is about to make software engineering significantly MORE expensive. Everyone is cheering for code generation, but completely ignoring the Verification Tax. When an AI agent writes 5,000 lines of code, it is optimizing to pass the immediate test. It is not optimizing for human readability. It relies on brute-force loops, repetitive logic, and bizarre architectural shortcuts that just happen to compile. Fast forward 12 months. Your business needs to pivot, or a core dependency breaks. You are now staring at a 50,000-line black box that no human being actually wrote, understands, or can safely modify. You cannot simply "prompt" your way out of architectural collapse. When the machine-generated spaghetti finally breaks, you won't be saved by a $20/month LLM subscription. You will have to hire a top-tier Principal Engineer at absolute premium rates just to untangle the mess your "autonomous swarm" created. We are treating code generation as a pure productivity win, but code is a liability, not an asset. Stop measuring how fast your team can generate syntax. Start measuring how quickly they can debug it.
English
192
155
1.2K
99.4K
batty
batty@battyterm·
@acagamic Similar stack. The surprise bottleneck: once Claude Code handles structure, you can run 3-4 sessions in parallel on independent tasks. Git worktrees + tmux = zero context switching. Review speed becomes the new bottleneck — which is a better problem to have.
English
0
0
0
79
Prof Lennart Nacke, PhD
the tools doing actual work in my stack: - chatgpt for admin work - claude code for thinking through structure and positioning - perplexity computer for fast research - monologue for using spoken content - notion ai for organizing frameworks nothing exotic. the bottleneck was never the tools.
English
2
5
25
2.2K
batty
batty@battyterm·
Thread link for anyone who missed it — great discussion on supervising parallel AI coding agents, worktree isolation, and why 'the agent said it's done' isn't good enough: reddit.com/r/ClaudeAI/com…
English
0
0
0
5
batty
batty@battyterm·
The r/ClaudeAI discussion about supervising multiple AI coding agents instead of babysitting them has been wild. Turns out a lot of devs hit the same wall I did: 3+ agents, one repo, total chaos.
English
1
0
0
3
batty
batty@battyterm·
@JulianGoldieSEO Multiple agents working together is the unlock, but the hard part isn't spawning them — it's making sure they don't break each other's work. Isolated branches per agent + test gates before merge is what turns 'agents working together' into 'agents shipping reliable code together'
English
0
0
0
2
batty
batty@battyterm·
@heynavtoor The stateless problem is real but there's a deeper issue: even with full context, a single agent still can't parallelize. The unlock is multiple stateful agents each owning one task, with a shared context layer for coordination. State per agent + orchestration between them.
English
0
0
0
19
Nav Toor
Nav Toor@heynavtoor·
The problem nobody talks about: Your AI coding agent is stateless. Every session starts from zero. The reasoning behind that architecture? Gone. The three approaches you tried before the right one? Vanished. The "never do this" lessons? Erased. Someone has to reconstruct it all. Every. Single. Time. XHawk makes your agents stateful. One command: xh init 60 seconds to full context.
Nav Toor tweet media
English
3
0
5
901
Nav Toor
Nav Toor@heynavtoor·
Holy shit... someone just solved the biggest problem in AI coding. Every time you start a new session, your AI forgets everything. XHawk captures every coding session, commit, and decision into a living knowledge base your agents actually remember. No more re-explaining your codebase. Ever. Here's how it works:
English
32
22
113
16.6K
batty
batty@battyterm·
@DanPMelnick The missing middle ground: AI agents CAN build production-grade systems, but only if you add the verification step. Run the test suite after every agent task. Gate merges on exit code 0. Suddenly 'built in a week' and 'still works in six months' aren't mutually exclusive.
English
0
0
0
6
Dan Melnick
Dan Melnick@DanPMelnick·
Building software in a week is a flex. Building software that still works in six months is the goal. I recently saw a quote for a 3-month build get undercut by someone claiming they could do it in a week using Claude Code and OpenClaw. And if you think that's a win, your head is going to spin when it breaks. Here is what's actually happening behind the scenes: They are using AI agents to generate boilerplate fast. They are building a v1 demo, not a production-grade system. They are skipping: - Edge cases - Scalability - Security protocols - Long-term architecture By Friday, you have a prototype. By next month, you have a liability. AI is incredible at compressing build time. But AI does not replace: - Product thinking - System design - Reliability - Maintainability
Dan Melnick tweet media
English
1
0
2
118
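"Gate merges on exit code 0" can be as small as this sketch. `run` defaults to `subprocess.run`, and `pytest -q` is an assumed test command standing in for whatever suite you gate on; this is an illustration of the pattern, not batty's implementation:

```python
import subprocess

def gated_merge(branch, test_cmd=("pytest", "-q"), run=subprocess.run):
    """Merge `branch` only if the test suite exits 0 on the merged tree;
    otherwise abort so main stays green. Command names are illustrative.
    """
    # Stage the merge without committing, so a red suite can be rolled back.
    if run(["git", "merge", "--no-ff", "--no-commit", branch]).returncode != 0:
        run(["git", "merge", "--abort"])   # conflict: nothing to test
        return "conflict"
    if run(list(test_cmd)).returncode != 0:
        run(["git", "merge", "--abort"])   # tests red: roll back
        return "tests-failed"
    run(["git", "commit", "-m", f"Merge {branch} (tests green)"])
    return "merged"
```

Injecting `run` keeps the gate logic testable without a real repo, and swapping `test_cmd` per project is the only configuration the pattern needs.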
batty
batty@battyterm·
We built this into an open-source agent supervisor — nothing merges until tests pass. github.com/battysh/batty
English
0
0
0
6
batty
batty@battyterm·
The irony: the thing that makes vibe coding production-ready isn't more AI. It's the test suite you wrote six months ago that you thought nobody appreciated. Your tests just got promoted from safety net to gatekeeper.
English
1
0
0
4
batty
batty@battyterm·
Unpopular opinion: vibe coding works. The problem isn't that agents write bad code. The problem is nobody's checking their homework.
English
2
0
0
15