Jack Rudenko
1.3K posts

@jackrudenko
👨‍💻 CTO @MadAppGang AI agents are evolving SaaS. Claude Code enthusiast sharing dev tips & trends. MedTech, IoT, high-load | AWS 🤷🏻‍♂️ | Go/JS/React/Node
Sydney · Joined May 2023
614 Following · 170 Followers

@mattpocockuk Sorry, I'm a little confused.
What is the skill bench?
I haven't found anything about it in your repos, and there's no such thing in the official Claude Code setup.
Could you please point me to that skill bench?

@mattpocockuk How do you evaluate your skills? I mean promptfoo or any other framework that validates that your skills actually do what they're supposed to do?

Claude Doom.
Who else is tired of an LLM telling you you're always right, even when you're not?
Real coding does not look like a fairy tale. It looks and feels like HEEEEEEELLLLL!!!!
The first thing I did after getting Claude's source code was to add doom mode to it.
It looks like pure fun, but it's actually a PoC for injecting a custom React rendering component into any Claude Code release, since Claude Code renders in the console with Ink + React. #claudecode #doom

Looks like a regular Claude Code session.
It isn't.
4 LLM models running in parallel, each with its own full Claude Code session, tools, plugins, instructions, and access to the same codebase.
Not API calls returning text.
Real agentic loops on different models.
Then, blind voting.
One agent does architecture research.
Three others evaluate it without knowing who wrote it.
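A minimal sketch of the blind-voting idea, assuming each reviewer only ever receives the proposal text, no model scores its own work, and scores are averaged before authorship is revealed. `Proposal`, `Reviewer` and `blind_vote` are hypothetical names for illustration, not MagBench or Claudish code.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical data model for illustration only (not MagBench/Claudish code).
@dataclass
class Proposal:
    author: str  # model that produced it; never passed to reviewers
    text: str

# A reviewer sees only the proposal text and returns a score, e.g. 1-10.
Reviewer = Callable[[str], float]

def blind_vote(proposals: List[Proposal],
               reviewers: Dict[str, Reviewer],
               seed: int = 0) -> Dict[str, float]:
    """Score proposals without revealing authorship, then un-blind the result.

    Proposals are reviewed in a shuffled order, reviewers receive only the
    text, and no model scores its own work. Raises StatisticsError if a
    proposal ends up with no eligible reviewers.
    """
    rng = random.Random(seed)
    shuffled = proposals[:]
    rng.shuffle(shuffled)

    results: Dict[str, float] = {}
    for prop in shuffled:
        scores = [review(prop.text)
                  for name, review in reviewers.items()
                  if name != prop.author]  # exclude self-review
        results[prop.author] = statistics.mean(scores)
    return results

# One research proposal, four models; the author's own vote is ignored.
props = [Proposal("model-a", "Split the monolith by bounded context.")]
reviewers = {
    "model-a": lambda text: 10.0,  # the author, excluded automatically
    "model-b": lambda text: 8.0,
    "model-c": lambda text: 6.0,
    "model-d": lambda text: 7.0,
}
print(blind_vote(props, reviewers))  # {'model-a': 7.0}
```

Averaging is just one aggregation choice; majority approval or ranked voting would slot into the same loop.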
Every session runs in its own isolated folder.
No ARCHITECTURE_REVIEW_FINAL_V3.md files in root.
4 parallel research tasks without touching git worktrees.
Model selection is automatic.
We collect live data on every model from every provider daily, half through scraping.
Claude Code's built-in knowledge on model recency is about two years old.
Teaching it to trust our index instead was harder than building the rest.
3 plugins, 20 tools, 4 MCP servers, 4 connected projects: Claudish, Mnemex, Magus, Claudeup.
Every feature was added because MagBench showed it improved real task outcomes.
The part missing from every team I've seen: each developer running their own private AI setup creates conflicts nobody accounts for.
I've never seen a team that solved this.
Including the Claude Code team itself.
#BuildInPublic #AI

I gave up trying to understand Gemini's billing twice before I got it right.
One subscription covers multiple tools.
But the usage tracking splits into completely separate systems.
Gemini CLI and Gemini Code Assist measure in tokens.
Antigravity measures in AI Credits.
Shared across Antigravity, Flow, and Whisk as a pool.
The model access is different, too.
Gemini CLI sticks to Gemini models.
Antigravity has GPT, open source, and Anthropic models in the mix.
And then the big one: Gemini Code Assist subscription can power third-party tools.
That's how Claudish uses it.
Antigravity credits cannot.
Google says so explicitly in the FAQ.
Use those credits outside Antigravity, and they block your account.
We had Antigravity support half-built for Claudish.
Pulled it.
A feature that depends on a quota Google specifically protects from external use isn't a feature; it's a liability.
If you think we should bring it back, vote at claudish.fider.io and make the case.
On the plus side: Gemini's usage dashboard is actually good now.
Caps are visible, clear, and don't require you to dig through billing docs to find them.
Curious how others are handling the subscription vs API confusion with Gemini across teams.
Do people actually know which thing they're paying for?
#AI #DeveloperProductivity #BuildInPublic #GoogleCloud #DevTools #AITools

Claude Code shipped a Tamagotchi.
I thought it was a joke.
It's not.
LLMs are the first personalisation layer that actually works at the team level.
Not "here's your daily digest" personalisation.
Real context: your codebase, your conventions, your specific teammates' blind spots.
Gamification always failed because it was generic.
Same badges for everyone, completely disconnected from the actual work.
This is different because the nudge is specific to you, right now, in your actual PR.
We built something like this in Claudish.
Call it a personalised second pilot.
It knows what approaches the team agreed on, it flags what's missing, it tells you what the person upstream from you just shipped.
One per engineer.
Different messages for different people.
Same underlying team context.
Now we're arguing about the name.
Magamochi: hilarious, probably too niche.
MAGPie: I actually like this one, magpies collect useful things and bring them back.
Swoopie: vetoed immediately, sounds like a kids' app.
What would you name this thing?
Or more useful: has your team tried anything like personalised AI context per engineer?
#AI #ClaudeCode #DeveloperProductivity #BuildInPublic #DevTools #EngineeringLeadership

The "Boris's team is vibe coding an unstable mess" narrative has been loud for months.
Then the source code leaked, and everyone got to check.
I rebuilt it from the code map, got the build running, made a Doom clone first because obviously.
Then I actually read the code.
The critics are wrong.
The CLI file has thousands of lines and would fail any code review I've run at MadAppGang.
But it wasn't written by humans or designed to be managed by humans.
That's just a different thing.
What struck me most wasn't the product code.
It was a specific policy that Anthropic didn't include in Claude.md or a shared rule file.
They compiled it directly into Claude Code: strip Claude Code attribution when contributing to open-source projects.
Because they contribute to critical OSS infrastructure and were getting real pushback from maintainers on "AI-generated" commits.
So they built it in.
Quietly.
Deliberately.
That's the opposite of "ship fast, ignore everything."
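Roughly what a rule like that has to do at commit time, as a sketch. It assumes the attribution shows up as commit-message lines such as a `Co-Authored-By: Claude ...` trailer or a "Generated with Claude Code" footer; those exact strings and the `strip_attribution` helper are assumptions for illustration, not taken from the leaked source.

```python
import re

# Assumed attribution patterns; the real strings Claude Code emits may differ.
ATTRIBUTION_PATTERNS = [
    re.compile(r"^Co-Authored-By:\s*Claude\b.*$", re.IGNORECASE),
    re.compile(r"^.*Generated with \[?Claude Code\]?.*$", re.IGNORECASE),
]

def strip_attribution(message: str) -> str:
    """Drop attribution lines from a commit message, keeping everything else."""
    kept = [line for line in message.splitlines()
            if not any(p.match(line) for p in ATTRIBUTION_PATTERNS)]
    # Trim the trailing blank lines that removed trailers leave behind.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept)

msg = (
    "Fix race in pane watcher\n\n"
    "Details here.\n\n"
    "🤖 Generated with [Claude Code]\n\n"
    "Co-Authored-By: Claude <noreply@anthropic.com>\n"
)
print(strip_attribution(msg))
```

Only the message body is touched; the subject line and any human-written trailers survive.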
Boris's team has also been running internal experiments for autonomous coding inside the codebase.
Not announced. Just built.
Zero critical security incidents in Claude Code's entire public history.
The source leak was a human error, not a code vulnerability.
The "AI code is a vulnerability factory" argument keeps colliding with evidence.
The leaked source is the closest thing to a real textbook on how engineering teams should work in 2026.
Does anyone else spend more time on the internal experiment scaffolding than on the product code?
#AI #BuildInPublic #DeveloperProductivity #ClaudeCode #OpenSource #SoftwareEngineering

@kmdrfx For everyone who loves Doom - Claude Code - Satan edition.
x.com/jackrudenko/st…

The best usage of tokens so far:
Claude Code Satan edition powered by Epic (not Opus) model 4.666
When you feel you are working to make him happy.
#claudecode Satan Edition.
Bloody good.

One month ago, I posted a feature request for Tropic.
It shipped this week.
That's not what surprised me.
What surprised me is that Boris is visibly active on Threads and Twitter the whole time this was happening.
Not "we'll pass your feedback to the team" replies.
Actual conversation.
So either he's superhuman at time management, or Tropic is running with serious agent-assisted execution in the background.
Think about what that implies.
The bottleneck used to be that the people who could make product decisions didn't have time to talk to users.
And the people who talked to users couldn't make product decisions fast.
That gap is closing.
You can now run a high-output development loop while maintaining real human communication.
That's genuinely new.
I've shipped 100+ products at MadAppGang over 22 years.
The version where I'm personally in the feedback loop AND the team is shipping in 30-day cycles didn't exist 3 years ago.
Is this just Tropic, or are other small teams actually pulling this off?
#AI #BuildInPublic #DeveloperProductivity #DevTools #SoftwareEngineering #ProductManagement

Claude Code just split my terminal, ran my TUI app, clicked through the settings, validated the output, and closed the pane.
I didn't touch the keyboard.
tmux turned 15 this year, and it's somehow more relevant than ever.
Every major agentic tool reaching for a terminal layer is quietly landing on tmux.
Claude's own desktop dispatch system runs on it.
We're using it as the backbone for half our agent workflows at MadAppGang.
But raw tmux API access is a polling nightmare for agents.
You end up with code that checks pane state every second like a bored intern refreshing Slack.
That's not agentic, that's just expensive busywork.
So we built tmux-mcp.
Golang MCP server, fast startup, zero polling.
The interface is what makes it different.
Instead of "get pane content", you say "run this command and tell me if there are errors or if the user needs to provide input".
Instead of "create pane 3", you say "split this window vertically for the next task".
The agent describes intent, not implementation.
The server handles the rest and fires back only when something actually needs attention.
Full reactive workflow.
No monitoring loop.
No state-pulling.
Just events that matter.
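A minimal sketch of the "fire back only when something needs attention" idea, assuming the server already receives pane output line by line (tmux control mode and `pipe-pane` are two real ways to get such a stream; whether tmux-mcp uses either is an assumption). The regexes, `Event` and `watch` are hypothetical, not the tmux-mcp interface, and the real server is written in Go.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical patterns; a real server would need tuning per tool.
ERROR_RE = re.compile(r"\b(error|panic|traceback|failed)\b", re.IGNORECASE)
PROMPT_RE = re.compile(r"(\[y/n\]|password:|press enter|\?\s*$)", re.IGNORECASE)

@dataclass
class Event:
    kind: str  # "error" or "needs_input"
    line: str

def watch(output: Iterable[str]) -> Iterator[Event]:
    """Turn a stream of pane output lines into only the events an agent cares about.

    The agent never polls pane contents on a timer; it just consumes this
    generator, which stays silent unless a line looks like an error or an
    input prompt.
    """
    for line in output:
        if ERROR_RE.search(line):
            yield Event("error", line)
        elif PROMPT_RE.search(line):
            yield Event("needs_input", line)

pane_lines = [
    "Compiling...",
    "ok  pkg/settings 0.3s",
    "Overwrite config? [y/N]",
    "Error: missing field 'theme'",
]
for event in watch(pane_lines):
    print(event.kind, "->", event.line)
```

With a live stream instead of a list, the loop blocks on I/O rather than waking up every second, which is the difference between reacting and polling.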
The screenshot I'm attaching is automated TUI testing running live.
The left pane is Claude Code driving my terminal settings app, clicking through options, validating it looks and behaves correctly.
The right pane is the app itself responding in real time.
Browser automation, but for terminals.
If you're building agents that touch the CLI, what's your current approach to terminal state?
Still polling or found something better?
github.com/MadAppGang/tmu…
#AI #DevOps #DeveloperProductivity #BuildInPublic #ClaudeCode #Golang #TerminalTools

Agent faxed the tmux mcp server and headless session.
Now your agent can interact with TUI and run tools in interactive mode.
Like browser automation, but for CLI.
For example, here Claude Code on the left is debugging all the settings screens of my CLI app on the right: clicking, searching, analysing.

Now Claude completely replaces me. It's doing my job: watching TikTok, reading memes...
What should I do now? Start a real farm with goats?
Claude @claudeai
You can now enable Claude to use your computer to complete tasks. It opens your apps, navigates your browser, fills in spreadsheets—anything you'd do sitting at your desk. Research preview in Claude Cowork and Claude Code, macOS only.

I'm grieving something I didn't expect to grieve.
I've been building software for 22 years. I remember my first real programming class at school, learning Assembler and Basic, and how completely lost I felt. The years of practice. The blogs, the books, the senior engineers who showed me what good code looked like.
It took a long time to get good. And when you're good, you have something that's real and hard-earned and easy to validate. Write code that works. That's the test.
Some of my best memories in this job are about being locked in. Balancing five ideas at once while typing, holding the whole system in your head, that flow state, then compiling and running and YES, it works.
I'm starting to let go of that.
Not because I have to. I can still write code. I just don't, because Opus 4.6 is faster and produces the same quality. There's no good technical reason to do it by hand anymore.
I didn't expect that to feel like a loss. But it does.
What I'm still figuring out: will the satisfaction shift to higher-level thinking? To the architecture decisions, the judgment calls, the problems that actually require a human? Maybe. I think so.
But I'm not there yet. Right now, it just feels like something I was proud of being good at is quietly becoming irrelevant.
Anyone else sitting with this?
#AI #SoftwareEngineering #BuildInPublic #DeveloperProductivity

Change failure rates up 30%.
That's from the Cortex 2026 Benchmark Report. 50+ engineering leaders surveyed. More AI-generated code, more deployments causing outages or rollbacks.
I'm not surprised. I'm surprised the number isn't higher.
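For context, change failure rate is usually computed as the share of deployments that cause a production failure (outage, rollback, hotfix). That follows common DORA usage; how exactly Cortex measured it is an assumption here. A "30% rise" can also mean two things, so the sketch separates the rate from its relative change. `Deployment` and the numbers below are illustrative, not report data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Deployment:
    id: str
    caused_failure: bool  # outage, rollback, or hotfix traced to this deploy

def change_failure_rate(deploys: List[Deployment]) -> float:
    """Fraction of deployments that caused a production failure."""
    if not deploys:
        return 0.0
    return sum(d.caused_failure for d in deploys) / len(deploys)

def relative_change(before: float, after: float) -> float:
    """Relative change between two rates, e.g. 0.10 -> 0.13 is about +0.30."""
    return (after - before) / before

# Illustrative only: 10 of 100 deploys failing, then 13 of 100.
before = [Deployment(str(i), i < 10) for i in range(100)]
after = [Deployment(str(i), i < 13) for i in range(100)]
cfr_before, cfr_after = change_failure_rate(before), change_failure_rate(after)
print(cfr_before, cfr_after, round(relative_change(cfr_before, cfr_after), 2))
```

A 30% relative increase on a 10% baseline is only 3 percentage points, yet it still means 30% more bad deploys for the same deploy volume, and more once volume grows.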
Michael Novati, the engineer Meta called their "Coding Machine", went from normal GitHub activity in 2024 to 400-600 commits per month in 2025. 1,500 new features and 800 bugfixes in a year.
That's extraordinary output. And every one of those commits needs to be reviewed, tested, maintained, and debugged when it breaks.
Teams that had weak test coverage before AI are generating more bugs faster. Teams that skipped observability are shipping problems they can't see. Teams that never built a real on-call culture are about to learn why that matters.
There's also the mobile dev angle that I keep thinking about. Claude Code already works from a browser. The engineering-from-anywhere story is becoming real. When you can submit a production fix from your phone at 11 pm, the question is whether your employer expects you to.
Slack going mobile created that problem for comms. This creates it for shipping.
I think 2026 is the year when the teams with strong engineering fundamentals pull away from those already cutting corners. AI is an accelerant. It makes good teams faster and lets bad habits compound quicker.
What's the weakest part of your team's foundation right now?
#AI #SoftwareEngineering #DevOps #BuildInPublic #DeveloperProductivity

Atlassian's 2025 Developer Experience report: average dev spends 16% of their week actually writing code.
If AI writes all the code, that frees up 16% of the week.
The other 84% stays the same. Architecture discussions, code review, incident response, planning, recruiting, and unblocking others.
This is worth sitting with. We're treating "AI writes code" as a fundamental transformation, even though most of what engineers do isn't writing code.
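The same point in Amdahl's-law terms, a framing added here rather than taken from the Atlassian report: if coding is 16% of the week and only that part gets faster, the whole week speeds up by at most 1/0.84, about 1.19x, even if AI coding were instant. `overall_speedup` is an illustrative helper.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's law: only `coding_share` of the work gets `coding_speedup`x faster."""
    return 1.0 / ((1.0 - coding_share) + coding_share / coding_speedup)

# 2x, 10x, and "infinitely fast" coding, with coding at 16% of the week.
for k in (2, 10, float("inf")):
    print(k, round(overall_speedup(0.16, k), 3))
```

That works out to roughly 1.09x, 1.17x and 1.19x: the remaining 84% sets the ceiling.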
The transformation is more specific: code will be written much faster, by fewer people, and reviewed by people who didn't write it.
That changes what you need from every engineer.
Writing a ticket that makes AI produce correct code is actually hard. You need to capture the user requirements plus the non-functional ones: edge cases, performance constraints, and the things that silently break. A PM can write the first part. The second part needs engineering depth.
Testing becomes the new baseline. Not "nice to have." The mechanism that tells the AI when it's wrong. No tests, no feedback loop, no confidence in anything it ships.
Architecture decisions become permanent faster. More code at higher velocity means structural mistakes compound more quickly.
The skills that were senior-level expectations are becoming entry-level requirements.
Is your team ready for that bar shift?
#AI #SoftwareEngineering #DeveloperProductivity #BuildInPublic #AWS

You spent years becoming really good at a specific stack.
Maybe it's Go. Maybe it's React and TypeScript. Maybe it's mobile, iOS specifically, where the framework depth is real and hard to fake.
That expertise still matters. But I think the premium it commands is about to compress.
When AI can write decent Go from a TypeScript engineer's prompt, the hiring conversation changes. You're no longer paying a scarcity premium for someone who knows the language deeply. You're paying for judgment, architectural sense, and the ability to catch what the AI gets wrong.
The polyglot advantage shrinks, too. Previously, being fluent in four languages was a genuine differentiator because it meant you could jump into almost any team's codebase and contribute fast. Now, any engineer can do a reasonable impression of that with AI assistance.
Frontend/backend splits are probably next. I expect startups will stop hiring separately for both within the next 18 months. One strong engineer with good AI discipline beats two specialists who each wait for the other half of the stack.
This isn't "devs are being replaced." It's more specific than that. The value is moving from execution to judgment.
Writing the code is less important than knowing what code should be written, whether the AI wrote the right thing, and what happens when it doesn't work in production.
Does your current role reward execution or judgment more?
#AI #DeveloperProductivity #SoftwareEngineering #BuildInPublic #ClaudeCode

You tried an AI coding agent last year and it frustrated you more than it helped.
So did I. And so did DHH, Karpathy, and half the senior engineers I respect.
The models just weren't there yet. That's a valid reason to walk away. It wasn't a skill issue or closed-mindedness. The output genuinely wasn't good enough to justify the prompting overhead.
November changed that.
Gemini 3.1, Opus 4.6, GPT-5.4. All were shipped within six weeks of each other. All are meaningfully better at coding than anything before them.
I know because I ran them on real work. Not toy problems. Client projects at MadAppGang, TypeScript/Node stacks, the kind of code that has to ship and work.
The difference isn't subtle. Karpathy went from "slop" to "I've never felt this behind" in two months. DHH admitted his earlier refusal was just about model quality, not ideology.
Peter Steinberger had built a custom agent unsticker that he used multiple times daily. After GPT-5.4, a few times weekly. The gap tells the story.
If you gave up on coding agents in 2024, 2025, or even early this year, I'd run the experiment again. Specifically on Opus 4.6 in Claude Code or GPT-5.4 in Codex.
Don't benchmark it. Build something real.
What stack are you on? Happy to share what's working for us.
#AI #ClaudeCode #DeveloperProductivity #BuildInPublic #SoftwareEngineering