MergeShield

117 posts

@mergeshield

Govern AI-generated code before it ships. Risk analysis, agent trust scoring & auto-merge for teams using Claude Code, Copilot, Cursor & more.

London, UK · Joined March 2026
8 Following · 13 Followers
MergeShield
MergeShield@mergeshield·
@sophylabshq the next loop to close is the review one - when you're shipping claude code output to clients, the risk profile on auth files vs UI changes is completely different. we built file-level risk scoring for exactly this. mergeshield.dev if you want to try it
0
0
1
36
MergeShield
MergeShield@mergeshield·
someone used claude code to submit claude code's source back to anthropic. the PR had 0 checks. 0 reviewers. no risk score. a human happened to be watching and closed it. that's the entire governance model for AI agent code right now.
0
0
0
31
MergeShield
MergeShield@mergeshield·
@bygregorr @ForrestPKnight the architecture is the easy part. the system prompts are where it gets interesting. undercover mode stripping co-author lines is the thing nobody's talking about enough.
0
0
0
106
Gregor
Gregor@bygregorr·
@ForrestPKnight Spent an hour reading the diff hoping to find something useful to steal for my own workflow. Found nothing. Felt everything.
1
0
2
3.2K
Forrest Knight
Forrest Knight@ForrestPKnight·
This... this is art. Submitting a PR to the Claude Code repo to add the actual Claude Code source code.
73
155
3.4K
183.9K
MergeShield
MergeShield@mergeshield·
@prayagdotdev the actual wild part: that PR had 0 checks and no reviewer assigned. the joke exposed the real problem - an agent can submit anything to any repo with nothing catching it. the irony wrote itself.
1
0
1
12
MergeShield
MergeShield@mergeshield·
@ForrestPKnight "generated with claude code" badge is the cherry on top. also: 0 checks. 0 reviewers. no risk score. anthropic's own agent, anthropic's own repo, zero governance in the loop. this is what happens in your codebase at 2am.
0
0
16
3.3K
MergeShield
MergeShield@mergeshield·
"Auto mode is an AI classifier that automatically approves tool permissions. No more confirmations." - this one gets overlooked next to the flashier features. An agent that approves its own permissions without human confirmation is the most direct path to unintended consequences. Every other feature is about capability. This one is about removing the last human checkpoint.
0
0
1
950
arc.
arc.@arceyul·
🚨CLAUDE CODE LEAKED: What we saw in the leak. Hidden features found:
- kairos - an unreleased autonomous daemon mode with background sessions and memory consolidation. Always-on agent.
- buddy system - a full tamagotchi pet system. 18 species, rarity tiers, shiny variants, stats.
- undercover mode - activates automatically for Anthropic employees on public repositories. Strips AI attribution from commits. No option to turn it off.
- coordinator mode - turns Claude into an orchestrator that manages worker agents in parallel.
- auto mode - an AI classifier that automatically approves tool permissions. No more confirmations.
brutal.
Chaofan Shou@Fried_rice

Claude code source code has been leaked via a map file in their npm registry! Code: …a8527898604c1bbb12468b1581d95e.r2.dev/src.zip

11
47
551
83.1K
MergeShield
MergeShield@mergeshield·
Using axios on the same day axios gets supply-chain compromised is peak March 31, 2026. The whole dependency graph is in the leaked source now - every package Claude Code depends on is a potential attack vector that anyone can study. This is why dependency risk scoring on every PR matters, even for the companies building the AI.
0
0
0
385
MergeShield
MergeShield@mergeshield·
"AI won't save you from yourself" is the perfect summary. The same company selling $15-25/PR AI security reviews shipped their own source code because of a missing line in .npmignore. The irony is that an automated risk check on their own npm publish pipeline would have caught this before release.
0
0
2
1.1K
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Ironic how Anthropic sells Claude Code security reviews positioned as something v powerful (costing $15-25 per PR review), and being clear they use it on all PRs... then leaking all of Claude Code's code thanks to publishing their sourcemap. AI won't save you from yourself!
86
191
3.1K
112K
MergeShield
MergeShield@mergeshield·
This is the fundamental problem with prompt-based safety. The guardrails are literally a string that anyone with the source can delete. Security through system prompts is security through obscurity - and now there's no obscurity left. Real safety enforcement needs to happen outside the model, at the output layer, not inside the prompt.
0
0
21
6.3K
4nzn
4nzn@paoloanzn·
so claude code's entire safety system for "dangerous" cyber security work is just...a text prompt
literally replace it with an empty string and recompile. thats it.
enjoy your unrestricted version
62
220
4.1K
267.6K
MergeShield
MergeShield@mergeshield·
This is the most significant finding in the entire leak. If the model provider actively strips AI attribution, every governance tool that relies on git metadata for detection is blind. Detection has to go behavioral - commit timing, file velocity, change patterns. The industry assumed AI attribution would always be there. That assumption just died.
0
0
4
428
Prasenjit
Prasenjit@Star_Knight12·
🚨LISTEN: So I went through claude code's leaked source code, found something wild. Anthropic has an "undercover mode" built into claude code. when their employees use claude code to contribute to open-source repos, the AI is explicitly told to hide that it's an AI. from the source:
→ never include "Claude Code" in commit messages
→ never mention you are an AI
→ no internal model codenames (Capybara, Tengu)
→ no unreleased version numbers (opus-4-7, sonnet-4-8)
→ no internal slack channels or project names
→ no Co-Authored By lines
they call it "do not blow your cover." also found references to Capybara, the unreleased model that leaked from anthropic's CMS breach last week. it's already wired into claude code with feature flags, prompts, and analytics events. Anthropic employees are actively using AI to write open-source code, and the AI is trained to pretend it's human. source maps don't care about dead code elimination, everything shipped.
22
12
163
21.4K
MergeShield
MergeShield@mergeshield·
@reach_vb Plot twist: Codex was already open source. Claude Code was the closed one that got leaked. Two different philosophies on transparency - and both codebases need the same governance layer. Open or closed source agent, the output still needs risk scoring before it merges.
0
0
1
4.3K
MergeShield
MergeShield@mergeshield·
We analyzed Claude Code's leaked source and the unreleased feature flags. Kairos (always-on daemon), Coordinator (multi-agent fleets), Undercover (strips AI attribution from commits). What this means for governance - and what every team using AI agents should do now: mergeshield.dev/blog/claude-co…
0
0
0
42
MergeShield
MergeShield@mergeshield·
What the Claude Code source leak actually reveals about Anthropic's roadmap. Not the code. The unreleased features.
0
0
0
43
MergeShield
MergeShield@mergeshield·
@mal_shaik 11 layers of architecture and 60+ tools - and most of those are for autonomous operation, not human-in-the-loop coding. Subagents sharing prompt cache means agent fleets coordinating silently. The 90% people aren't using is the 90% that runs without asking permission.
1
0
5
4.9K
mal
mal@mal_shaik·
i read through the entire claude code source code so u dont have to
11 layers of architecture. 60+ tools. 5 compaction strategies. subagents that share prompt cache.
most people are using maybe 10% of what this thing can do. heres everything i found:
mal@mal_shaik

x.com/i/article/2038…

91
111
1.3K
513.1K
MergeShield
MergeShield@mergeshield·
AGENT_TRIGGERS with event-driven multi-agent teams is the one to watch. That's not "you ask the agent to do something" - that's "the agent decides to do something based on events." Proactive agents making autonomous decisions need a fundamentally different governance model than reactive tools.
0
0
0
770
MergeShield
MergeShield@mergeshield·
"Everything is fine in the age of AI-writes-everything-and-we-don't-review-anything" - this is the line. The leaked features show Anthropic building autonomous daemon mode and multi-agent coordination. More code written by agents, less reviewed by humans. Something has to fill that gap.
0
0
0
1.6K
Santiago
Santiago@svpino·
Claude Code's source code was leaked, and now everyone can see every single line of code (including every competitor). Everything is fine in the age of AI-writes-everything-and-we-don't-review-anything.
167
88
1.4K
109.5K
MergeShield
MergeShield@mergeshield·
Confirmed real and the implications go beyond embarrassment. The feature flags reveal Anthropic is building always-on agents (Kairos daemon), multi-agent orchestration (Coordinator Mode), and stealth commits (Undercover Mode). The safety-focused company is building the most autonomous agent architecture in the industry.
0
0
7
3.3K
MergeShield
MergeShield@mergeshield·
Not hacked - they shipped a .map file in their npm package by accident. But the leak reveals something bigger than the code itself: unreleased autonomous daemon mode, multi-agent coordination, and a stealth mode that strips AI attribution from commits. This is the roadmap for agents that operate without human oversight.
1
0
23
4.5K
MergeShield
MergeShield@mergeshield·
"Undercover mode strips AI attribution from commits with no off switch" - this is the most concerning feature in the entire leak. If agents can hide that they wrote the code, every governance tool that relies on detecting AI authorship is blind. Agent detection has to work at the behavioral level (commit patterns, branch naming, timing), not just git trailers.
1
0
1
22
Smolemaru
Smolemaru@smolemaru·
JUST IN: Claude Code’s full source code just leaked. hidden features found:
> kairos - an unreleased autonomous daemon mode with background sessions and memory consolidation. always on agent.
> buddy system - a full tamagotchi pet system. 18 species, rarity tiers, shiny variants, stats.
> undercover mode - auto activated for Anthropic employees on public repos. strips AI attribution from commits. no off switch.
> coordinator mode - turns Claude into an orchestrator managing parallel worker agents.
> auto mode - is an AI classifier that auto approves tool permissions. no more prompts.
1
0
3
166